
Commit dff3dc8

Add output parameter to BCB modules for raw data retrieval
1 parent 739445e commit dff3dc8

7 files changed

Lines changed: 443 additions & 13 deletions


.claude/plans/text-output.md

Lines changed: 149 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,149 @@
1+
# Plan: Add `output="text"` to All BCB Modules
2+
3+
## Context
4+
5+
The package's primary purpose is fetching BCB API data into pandas DataFrames. Users building SOR/SOT/SPEC data pipelines need to persist the raw downloaded data before transformation, because serializing a DataFrame back to text is lossy and format-dependent. Each BCB module already holds the raw text internally but doesn't expose it. This plan adds `output="text"` to all public `get()` / `collect()` functions so pipelines can save the response text exactly as returned by BCB, before any pandas processing.
6+
7+
---
8+
9+
## Scope
10+
11+
All three modules: **OData**, **SGS**, **Currency**
12+
13+
---
14+
15+
## Interface
16+
17+
Add `output: str = "dataframe"` parameter to:
18+
- `EndpointQuery.collect(output=...)` → `"dataframe"` returns `pd.DataFrame`, `"text"` returns `str`
19+
- `Endpoint.get(*args, output=..., **kwargs)` → same
20+
- `sgs.get(codes, ..., output=...)` → `"dataframe"` unchanged, `"text"` returns `str` (single code) or `dict[int, str]` (multiple codes, keyed by code value)
21+
- `currency.get(symbols, ..., output=...)` → `"dataframe"` unchanged, `"text"` returns `str` (single symbol) or `dict[str, str]` (multiple symbols, keyed by ISO symbol)
22+
23+
Use `@overload` + `Literal["text", "dataframe"]` for mypy --strict compliance on each function.
24+
25+
---
26+
27+
## What "text" contains per module
28+
29+
| Module | Raw text format | Source |
30+
|--------|----------------|--------|
31+
| OData | OData JSON response: `{"@odata.context": "...", "value": [...]}` | `ODataQuery.text()` (already exists in `framework.py:505`) |
32+
| SGS | BCB SGS JSON array: `[{"data": "01/01/2024", "valor": "100.5"}, ...]` | `sgs.get_json()` (already exists in `sgs/__init__.py:160`) |
33+
| Currency | BCB PTAX semicolon-delimited CSV | `res.text` inside `_get_symbol()` |
34+
35+
---
36+
37+
## File Changes
38+
39+
### 1. `bcb/odata/api.py`
40+
41+
**`EndpointQuery.collect()`** — add `output` param:
42+
```python
43+
def collect(self, output: str = "dataframe") -> Union[pd.DataFrame, str]:
44+
if output == "text":
45+
return self.text() # inherited from ODataQuery in framework.py:505
46+
# ... existing DataFrame logic unchanged
47+
```
48+
49+
**`Endpoint.get()`** — intercept `output` kwarg before it reaches `_query.parameters()`:
50+
```python
51+
output_format = "dataframe"
52+
for k, val in kwargs.items():
53+
if k == "limit": ...
54+
elif k == "output":
55+
output_format = val
56+
else:
57+
_query.parameters(**{k: val})
58+
...
59+
data = _query.collect(output=output_format)
60+
```
61+
62+
Add `@overload` stubs and update return type to `Union[pd.DataFrame, str]`.
63+
64+
### 2. `bcb/sgs/__init__.py`
65+
66+
**`sgs.get()`** — add `output` param with early-return branch:
67+
```python
68+
def get(codes, start, end, last, multi, freq, output="dataframe"):
69+
if output == "text":
70+
results = {c.value: get_json(c.value, start, end, last) for c in _codes(codes)}
71+
# single code → str, multiple codes → dict[int, str]
72+
values = list(results.values())
73+
return values[0] if len(values) == 1 else results
74+
# ... existing DataFrame logic unchanged
75+
```
76+
77+
Add `@overload` stubs:
78+
- `output: Literal["dataframe"]` → `Union[pd.DataFrame, List[pd.DataFrame]]`
79+
- `output: Literal["text"]` → `Union[str, dict[int, str]]`
80+
81+
### 3. `bcb/currency.py`
82+
83+
**New helper `_fetch_symbol_response()`**: extracts shared HTTP logic from `_get_symbol()` to avoid duplication.
84+
85+
```python
86+
def _fetch_symbol_response(symbol, start_date, end_date) -> Optional[httpx.Response]:
87+
try:
88+
cid = _get_currency_id(symbol)
89+
except CurrencyNotFoundError:
90+
return None
91+
res = httpx.get(_currency_url(cid, start_date, end_date), follow_redirects=True)
92+
if res.headers["Content-Type"].startswith("text/html"):
93+
# existing HTML error warn logic (moved from _get_symbol)
94+
return None
95+
return res
96+
97+
def _get_symbol(symbol, start_date, end_date) -> Optional[pd.DataFrame]:
98+
res = _fetch_symbol_response(symbol, start_date, end_date)
99+
if res is None:
100+
return None
101+
# ... existing CSV parse logic (unchanged)
102+
103+
def _get_symbol_text(symbol, start_date, end_date) -> Optional[str]:
104+
res = _fetch_symbol_response(symbol, start_date, end_date)
105+
return res.text if res is not None else None
106+
```
107+
108+
**`currency.get()`** — add `output` param:
109+
```python
110+
if output == "text":
111+
results = {s: _get_symbol_text(s, start, end) for s in symbols}
112+
results = {k: v for k, v in results.items() if v is not None}
113+
if not results:
114+
raise CurrencyNotFoundError(...)
115+
return results[symbols[0]] if len(symbols) == 1 else results
116+
```
117+
118+
Add `@overload` stubs:
119+
- `output: Literal["dataframe"]` → `pd.DataFrame`
120+
- `output: Literal["text"]` → `Union[str, dict[str, str]]`
121+
122+
---
123+
124+
## Tests
125+
126+
Add to existing test files (using mocked HTTP via `pytest-httpx`):
127+
128+
- **`tests/test_odata.py`**: `EndpointQuery.collect(output="text")` returns str; `Endpoint.get(output="text")` returns str.
129+
- **`tests/test_sgs.py`**: `sgs.get(1, ..., output="text")` returns `str`; `sgs.get([1, 2], ..., output="text")` returns `dict[int, str]`.
130+
- **`tests/test_currency.py`**: `currency.get("USD", ..., output="text")` returns `str`; `currency.get(["USD", "EUR"], ..., output="text")` returns `dict[str, str]`.
131+
132+
---
133+
134+
## Verification
135+
136+
```bash
137+
# Unit tests (mocked)
138+
poetry run pytest -m "not integration" tests/test_odata.py tests/test_sgs.py tests/test_currency.py
139+
140+
# Type check
141+
poetry run mypy bcb/
142+
143+
# Quick smoke test (live)
144+
poetry run python -c "
145+
from bcb import sgs
146+
text = sgs.get(1, last=3, output='text')
147+
print(type(text), text[:80])
148+
"
149+
```
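
The `@overload` + `Literal` pattern the plan calls for can be sketched in isolation as below. This is a minimal illustration, not code from the commit: `fetch`, `_RAW`, and the list-of-dicts return type (standing in for `pd.DataFrame`) are hypothetical. Only the `"text"` overload omits the default, so a call without `output` resolves to a single overload. The sketch also rejects unknown `output` values, a check the plan does not describe.

```python
import json
from typing import Any, Dict, List, Literal, Union, overload

# Hypothetical canned payload standing in for an HTTP response body.
_RAW = '[{"data": "01/01/2024", "valor": "100.5"}]'


@overload
def fetch(code: int, output: Literal["dataframe"] = ...) -> List[Dict[str, Any]]: ...


@overload
def fetch(code: int, output: Literal["text"]) -> str: ...


def fetch(code: int, output: str = "dataframe") -> Union[List[Dict[str, Any]], str]:
    """Return the raw payload, or a parsed stand-in for a DataFrame."""
    if output == "text":
        # Early return: hand back the payload untouched.
        return _RAW
    if output != "dataframe":
        raise ValueError(f"Unknown output value: {output!r}, use: dataframe, text")
    parsed: List[Dict[str, Any]] = json.loads(_RAW)
    return parsed


raw = fetch(1, output="text")  # a type checker infers str
rows = fetch(1)                # a type checker infers List[Dict[str, Any]]
```

With this shape, a type checker can narrow the return type from the literal passed at the call site, so callers should not need `isinstance` checks or casts.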

bcb/currency.py

Lines changed: 57 additions & 5 deletions
@@ -2,7 +2,7 @@
22
import warnings
33
from datetime import date, timedelta
44
from io import BytesIO, StringIO
5-
from typing import List, Optional, Union
5+
from typing import Dict, List, Literal, Optional, Union, overload
66

77
import httpx
88
import numpy as np
@@ -121,16 +121,15 @@ def _get_currency_id(symbol: str) -> int:
121121
return int(matches.max())
122122

123123

124-
def _get_symbol(
124+
def _fetch_symbol_response(
125125
symbol: str, start_date: DateInput, end_date: DateInput
126-
) -> Optional[pd.DataFrame]:
126+
) -> Optional[httpx.Response]:
127127
try:
128128
cid = _get_currency_id(symbol)
129129
except CurrencyNotFoundError:
130130
return None
131131
url = _currency_url(cid, start_date, end_date)
132132
res = httpx.get(url, follow_redirects=True)
133-
134133
if res.headers["Content-Type"].startswith("text/html"):
135134
doc = html.parse(BytesIO(res.content)).getroot()
136135
xpath = "//div[@class='msgErro']"
@@ -141,7 +140,15 @@ def _get_symbol(
141140
msg = f"BCB API returned error: {x} - {symbol}"
142141
warnings.warn(msg)
143142
return None
143+
return res
144+
144145

146+
def _get_symbol(
147+
symbol: str, start_date: DateInput, end_date: DateInput
148+
) -> Optional[pd.DataFrame]:
149+
res = _fetch_symbol_response(symbol, start_date, end_date)
150+
if res is None:
151+
return None
145152
columns = ["Date", "aa", "bb", "cc", "bid", "ask", "dd", "ee"]
146153
df = pd.read_csv(
147154
StringIO(res.text), delimiter=";", header=None, names=columns, dtype=str
@@ -159,13 +166,43 @@ def _get_symbol(
159166
return df1
160167

161168

169+
def _get_symbol_text(
170+
symbol: str, start_date: DateInput, end_date: DateInput
171+
) -> Optional[str]:
172+
res = _fetch_symbol_response(symbol, start_date, end_date)
173+
return res.text if res is not None else None
174+
175+
176+
@overload
177+
def get(
178+
symbols: Union[str, List[str]],
179+
start: DateInput,
180+
end: DateInput,
181+
side: str = ...,
182+
groupby: str = ...,
183+
output: Literal["dataframe"] = ...,
184+
) -> pd.DataFrame: ...
185+
186+
187+
@overload
188+
def get(
189+
symbols: Union[str, List[str]],
190+
start: DateInput,
191+
end: DateInput,
192+
side: str = ...,
193+
groupby: str = ...,
194+
output: Literal["text"] = ...,
195+
) -> Union[str, Dict[str, str]]: ...
196+
197+
162198
def get(
163199
symbols: Union[str, List[str]],
164200
start: DateInput,
165201
end: DateInput,
166202
side: str = "ask",
167203
groupby: str = "symbol",
168-
) -> pd.DataFrame:
204+
output: str = "dataframe",
205+
) -> Union[pd.DataFrame, str, Dict[str, str]]:
169206
"""
170207
Returns a pandas DataFrame with time series of exchange rates.
171208
@@ -204,6 +241,19 @@ def get(
204241
"""
205242
if isinstance(symbols, str):
206243
symbols = [symbols]
244+
245+
if output == "text":
246+
results: Dict[str, str] = {}
247+
for symbol in symbols:
248+
raw = _get_symbol_text(symbol, start, end)
249+
if raw is not None:
250+
results[symbol] = raw
251+
if not results:
252+
raise CurrencyNotFoundError(f"Currency not found: {symbols}")
253+
if len(symbols) == 1:
254+
return results[symbols[0]]
255+
return results
256+
207257
dss = []
208258
for symbol in symbols:
209259
df1 = _get_symbol(symbol, start, end)
@@ -219,6 +269,8 @@ def get(
219269
return df
220270
elif groupby == "side":
221271
return df.reorder_levels([1, 0], axis=1).sort_index(axis=1)
272+
else:
273+
raise ValueError("Unknown groupby value, use: symbol, side")
222274
else:
223275
raise ValueError("Unknown side value, use: bid, ask, both")
224276
else:
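
Because `currency.get(..., output="text")` returns a bare `str` for one symbol and a `dict[str, str]` for several, a pipeline may want to normalise the shape before persisting. The sketch below is one way to do that. `as_mapping` and `persist` are hypothetical helpers, not part of the package, and the payload strings are invented placeholders rather than real PTAX data.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def as_mapping(
    symbols: Union[str, List[str]], result: Union[str, Dict[str, str]]
) -> Dict[str, str]:
    """Normalise the str-or-dict return shape into a dict keyed by symbol."""
    if isinstance(result, str):
        # Single-symbol call: the result is the bare payload string.
        symbol = symbols if isinstance(symbols, str) else symbols[0]
        return {symbol: result}
    return dict(result)


def persist(raw_by_symbol: Dict[str, str], target: Path) -> List[Path]:
    """Write each raw payload to <target>/<symbol>.csv without parsing it."""
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for symbol, text in raw_by_symbol.items():
        path = target / f"{symbol}.csv"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Invented payloads standing in for currency.get(..., output="text") results.
single = as_mapping("USD", "row-1;row-1-value\n")
many = as_mapping(["USD", "EUR"], {"USD": "a;1\n", "EUR": "b;2\n"})

with tempfile.TemporaryDirectory() as tmp:
    written = persist(many, Path(tmp))
    names = sorted(p.name for p in written)
    roundtrip = (Path(tmp) / "USD.csv").read_text(encoding="utf-8")
```

In the diff above, symbols that fail to resolve are dropped from the dict, so a multi-symbol result can hold fewer keys than were requested.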

bcb/odata/api.py

Lines changed: 22 additions & 6 deletions
@@ -1,4 +1,4 @@
1-
from typing import Any, Optional
1+
from typing import Any, Literal, Optional, Union, overload
22
from .framework import (
33
ODataEntitySet,
44
ODataFunctionImport,
@@ -50,7 +50,15 @@ def __init__(
5050
super().__init__(entity, url)
5151
self._date_columns: list[str] = date_columns or []
5252

53-
def collect(self) -> pd.DataFrame:
53+
@overload
54+
def collect(self, output: Literal["dataframe"] = ...) -> pd.DataFrame: ...
55+
56+
@overload
57+
def collect(self, output: Literal["text"]) -> str: ...
58+
59+
def collect(self, output: str = "dataframe") -> Union[pd.DataFrame, str]:
60+
if output == "text":
61+
return self.text()
5462
raw_data = super().collect()
5563
data = pd.DataFrame(raw_data["value"])
5664
if not self._raw:
@@ -109,19 +117,21 @@ def __init__(
109117
self._url = url
110118
self._date_columns: list[str] = date_columns or []
111119

112-
def get(self, *args: Any, **kwargs: Any) -> pd.DataFrame:
120+
def get(self, *args: Any, **kwargs: Any) -> Union[pd.DataFrame, str]:
113121
"""
114122
Runs the query against the OData API and returns the result.
115123
116124
Parameters
117125
----------
118126
*args : arguments for the query
119127
120-
**kwargs : arguments for the query
128+
**kwargs : arguments for the query. Use ``output='text'`` to get
129+
the raw OData JSON response string instead of a DataFrame.
121130
122131
Returns
123132
-------
124-
pd.DataFrame: query result
133+
pd.DataFrame or str: query result. Returns a DataFrame by
134+
default; returns a raw JSON string when ``output='text'``.
125135
"""
126136
_query = EndpointQuery(self._entity, self._url, self._date_columns)
127137
for arg in args:
@@ -132,20 +142,26 @@ def get(self, *args: Any, **kwargs: Any) -> pd.DataFrame:
132142
elif isinstance(arg, ODataProperty):
133143
_query.select(arg)
134144
verbose = False
145+
output_format = "dataframe"
135146
for k, val in kwargs.items():
136147
if k == "limit":
137148
_query.limit(val)
138149
elif k == "skip":
139150
_query.skip(val)
140151
elif k == "verbose":
141152
verbose = val
153+
elif k == "output":
154+
output_format = val
142155
else:
143156
_query.parameters(**{k: val})
144157
_query.format("application/json")
145158

146159
if verbose:
147160
_query.show()
148-
data = _query.collect()
161+
if output_format == "text":
162+
data = _query.collect(output="text")
163+
else:
164+
data = _query.collect()
149165
_query.reset()
150166
return data
151167
