# Plan: Add `output="text"` to All BCB Modules

## Context

The package's primary purpose is fetching BCB API data into pandas DataFrames. Users building SOR/SOT/SPEC data pipelines need to persist the raw downloaded data before any transformation. Serializing a DataFrame back to text is bad practice for two reasons: it is lossy, because the source's original number and date formatting is gone, and the result depends on whichever output format is chosen. Each BCB module already holds the raw response text internally but does not expose it. This plan adds `output="text"` to all public `get()` / `collect()` functions. Pipelines can then save the response body exactly as BCB returned it (decoded to `str`) before any pandas processing.

---

## Scope

All three modules: **OData**, **SGS**, **Currency**.

---

## Interface

Add an `output: str = "dataframe"` parameter to:
- `EndpointQuery.collect(output=...)` → `"dataframe"` returns `pd.DataFrame`, `"text"` returns `str`
- `Endpoint.get(*args, output=..., **kwargs)` → same as `collect()`; `output` is keyword-only
- `sgs.get(codes, ..., output=...)` → `"dataframe"` unchanged, `"text"` returns `str` (single code) or `dict[int, str]` (multiple codes, keyed by code value)
- `currency.get(symbols, ..., output=...)` → `"dataframe"` unchanged, `"text"` returns `str` (single symbol) or `dict[str, str]` (multiple symbols, keyed by ISO symbol)

For `mypy --strict` compliance, give each function two `@overload` stubs: one with `output: Literal["dataframe"]` and one with `output: Literal["text"]`. The checker then infers the concrete return type from the argument, so callers do not have to narrow a `Union` themselves.
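
Below is a minimal, dependency-free sketch of this overload pattern. The function name `fetch` and the `_RAW` sample body are hypothetical. A list of records stands in for `pd.DataFrame` so the example runs without pandas. The `ValueError` branch is an assumption that goes beyond this plan. Without some such check, a typo like `output="txt"` silently falls through to the DataFrame path.

```python
import json
from typing import Dict, List, Literal, Union, overload

# Hypothetical raw body, shaped like the SGS sample in this plan.
_RAW = '[{"data": "01/01/2024", "valor": "100.5"}]'

# Stand-in for pd.DataFrame so the sketch has no pandas dependency.
Records = List[Dict[str, str]]


@overload
def fetch(output: Literal["dataframe"] = ...) -> Records: ...


@overload
def fetch(output: Literal["text"]) -> str: ...


def fetch(output: str = "dataframe") -> Union[Records, str]:
    """Return the untouched body for "text", parsed records otherwise."""
    if output == "text":
        return _RAW
    if output != "dataframe":
        # Hypothetical guard: reject typos such as "txt" instead of parsing.
        raise ValueError(f"output must be 'dataframe' or 'text', got {output!r}")
    records: Records = json.loads(_RAW)
    return records


body = fetch(output="text")  # mypy infers str, no narrowing needed
rows = fetch()  # mypy infers Records
```

Because the default lives on the `Literal["dataframe"]` overload, a bare `fetch()` call keeps its current, non-`Union` return type. Existing callers therefore type-check unchanged.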

---

## What "text" contains per module

| Module | Raw text format | Source |
|--------|----------------|--------|
| OData | OData JSON response: `{"@odata.context": "...", "value": [...]}` | `ODataQuery.text()` (already exists in `framework.py:505`) |
| SGS | BCB SGS JSON array: `[{"data": "01/01/2024", "valor": "100.5"}, ...]` | `sgs.get_json()` (already exists in `sgs/__init__.py:160`) |
| Currency | BCB PTAX semicolon-delimited CSV | `res.text` inside `_get_symbol()` |
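
The sketch below illustrates the intended downstream use and is not part of the planned change. It writes an SGS-style body to a landing file, then parses it in a later step. The directory layout is a hypothetical SOR convention. The file is written as bytes, not text, which avoids newline translation on Windows, so the stored file decodes back to exactly the string that was returned.

```python
import json
import tempfile
from pathlib import Path

# Sample body, as sgs.get(1, ..., output="text") would return it (illustrative).
raw_text = '[{"data": "01/01/2024", "valor": "100.5"}]'

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical SOR landing path: <root>/sgs/<code>/<date>.json
    landing = Path(tmp) / "sgs" / "1" / "2024-01-01.json"
    landing.parent.mkdir(parents=True)
    # Writing bytes skips newline translation, so the file holds raw_text exactly.
    landing.write_bytes(raw_text.encode("utf-8"))

    # A later transformation step reads the untouched text and parses it.
    stored = landing.read_bytes().decode("utf-8")
    rows = json.loads(stored)

print(stored == raw_text, rows[0]["valor"])  # True 100.5
```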

---

## File Changes

### 1. `bcb/odata/api.py`

**`EndpointQuery.collect()`**: add the `output` param:
```python
from typing import Union
import pandas as pd

def collect(self, output: str = "dataframe") -> Union[pd.DataFrame, str]:
    if output == "text":
        return self.text()  # raw OData JSON; inherited from ODataQuery (framework.py:505)
    ...  # existing DataFrame logic unchanged
```

**`Endpoint.get()`**: declare `output` as a keyword-only parameter, as in the Interface signature. Because it follows `*args`, Python binds it before filling `**kwargs`. As a result, `output` never reaches `_query.parameters()`, and the kwargs loop needs no special case:
```python
from typing import Any, Union

def get(self, *args: Any, output: str = "dataframe", **kwargs: Any) -> Union[pd.DataFrame, str]:
    ...  # existing query setup unchanged
    for k, val in kwargs.items():
        if k == "limit":
            ...  # existing limit handling unchanged
        else:
            _query.parameters(**{k: val})
    ...
    data = _query.collect(output=output)
```

Add `@overload` stubs and update the return type to `Union[pd.DataFrame, str]`.
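
The binding rule is easy to check in isolation. In this sketch, the free function `get` is a hypothetical stand-in for `Endpoint.get()`, and the keyword names `limit` and `moeda` are hypothetical query options. The function simply reports where each argument landed.

```python
from typing import Any, Dict, Tuple


def get(*args: Any, output: str = "dataframe", **kwargs: Any) -> Tuple[str, Dict[str, Any]]:
    # `output` follows *args, so it is keyword-only: Python binds it to the
    # named parameter and collects only the remaining keywords in **kwargs.
    return output, kwargs


fmt, rest = get(limit=5, moeda="USD", output="text")
print(fmt, rest)  # text {'limit': 5, 'moeda': 'USD'}
```

Since `output` can never appear in `kwargs`, it cannot leak into `_query.parameters()` as a bogus OData query option.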

### 2. `bcb/sgs/__init__.py`

**`sgs.get()`**: add the `output` param with an early-return branch. `multi` and `freq` shape the DataFrame result, so the text branch does not use them:
```python
def get(codes, start, end, last, multi, freq, output="dataframe"):  # existing defaults elided
    if output == "text":
        # One raw SGS JSON body per series, keyed by code value
        results = {c.value: get_json(c.value, start, end, last) for c in _codes(codes)}
        # single code → str, multiple codes → dict[int, str]
        values = list(results.values())
        return values[0] if len(values) == 1 else results
    # ... existing DataFrame logic unchanged
```

Add `@overload` stubs:
- `output: Literal["dataframe"]` → `Union[pd.DataFrame, List[pd.DataFrame]]`
- `output: Literal["text"]` → `Union[str, dict[int, str]]`
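
The following runnable sketch isolates the single-versus-multiple rule of the text branch. `fake_get_json` and `normalize_codes` are hypothetical stand-ins for `get_json()` and `_codes()`. The real helpers may accept more input shapes than a bare `int` or a list of `int`.

```python
from typing import Dict, Iterable, List, Union


def fake_get_json(code: int) -> str:
    # Hypothetical stand-in for get_json(): returns an SGS-style JSON body.
    return f'[{{"data": "01/01/2024", "valor": "{code}.0"}}]'


def normalize_codes(codes: Union[int, Iterable[int]]) -> List[int]:
    # Hypothetical stand-in for _codes(): accept one code or several.
    return [codes] if isinstance(codes, int) else list(codes)


def get_text(codes: Union[int, Iterable[int]]) -> Union[str, Dict[int, str]]:
    results = {c: fake_get_json(c) for c in normalize_codes(codes)}
    # Exactly one series → bare str; otherwise a dict keyed by code.
    if len(results) == 1:
        return next(iter(results.values()))
    return results


print(get_text(1))  # [{"data": "01/01/2024", "valor": "1.0"}]
```

Under this rule, `[1]` also yields a bare `str`, as does `[1, 1]`, because the dict collapses duplicate codes. If callers need a predictable type for each input shape, one alternative is to branch on whether `codes` was a scalar rather than on the number of results.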

### 3. `bcb/currency.py`

**New helper `_fetch_symbol_response()`**: extracts the HTTP logic shared by `_get_symbol()` and the new `_get_symbol_text()`, so the request and its error handling live in one place.

```python
from typing import Optional

import httpx
import pandas as pd

def _fetch_symbol_response(symbol, start_date, end_date) -> Optional[httpx.Response]:
    try:
        cid = _get_currency_id(symbol)
    except CurrencyNotFoundError:
        return None
    res = httpx.get(_currency_url(cid, start_date, end_date), follow_redirects=True)
    # .get() avoids a KeyError if the response omits Content-Type
    if res.headers.get("Content-Type", "").startswith("text/html"):
        # existing HTML error warn logic (moved from _get_symbol)
        return None
    return res

def _get_symbol(symbol, start_date, end_date) -> Optional[pd.DataFrame]:
    res = _fetch_symbol_response(symbol, start_date, end_date)
    if res is None:
        return None
    # ... existing CSV parse logic (unchanged)

def _get_symbol_text(symbol, start_date, end_date) -> Optional[str]:
    res = _fetch_symbol_response(symbol, start_date, end_date)
    return res.text if res is not None else None
```
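
**`currency.get()`**: add the `output` param. `currency.get("USD", ...)` passes a bare string, so normalize `symbols` to a list before the text branch (skip this if the function already does so). Iterating `"USD"` directly yields `"U"`, `"S"`, `"D"`. Also give the filtered dict its own precisely typed name. Reusing `results` keeps its inferred `dict[str, Optional[str]]` type, which `mypy --strict` rejects against a `dict[str, str]` return type.
```python
from typing import Dict

if isinstance(symbols, str):
    symbols = [symbols]
if output == "text":
    raw = {s: _get_symbol_text(s, start, end) for s in symbols}
    # Drop symbols that were not found or returned an HTML error page
    results: Dict[str, str] = {k: v for k, v in raw.items() if v is not None}
    if not results:
        raise CurrencyNotFoundError(...)
    return results[symbols[0]] if len(symbols) == 1 else results
```

Add `@overload` stubs:
- `output: Literal["dataframe"]` → `pd.DataFrame`
- `output: Literal["text"]` → `Union[str, dict[str, str]]`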
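
The partial-failure behaviour of the text branch deserves a test of its own. This runnable sketch mirrors the branch using `fake_symbol_text`, a hypothetical stand-in for `_get_symbol_text()`, and a locally defined `CurrencyNotFoundError`. The response bodies are placeholders, not real PTAX data. Unknown symbols are dropped, and the error is raised only when no symbol resolves.

```python
from typing import Dict, List, Optional, Union


class CurrencyNotFoundError(Exception):
    """Local stand-in for the package's exception of the same name."""


# Placeholder bodies; real responses are semicolon-delimited PTAX CSV.
_FAKE_BODIES = {"USD": "<USD PTAX CSV>", "EUR": "<EUR PTAX CSV>"}


def fake_symbol_text(symbol: str) -> Optional[str]:
    # Hypothetical stand-in for _get_symbol_text(): None means "not found".
    return _FAKE_BODIES.get(symbol)


def get_text(symbols: Union[str, List[str]]) -> Union[str, Dict[str, str]]:
    if isinstance(symbols, str):
        symbols = [symbols]  # a bare "USD" would otherwise iterate per character
    raw = {s: fake_symbol_text(s) for s in symbols}
    results: Dict[str, str] = {k: v for k, v in raw.items() if v is not None}
    if not results:
        raise CurrencyNotFoundError(f"no data for {symbols}")
    return results[symbols[0]] if len(symbols) == 1 else results
```

Silently dropping unknown symbols suits a best-effort fetch. A pipeline that must not lose a currency might prefer to raise as soon as any symbol is missing.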

---

## Tests

Add to the existing test files, mocking HTTP with `pytest-httpx`. Besides the return type, assert that the text equals the mocked response body exactly:

- **`tests/test_odata.py`**: `EndpointQuery.collect(output="text")` returns `str`; `Endpoint.get(output="text")` returns `str`.
- **`tests/test_sgs.py`**: `sgs.get(1, ..., output="text")` returns `str`; `sgs.get([1, 2], ..., output="text")` returns `dict[int, str]`.
- **`tests/test_currency.py`**: `currency.get("USD", ..., output="text")` returns `str`; `currency.get(["USD", "EUR"], ..., output="text")` returns `dict[str, str]`.

---

## Verification

```bash
# Unit tests (mocked)
poetry run pytest -m "not integration" tests/test_odata.py tests/test_sgs.py tests/test_currency.py

# Type check
poetry run mypy bcb/

# Quick smoke test (live)
poetry run python -c "
from bcb import sgs
text = sgs.get(1, last=3, output='text')
print(type(text), text[:80])
"
```