Commit b87f753

feat: remove download_and_parse_resource tool (#66)

The tool caused RAM and disk storage issues on production MCP server instances due to in-memory file parsing and potential large downloads. The Tabular API (query_resource_data) covers the main use cases, and LLMs can fall back to fetching raw file URLs directly via get_resource_info for unsupported formats or files exceeding Tabular API limits.

Made-with: Cursor
1 parent 99836c4 commit b87f753

7 files changed

Lines changed: 10 additions & 352 deletions

README.md

Lines changed: 1 addition & 7 deletions
@@ -344,13 +344,7 @@ The MCP server provides tools to interact with data.gouv.fr datasets and dataser
 
 Parameters: `question` (required), `resource_id` (required), `page` (optional, default: 1), `page_size` (optional, default: 20, max: 200)
 
-Note: Recommended workflow: 1) Use `search_datasets` to find the dataset, 2) Use `list_dataset_resources` to see available resources, 3) Use `query_resource_data` with default `page_size` (20) to preview data structure. For small datasets (<500 rows), increase `page_size` or paginate. For large datasets (>1000 rows), use `download_and_parse_resource` instead. Works for CSV/XLS resources within Tabular API size limits (CSV ≤ 100 MB, XLSX ≤ 12.5 MB).
-
-- **`download_and_parse_resource`** - Download and parse a resource that is not accessible via Tabular API (files too large, formats not supported, external URLs).
-
-Parameters: `resource_id` (required), `max_rows` (optional, default: 20), `max_size_mb` (optional, default: 500)
-
-Supported formats: CSV, CSV.GZ, JSON, JSONL. Useful for files exceeding Tabular API limits or formats not supported by Tabular API. Start with default max_rows (20) to preview, then call again with higher max_rows if you need all data.
+Note: Recommended workflow: 1) Use `search_datasets` to find the dataset, 2) Use `list_dataset_resources` to see available resources, 3) Use `query_resource_data` with default `page_size` (20) to preview data structure. For small datasets (<500 rows), increase `page_size` or paginate. For large datasets (>1000 rows), continue paginating or use `get_resource_info` to retrieve the raw file URL and fetch it directly. Works for CSV/XLS resources within Tabular API size limits (CSV ≤ 100 MB, XLSX ≤ 12.5 MB).
 
 ### Dataservices (external APIs)
 
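The pagination guidance in the updated note can be sketched as a small helper. This is an illustration only and is not part of the commit: `plan_pages` is a hypothetical name, and the only values taken from the README are the default `page_size` of 20 and the maximum of 200. It computes which `page` values a caller would pass to `query_resource_data` and makes no network calls.

```python
import math

DEFAULT_PAGE_SIZE = 20  # README: default page_size
MAX_PAGE_SIZE = 200     # README: maximum page_size


def plan_pages(total_rows: int, page_size: int = DEFAULT_PAGE_SIZE) -> list[int]:
    """Return the 1-based page numbers needed to read `total_rows` rows.

    Hypothetical helper for illustration; it does not call the Tabular API.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    # Ceiling division: a partially filled last page still counts as a page.
    return list(range(1, math.ceil(total_rows / page_size) + 1))
```

For instance, a 450-row resource read at the maximum page size needs pages 1 through 3, while the default page size would need 23 requests, which is where fetching the raw file URL may become the more practical option.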

tools/__init__.py

Lines changed: 0 additions & 4 deletions
@@ -1,8 +1,5 @@
 from mcp.server.fastmcp import FastMCP
 
-from tools.download_and_parse_resource import (
-    register_download_and_parse_resource_tool,
-)
 from tools.get_dataservice_info import register_get_dataservice_info_tool
 from tools.get_dataservice_openapi_spec import (
     register_get_dataservice_openapi_spec_tool,
@@ -26,5 +23,4 @@ def register_tools(mcp: FastMCP) -> None:
     register_get_dataset_info_tool(mcp)
     register_list_dataset_resources_tool(mcp)
     register_get_resource_info_tool(mcp)
-    register_download_and_parse_resource_tool(mcp)
     register_get_metrics_tool(mcp)

tools/download_and_parse_resource.py

Lines changed: 0 additions & 328 deletions
This file was deleted.

tools/get_resource_info.py

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@ async def get_resource_info(resource_id: str) -> str:
 Get detailed information about a specific resource (file).
 
 Returns format, size, MIME type, URL, and checks Tabular API availability.
-Helps decide which tool to use: query_resource_data (if Tabular API available)
-or download_and_parse_resource (for large files or unsupported formats).
+Helps decide whether to use query_resource_data (if Tabular API is available)
+or fetch the raw file URL directly for unsupported formats or large files.
 """
 try:
     # Get full resource data from API v2
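The decision described in the revised docstring could be approximated on the caller's side as follows. This is a sketch under stated assumptions, not the server's logic: the real tool checks Tabular API availability itself, whereas this hypothetical `choose_access_route` only compares format and size against the limits quoted in the README (CSV ≤ 100 MB, XLSX ≤ 12.5 MB). Treating "MB" as decimal megabytes is an assumption, and XLS is left out because the README states no limit for it.

```python
from typing import Optional

# Limits quoted in the README note. Decimal megabytes are an assumption.
TABULAR_LIMITS_BYTES = {
    "csv": 100 * 1_000_000,
    "xlsx": int(12.5 * 1_000_000),
}


def choose_access_route(file_format: str, size_bytes: Optional[int]) -> str:
    """Return "query_resource_data" or "raw_url" for a resource.

    Hypothetical helper: formats without a known Tabular API limit, and
    files above the limit, are routed to the raw file URL.
    """
    limit = TABULAR_LIMITS_BYTES.get(file_format.strip().lower())
    if limit is None:
        return "raw_url"  # format not covered by the limits above
    if size_bytes is not None and size_bytes > limit:
        return "raw_url"  # too large for the Tabular API
    return "query_resource_data"  # unknown size: try the Tabular API first
```

When the size is unknown, the sketch optimistically tries the Tabular API first, since a failed query is cheaper than an unnecessary large download.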

tools/list_dataset_resources.py

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ async def list_dataset_resources(dataset_id: str) -> str:
 List all resources (files) in a dataset with their metadata.
 
 Returns resource ID, title, format, size, and URL for each file.
-Next step: use query_resource_data for CSV/XLSX files,
-or download_and_parse_resource for other formats (JSON, JSONL) or large datasets.
+Next step: use query_resource_data for CSV/XLSX files via the Tabular API,
+or fetch the resource URL directly for other formats (JSON, JSONL) or large datasets.
 """
 try:
     dataset = await datagouv_api_client.get_dataset_details(dataset_id)
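For a client that fetches a JSONL resource URL directly, a bounded preview avoids the in-memory parsing that motivated this commit. The sketch below is a hypothetical client-side helper, not code from the repository: `preview_jsonl` is an invented name, its default of 20 rows mirrors the `max_rows` default of the removed tool, and the input is assumed to be any iterable of text lines (for example, a streamed HTTP response body decoded line by line).

```python
import io
import json
from itertools import islice
from typing import Any, Iterable


def preview_jsonl(lines: Iterable[str], max_rows: int = 20) -> list[Any]:
    """Parse at most `max_rows` JSON Lines records from an iterable of lines.

    Lines are consumed lazily, so the remainder of the input is never read.
    """
    non_empty = (line for line in lines if line.strip())  # skip blank lines
    return [json.loads(line) for line in islice(non_empty, max_rows)]


# Usage with an in-memory stand-in for a streamed response body.
sample = io.StringIO('{"city": "Paris"}\n\n{"city": "Lyon"}\n{"city": "Nice"}\n')
print(preview_jsonl(sample, max_rows=2))
# [{'city': 'Paris'}, {'city': 'Lyon'}]
```

Because `islice` stops pulling from the source once `max_rows` records have been produced, memory use stays proportional to the preview size rather than the file size.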
