fix: add MCP resource describing catalog scope #71
AbdelStark wants to merge 1 commit into datagouv:main
Hey, would appreciate a review when you have time @bolinocroustibat
bolinocroustibat
left a comment
Good idea, thanks for your contribution.
def _build_catalog_overview() -> str:
    site_url = env_config.get_base_url("site")
nitpicks:
- Why not name them with the exact same nomenclature? i.e.:
  site
  datagouv_api
- I would type hint them so that we can clearly see the type here without following the helpers:
  site: str = ...
  datagouv_api: str = ...
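A minimal sketch of what both nitpicks could look like together, assuming `get_base_url` returns a plain string. The `EnvConfig` class below is a hypothetical stand-in for the project's real `env_config` helper, the function name `build_catalog_urls` is made up for illustration, and the URLs are placeholder values.

```python
class EnvConfig:
    """Hypothetical stand-in for the project's env_config helper."""

    # Placeholder values; the real helper resolves these per environment.
    _BASE_URLS: dict[str, str] = {
        "site": "https://www.data.gouv.fr",
        "datagouv_api": "https://www.data.gouv.fr/api",
    }

    def get_base_url(self, name: str) -> str:
        return self._BASE_URLS[name]


env_config = EnvConfig()


def build_catalog_urls() -> tuple[str, str]:
    # Variable names mirror the keys passed to get_base_url, and the
    # annotations show the types without having to read the helper.
    site: str = env_config.get_base_url("site")
    datagouv_api: str = env_config.get_base_url("datagouv_api")
    return site, datagouv_api
```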
Could we suggest that the LLM look up the catalog (updated daily) here: https://www.data.gouv.fr/datasets/catalogue-des-donnees-de-data-gouv-fr ?
Remember that we can't ask it to load it (it's 70k+ datasets and 350k+ files).
...or we could redirect the LLM to the HVD catalog (see #92).
"- get_dataservice_openapi_spec: fetch and summarize a dataservice OpenAPI spec.",
"- get_metrics: retrieve dataset or resource usage metrics.",
"",
"Example questions this server should answer by searching first:",
Example questions will mostly be in French. Should we give French examples?
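As a rough sketch of what French examples could look like, the three English questions from the diff are translated below. The translations, the constant name, and the `build_example_section` helper are illustrative assumptions, not wording agreed on in this PR.

```python
# Hypothetical French versions of the example questions from the diff.
FRENCH_EXAMPLE_QUESTIONS: list[str] = [
    "Avez-vous des jeux de données sur les prix de l'immobilier à Paris ?",
    "Existe-t-il des ressources sur les bornes de recharge pour véhicules électriques ?",
    "Quelles données sont disponibles pour l'Assemblée nationale ?",
]


def build_example_section() -> list[str]:
    # Same header as in the diff, followed by one quoted bullet per question.
    header = "Example questions this server should answer by searching first:"
    return [header, *[f'- "{question}"' for question in FRENCH_EXAMPLE_QUESTIONS]]
```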
"Example questions this server should answer by searching first:",
'- "Do you have datasets about housing prices in Paris?"',
'- "Are there resources about EV charging stations?"',
'- "What data is available for the Assemblee nationale?"',
typos: "Assemblee nationale" -> "Assemblée Nationale"
[
    "# data.gouv.fr Catalog Overview",
    "",
    "This MCP server does not ship a fixed, preloaded list of datasets.",
data.gouv.fr also provides a list of dataservices (external APIs); this should be explained as well.
"It provides live read-only access to the current public catalog exposed by data.gouv.fr.",
"",
f"Current catalog site: {site_url}",
f"Current API base: {data_env}",
"API base" can be confused with external APIs / dataservices.
I suggest "data.gouv.fr main API base".
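A small sketch of the suggested label, assuming the two URLs are already available as strings. The function name `format_endpoint_lines` and its parameters are hypothetical and only serve the illustration.

```python
def format_endpoint_lines(site: str, datagouv_api: str) -> list[str]:
    # "data.gouv.fr main API base" makes it explicit that this is the platform's
    # own API, not one of the external APIs listed as dataservices.
    return [
        f"Current catalog site: {site}",
        f"data.gouv.fr main API base: {datagouv_api}",
    ]
```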
"",
"When a user asks whether data is available for a topic, search first before concluding that the data is unavailable.",
"",
"Recommended discovery workflow:",
Should we update the README "Recommended workflow" to match this?
## 🛠️ Available Tools

The MCP server provides tools to interact with data.gouv.fr datasets and dataservices.
It also exposes a `Catalog Scope and Discovery Guide` MCP resource so clients can
style: The README is formatted with full lines
Related: #63
Another way to tackle that would be #92.
Tell me what you prefer. If you want to tackle it, that's fine, I can let you handle it.
"5. Use query_resource_data to preview, filter, and sort tabular resources.",
"6. Use get_metrics for usage metrics when you need visits or downloads.",
"",
"Available tools:",
If we're correct, when an LLM connects to an MCP server, one of the very first request/response exchanges is listing the available tools, so wouldn't listing them again in the list resources response be redundant and a waste of tokens?
I think we can continue on this anyway; this doesn't contradict #92. But we see this not as a fix for a specific bug or issue (at least not from what we collected on our side), but rather as an effort to take advantage of the fact that resource and tool discovery are among the first interactions between an LLM and an MCP server, by implementing a resource listing feature which we haven't used until now, and to better align with the Model Context Protocol (MCP) standard. This could serve as a proactive way to prime the LLM on the intended workflow right from the start. Are we aligned on this? If so, I would suggest starting simple in the list resources response, by just explaining what data.gouv.fr is and suggesting a basic workflow, without overloading the context. This can be improved in later PRs (what the best practices are, where the documentation is, etc.). No need to list the tools, as the request to list the tools already exists. We might have to iterate several times over the list resources response text blindly before this can be merged.
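To make the "start simple" proposal concrete, here is a minimal sketch of an overview text that explains what data.gouv.fr is and suggests a basic workflow, with no tool listing. The function name, the exact wording, and the `site` parameter are assumptions for discussion, and registering the text as an MCP resource is left out.

```python
def build_minimal_overview(site: str) -> str:
    # Deliberately short: what the platform is, where it lives, and a basic
    # workflow. Tools are not listed because clients already request them.
    lines: list[str] = [
        "# data.gouv.fr Catalog Overview",
        "",
        "data.gouv.fr is the French open data platform.",
        "It lists datasets and dataservices (external APIs).",
        f"Current catalog site: {site}",
        "",
        "Suggested workflow:",
        "1. Search datasets or dataservices for the topic first.",
        "2. Inspect a result before concluding that the data is unavailable.",
    ]
    return "\n".join(lines)
```

Keeping the text in one small function should make it cheap to iterate on the wording across later PRs.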
Closes #63
Summary
Clients currently see an empty MCP resources list, which can make them conclude that the server has no accessible datasets.
This PR adds a single static MCP resource, datagouv://catalog-overview, that explains:
Changes
Catalog Scope and Discovery Guide MCP resource
Test plan
uv run pytest
uv run ty check
uv run ruff check