fix: add MCP resource describing catalog scope #71

Open
AbdelStark wants to merge 1 commit into datagouv:main from AbdelStark:feat/catalog-overview-resource

Conversation

@AbdelStark
Contributor

Closes #63

Summary

Clients currently see an empty MCP resources list, which can make them conclude that the server has no accessible datasets.

This PR adds a single static MCP resource, datagouv://catalog-overview, that explains:

  • the server has live read-only access to the current data.gouv.fr catalog
  • callers should search before saying data is unavailable
  • the recommended discovery workflow and available tools
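
A minimal sketch of how such a static resource might be registered, assuming the server is built on the official MCP Python SDK's `FastMCP` class. The `build_catalog_overview` signature, the server name, and the placeholder URLs are illustrative assumptions, not the code in this PR; the import is guarded so the text builder can be exercised without the SDK installed.

```python
CATALOG_OVERVIEW_URI = "datagouv://catalog-overview"


def build_catalog_overview(site_url: str, api_url: str) -> str:
    """Return the static text served by the catalog overview resource."""
    lines = [
        "# data.gouv.fr Catalog Overview",
        "",
        "This MCP server does not ship a fixed, preloaded list of datasets.",
        "It provides live read-only access to the current public catalog exposed by data.gouv.fr.",
        "",
        f"Current catalog site: {site_url}",
        f"Current API base: {api_url}",
        "",
        "When a user asks whether data is available for a topic, search first "
        "before concluding that the data is unavailable.",
    ]
    return "\n".join(lines)


try:
    # Optional dependency: only needed to actually serve the resource.
    from mcp.server.fastmcp import FastMCP
except ImportError:
    FastMCP = None

if FastMCP is not None:
    mcp = FastMCP("datagouv-sketch")  # hypothetical server name

    @mcp.resource(
        CATALOG_OVERVIEW_URI,
        name="Catalog Scope and Discovery Guide",
        mime_type="text/markdown",
    )
    def catalog_overview() -> str:
        # Placeholder URLs (assumption); a real server would resolve them
        # from its environment configuration.
        return build_catalog_overview(
            "https://www.data.gouv.fr", "https://www.data.gouv.fr/api"
        )
```

With a registration like this, the resource would show up in the client's resource listing and its text would be returned when the client reads the `datagouv://catalog-overview` URI.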

Changes

  • register a Catalog Scope and Discovery Guide MCP resource
  • document the new resource in the README
  • add tests covering resource listing and content

Test plan

  • uv run pytest
  • uv run ty check
  • uv run ruff check

@AbdelStark
Contributor Author

hey, would appreciate a review when you have time @bolinocroustibat

Collaborator

bolinocroustibat left a comment

Good idea, thanks for your contribution.



def _build_catalog_overview() -> str:
    site_url = env_config.get_base_url("site")
Collaborator

bolinocroustibat, Mar 24, 2026

nitpicks:

  1. Why not name them using the exact same nomenclature? i.e.:
    site
    datagouv_api
  2. I would add type hints so that we can clearly see the types here without following the helpers:
    site: str = ...
    datagouv_api: str = ...
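
Taken together, the two nitpicks might look like the following sketch. `get_base_url` here is a hypothetical stand-in for `env_config.get_base_url`, and the `datagouv_api` key and both URL values are assumptions made for illustration only.

```python
# Hypothetical stand-in for the project's env_config helper.
_BASE_URLS: dict[str, str] = {
    "site": "https://www.data.gouv.fr",
    "datagouv_api": "https://www.data.gouv.fr/api",  # placeholder value
}


def get_base_url(key: str) -> str:
    """Return the base URL registered under the given key."""
    return _BASE_URLS[key]


def build_catalog_overview() -> str:
    # Local names mirror the helper keys, and the annotations make the
    # types visible without following the helper.
    site: str = get_base_url("site")
    datagouv_api: str = get_base_url("datagouv_api")
    return "\n".join(
        [
            f"Current catalog site: {site}",
            f"Current API base: {datagouv_api}",
        ]
    )
```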

Collaborator

Could we suggest that the LLM look up the catalog (updated daily)? It is here: https://www.data.gouv.fr/datasets/catalogue-des-donnees-de-data-gouv-fr
Remember that we can't ask it to load it (it's 70k+ datasets and 350k+ files).

...or we could redirect the LLM to the HVD catalog (see #92)

"- get_dataservice_openapi_spec: fetch and summarize a dataservice OpenAPI spec.",
"- get_metrics: retrieve dataset or resource usage metrics.",
"",
"Example questions this server should answer by searching first:",
Collaborator

Example questions will mostly be in French. Should we give French examples?

"Example questions this server should answer by searching first:",
'- "Do you have datasets about housing prices in Paris?"',
'- "Are there resources about EV charging stations?"',
'- "What data is available for the Assemblee nationale?"',
Collaborator

typos: "Assemblee nationale" -> "Assemblée Nationale"

[
"# data.gouv.fr Catalog Overview",
"",
"This MCP server does not ship a fixed, preloaded list of datasets.",
Collaborator

data.gouv.fr also provides a list of dataservices (external APIs); this should be explained as well.

"It provides live read-only access to the current public catalog exposed by data.gouv.fr.",
"",
f"Current catalog site: {site_url}",
f"Current API base: {data_env}",
Collaborator

"API base" can be confused with external APIs / dataservices.
I suggest "data.gouv.fr main API base".

"",
"When a user asks whether data is available for a topic, search first before concluding that the data is unavailable.",
"",
"Recommended discovery workflow:",
Collaborator

bolinocroustibat, Mar 24, 2026

Should we update the README "Recommended workflow" section to match this?

Comment thread README.md
## 🛠️ Available Tools

The MCP server provides tools to interact with data.gouv.fr datasets and dataservices.
It also exposes a `Catalog Scope and Discovery Guide` MCP resource so clients can
Collaborator

style: the README is formatted with full lines (no hard wrapping).

@bolinocroustibat
Collaborator

Related: #63

@bolinocroustibat
Collaborator

Another way to tackle that would be #92

bolinocroustibat added the labels "feature" (New MCP tools, prompts, resource types, or protocol capabilities) and "2 - Moderate priority" (Important but not urgent) on Apr 2, 2026
@AbdelStark
Contributor Author

Another way to tackle that would be #92

Tell me what you prefer. If you want to tackle it, that's fine, I can let you handle it.
But otherwise I can also address your comments and finish the PR, whatever works best for you.

"5. Use query_resource_data to preview, filter, and sort tabular resources.",
"6. Use get_metrics for usage metrics when you need visits or downloads.",
"",
"Available tools:",
Collaborator

If we're correct, when an LLM connects to an MCP server, one of the very first request/response exchanges is listing the available tools, so wouldn't listing them again in the list-resources response be redundant and a waste of tokens?
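
For context, tool discovery and resource discovery are separate requests in MCP: `tools/list`, `resources/list`, and `resources/read` are distinct JSON-RPC methods. The sketch below only builds the request payloads a client would send; the `make_request` helper and the request ids are illustrative, and no server is contacted.

```python
import json
from typing import Optional


def make_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope as used by MCP."""
    request: dict = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        request["params"] = params
    return request


# A client discovers tools and resources through independent requests.
list_tools = make_request(1, "tools/list")
list_resources = make_request(2, "resources/list")
# Reading the overview is a third, explicit request for one URI.
read_overview = make_request(
    3, "resources/read", {"uri": "datagouv://catalog-overview"}
)

for request in (list_tools, list_resources, read_overview):
    print(json.dumps(request))
```

Since the tool names and descriptions already arrive through `tools/list`, repeating them inside the resource text would send the same information twice to any client that issues both requests.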

@bolinocroustibat
Collaborator

bolinocroustibat commented Apr 13, 2026

Another way to tackle that would be #92

Tell me what you prefer. If you want to tackle it, that's fine, I can let you handle it. But otherwise I can also address your comments and finish the PR, whatever works best for you.

I think we can continue on this anyway; it doesn't contradict #92. But we see this not as a fix for a specific bug or issue (at least not from what we have collected on our side), but rather as a way to take advantage of the fact that resource and tool discovery are among the first interactions between an LLM and an MCP server. It implements resource listing, a feature we haven't used until now, and aligns better with the Model Context Protocol (MCP) standard. This could serve as a proactive way to prime the LLM on the intended workflow right from the start.

Are we aligned on this?

If so, I would suggest starting simple in the list-resources response, by just explaining what data.gouv.fr is and suggesting a basic workflow, without overloading the context. This can be improved in later PRs (what the best practices are, where the documentation is, etc.). No need to list the tools, as the request to list the tools already exists. We might have to iterate blindly several times over the list-resources response text before this can be merged.
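
A slimmed-down overview along these lines could look like the sketch below. The wording, the workflow steps, the `build_minimal_overview` name, and the placeholder URLs are assumptions offered for discussion; only the "data.gouv.fr main API base" label and the mention of dataservices come from the review comments in this thread.

```python
def build_minimal_overview(site: str, datagouv_api: str) -> str:
    """Short priming text: what data.gouv.fr is plus a basic workflow, no tool list."""
    lines = [
        "# data.gouv.fr Catalog Overview",
        "",
        "data.gouv.fr is the French open data platform. This server gives live "
        "read-only access to its catalog of datasets and dataservices (external APIs).",
        "",
        f"Catalog site: {site}",
        f"data.gouv.fr main API base: {datagouv_api}",
        "",
        "Basic workflow:",
        "1. Search the catalog before concluding that data is unavailable.",
        "2. Inspect the matching dataset or dataservice.",
        "3. Query or preview a resource only once you know which one you need.",
    ]
    return "\n".join(lines)


overview = build_minimal_overview(
    "https://www.data.gouv.fr",
    "https://www.data.gouv.fr/api",  # placeholder value
)
print(overview)
# Character count as a rough proxy for the context cost of the resource.
print(f"{len(overview)} characters")
```

Keeping the text to a handful of lines leaves room to iterate on the wording in later PRs without growing the context a client has to absorb on connection.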

Labels

2 - Moderate priority Important but not urgent feature New MCP tools, prompts, resource types, or protocol capabilities

Development

Successfully merging this pull request may close these issues.

No reflexivity about it's datasets
