add content signals to the robots.txt file #2517
Pull request overview
Adds Content-Signal directives to robots.txt to explicitly communicate AI-related usage preferences for the Stellar developer docs site.
Changes:
- Introduces a `Content-Signal` directive for `User-agent: *` (`ai-train=no`, `search=yes`, `ai-input=yes`).
- Keeps the existing crawl allowance and sitemap declaration intact.
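For reference, the change amounts to a few lines in robots.txt. The Python sketch below embeds an illustrative version of the file (its exact layout and the `example.com` sitemap URL are placeholders, not copied from the PR) and parses the comma-separated `key=value` pairs of the `Content-Signal` line into booleans, roughly the way a crawler honoring the directive might read it.

```python
# Minimal sketch: read the Content-Signal directive out of a robots.txt file.
# The file contents are illustrative; the sitemap URL is a hypothetical placeholder.

ROBOTS_TXT = """\
User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=yes
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def parse_content_signal(value: str) -> dict[str, bool]:
    """Turn 'ai-train=no, search=yes' into {'ai-train': False, 'search': True}."""
    signals: dict[str, bool] = {}
    for pair in value.split(","):
        key, eq, raw = pair.strip().partition("=")
        raw = raw.strip().lower()
        if eq and raw in ("yes", "no"):  # skip empty or malformed entries
            signals[key.strip().lower()] = raw == "yes"
    return signals


def find_content_signals(robots_txt: str) -> list[dict[str, bool]]:
    """Collect every Content-Signal line (field names are case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "content-signal":
            found.append(parse_content_signal(value))
    return found


if __name__ == "__main__":
    print(find_content_signals(ROBOTS_TXT))
    # [{'ai-train': False, 'search': True, 'ai-input': True}]
```

A hand-rolled parser is used here because the standard library's `urllib.robotparser` only understands the classic fields (`User-agent`, `Allow`, `Disallow`, `Crawl-delay`, `Request-rate`, `Sitemap`) and ignores unknown ones such as `Content-Signal`.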
Preview is available here:
kaankacar left a comment:
The one I'd push back on is `ai-train=no`. I think we want `yes` here. The reasoning is that `ai-train=yes` lets our documentation get baked into the base AI models, so they have baseline Stellar fluency out of the box, before any retrieval. I'm not sure that opting open-source documentation out of the training data of the models devs use every day works for us.
`ai-train=no` makes a lot of sense for, say, marketing copy or original editorial content we want to protect. For API and reference docs whose whole purpose is to teach people (and, increasingly, agents) how to use the platform, I'd lean toward `ai-train=yes`. Curious if anyone sees a downside I'm missing.
…ts-content-signals
I read through the information on https://contentsignals.org/ to get a sense of how we should set these content signals. Here's what I came up with:
- `ai-train=no`: I don't really have a reason beyond it being the example given by isitagentready.com. As written, these rules apply a "blanket" set of policies to all agents. Down the road, if we're actively trying to train or fine-tune a model or agent, we could probably turn this on for that specific agent.
- `search=yes`: allowing agents to search, return text, and return links seems like a no-brainer. This is also the isitagentready.com default.
- `ai-input=yes`: this is the only default I changed. Explicitly allowing agents to use our docs for RAG or summaries seems beneficial for our developers.

I'm certainly not trying to play the expert here. I'm very interested to hear whether this set of signals makes sense, or whether there are modifications we should make.
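The idea of later enabling `ai-train` for one specific agent relies on ordinary robots.txt grouping: a crawler uses the group that names it and otherwise falls back to the `User-agent: *` group. The Python sketch below shows how such an override could look. It assumes `Content-Signal` lines are scoped to their group the same way `Allow`/`Disallow` rules are, `ExampleTrainerBot` is a hypothetical agent name, and it skips RFC 9309's rule for merging multiple groups that name the same agent.

```python
# Minimal sketch, assuming Content-Signal lines are scoped to the robots.txt
# group they appear in, like Allow/Disallow rules.
# "ExampleTrainerBot" is a hypothetical crawler name used for illustration only.

ROBOTS_TXT = """\
User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=yes
Allow: /

User-agent: ExampleTrainerBot
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Allow: /
"""


def parse_groups(robots_txt: str) -> list[tuple[list[str], dict[str, bool]]]:
    """Split the file into (user-agents, content signals) groups."""
    groups: list[tuple[list[str], dict[str, bool]]] = []
    agents: list[str] = []
    signals: dict[str, bool] = {}
    in_rules = False  # True once the current group has seen a non-user-agent line
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                groups.append((agents, signals))
                agents, signals, in_rules = [], {}, False
            agents.append(value.lower())
        elif agents:
            in_rules = True
            if field == "content-signal":
                for pair in value.split(","):
                    key, eq, raw = pair.strip().partition("=")
                    raw = raw.strip().lower()
                    if eq and raw in ("yes", "no"):
                        signals[key.strip().lower()] = raw == "yes"
    if agents:
        groups.append((agents, signals))
    return groups


def signals_for(robots_txt: str, agent: str) -> dict[str, bool]:
    """Use the group naming this agent, else fall back to the '*' group."""
    groups = parse_groups(robots_txt)
    agent = agent.lower()  # user-agent matching is case-insensitive
    for names, signals in groups:
        if agent in names:
            return signals
    for names, signals in groups:
        if "*" in names:
            return signals
    return {}


if __name__ == "__main__":
    print(signals_for(ROBOTS_TXT, "SomeOtherBot"))       # blanket '*' policy
    print(signals_for(ROBOTS_TXT, "ExampleTrainerBot"))  # agent-specific override
```

With this layout, every other agent keeps the blanket `ai-train=no` policy, while only the named agent sees `ai-train=yes`.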