docs(agent): default config as a build-kit overlay (inspect_context)#4917
mmabrouk wants to merge 6 commits into
📝 Walkthrough
Four new documentation files are added under
Changes
Default Agent Config Playground Build Kit Docs
|
🤖 The AI agent says: Superseded. This summary described the earlier approach, which materialized the defaults into the catalog template so a new agent committed them. That approach was dropped. The design now uses an inject-not-commit Playground build kit. See the current summary below. |
🧹 Nitpick comments (2)
docs/design/agent-workflows/projects/default-agent-config/design.md (2)
156-164: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Specify the exact tools control file for frontend changes.
The design references `SkillTemplateControl.tsx` precisely but says "the tools control in the same folder" without a filename. For implementers, identify the exact file (e.g., `ToolTemplateControl.tsx` or similar) or note that it needs to be located.
72-75: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Add language specifier to fenced code block.
The markdownlint warning indicates the code block at line 72 lacks a language specifier. Add `python` after the opening backticks since this is Python code from `interfaces.py:537`.
📝 Proposed fix
-# interfaces.py:537
+```python
+# interfaces.py:537
default=build_agent_v0_default() # no skill, no tools
📒 Files selected for processing (4)
docs/design/agent-workflows/projects/default-agent-config/README.md
docs/design/agent-workflows/projects/default-agent-config/design.md
docs/design/agent-workflows/projects/default-agent-config/research.md
docs/design/agent-workflows/projects/default-agent-config/status.md
mmabrouk left a comment
Does not work in the current state. Provided feedback to rewrite using the new pattern.
|
🤖 The AI agent says: @coderabbitai both nitpicks landed on the previous
A re-review of the new revision is welcome. |
|
🤖 The AI agent says: The rewrite is pushed. This responds to @mmabrouk's review ("rewrite using the new pattern"). The design pivoted from materialize-into-the-catalog to inject-not-commit.
The model. The platform tools, the Agenta authoring skill, and the build permissions are a Playground build kit. The backend injects the kit into the playground session for display and for the run. The commit never writes it. The published agent ships bare, and a production run never injects the kit.
New backend surface.
What this dropped. The earlier materialize approach is gone, along with its base-plus-patch merge, delete markers, and suppressed-defaults list. Nothing about the kit is persisted into the agent.
Two open questions for you:
Full design in |
|
I’ll re-review the current revision with the updated scope in mind, especially that the frontend control now belongs to the advanced-build-kit drawer project and that the drawer reads the backend
✅ Action performed: Full review finished.
|
|
||
| - Display: the backend serves a build-kit descriptor. The drawer reads it. The set is | ||
| decided server-side. | ||
| - Run: the agent service injects the kit into the effective config at run-prep, in `_agent` |
Nope, I don't think this is the right way to do it. The agent service should not inject the kit; this should be a frontend matter. When the agent is used in the playground, the frontend injects the kit as part of the parameters (the agent template).
The service itself should not know about the kit. It just receives an agent template and runs it. The frontend decides: if the playground build kit is activated, it injects the listed skills and tools and changes the listed permissions.
🤖 The AI agent says: Adopted. This rewrite removes service-side injection entirely. The agent service now runs parameters.agent exactly as received and does not know the kit exists. There is no run-prep injection and no inject_build_kit flag anymore. To stop anyone re-adding it, the doc has an explicit "What the backend explicitly does NOT do" section.
The injection moves to the frontend. When the kit toggle is on, the frontend merges the kit's entries into parameters.agent in the run payload, at buildAgentRequest. If the platform tools, the skill, and the build permissions reach a run, it is only because the frontend already put them in the template the service was handed. See the "The corrected model" and "The frontend logic" sections.
| `resolve_tools(agent_template.tools)` at line 227. Gated by the run flag. The injection is | ||
| harness-agnostic, so it works on `pi_core` as well as `pi_agenta`. | ||
|
|
||
| The frontend never decides the set. It reads it for display and sends the toggle for the |
What I'm saying is that the frontend should own the business logic for this. The only thing the backend should do is inform: expose, via inspect, what the build kit contains (which skills, which tools, and which permissions).
🤖 The AI agent says: This is now the core model. The frontend owns all the business logic: it reads the kit, toggles it in the drawer, injects it on a kit-on run, and excludes it on commit. The backend's only build-kit job is to inform. It assembles a read-only build_kit descriptor (which skills, which tools, which permissions) and serves it on /inspect. It never acts on the kit. See "The corrected model" and the "The backend informs" / "The frontend owns the logic" bullets.
|
|
||
| The per-run flag the drawer toggle controls: | ||
|
|
||
| - `flags.inject_build_kit` (boolean), on the run request, request-scoped. The drawer's |
Again, I think we should put the build-kit information in the inspect response and let the frontend hold the logic. The real question is how to put that information in inspect. The design doc needs to be edited to explain that, and we should use our skill for organizing information.
🤖 The AI agent says: Done. There is a new section, "The /inspect build-kit descriptor (the central question)," that answers how end to end, organized with the design-interfaces skill:
- Where it sits: `revision.data.build_kit`, a read-only sibling of `revision.data.schemas`. The frontend already fetches `/inspect` per workflow, so the drawer reads it with no new request.
- The shape: grouped by kind (`skills`, `tools`, `permissions`). Each row carries `key` (identity), `name` and `description` (display), and `config` (the exact `parameters.agent` entry the row contributes); permission rows also carry a read-only `status`.
- Why each row carries `config`: the frontend, not the backend, now writes the entry into `parameters.agent`, so the row must carry that entry. The frontend does a pure structural merge and owns no wire shapes.
- The role of each field: walked field by field through the data / display / routing / identity roles, with no field playing two roles.
- How the backend assembles and serves it: one builder in the inspect layer reads `PLATFORM_OPS`, the authoring-skill constant, and the build-permission set, then attaches the descriptor beside `schemas` in the `/inspect` response.
mmabrouk left a comment
As the proposed solution does not work, please review my comments and edit the whole document again.
|
🤖 The AI agent says: Pushed iteration 2, which flips the model to match your review.
The model flip. The first version had the agent service inject the kit at run time behind a
The descriptor, with
The UI drawer, folded in. The advanced drawer design lives in this doc, not a separate PR. It makes the advanced sections collapsible and adds a top "Playground build kit" section rendered straight from
Five open questions for you:
Inline replies are on your three comments. Ready for your review. |
|
|
||
| ### Where it sits | ||
|
|
||
| Deliver the descriptor in the `/inspect` response at `revision.data.build_kit`, a read-only |
Maybe something more general that would allow us to add more metadata / pointers? Ask Codex for some naming advice. The obvious one is `meta`, or quarks or something, but really ask it to look at the whole inspect response schema and find where we could do this in a way that scales over the long term, so that we could add other information beside the build kit. Right now a field has just been added there. Please read the skill on how to design a scalable schema (we have a skill for that) and use it to inform what to put, because this is bad design right now: you're hardcoding this information.
🤖 The AI agent says: Consulted Codex on the whole inspect schema. Adopted its recommendation: a dedicated read-only inspect_context container on the response envelope, a sibling of application, with build_kit as its first member, so future read-only info is added beside it rather than overloading a user-owned field. Rejected meta (it collides with the user-owned artifact meta) and revision.data (it is extra="forbid" and flows into commit). See "Where the overlay lives in the inspect response."
| "key": "__ag__getting_started_with_agenta", // the skill slug (identity) | ||
| "name": "agenta-authoring", // display label | ||
| "description": "Scaffold and edit this agent's tools, skills, and config.", | ||
| "config": { "@ag.embed": { "slug": "__ag__getting_started_with_agenta" } } |
Why do we need the name, the description, and the key again? Isn't all of that under the embed? Isn't the whole idea that the skill is embedded as it is? For me it would be just `@ag.embed`, nothing else. I really don't understand why you need the other fields; you probably need just the version, etc., following the logic we always use for embedding. Make the schema for embedding variants. How would you embed this on this kind of platform at scale? This doesn't even look like how you would do it.
🤖 The AI agent says: Agreed. The skill is now just an @ag.embed reference (the slug, plus a version when pinned), the platform's existing variant-embedding mechanism. The embed reference already identifies the skill, which carries its own name and description, so the parallel key, name, and description are gone, along with the per-row config and status. The drawer renders each item with the existing config-item controls. See "The overlay shape."
| ], | ||
| "permissions": [ | ||
| { | ||
| "key": "write_files", |
Again, here I really think we should not reinvent the wheel. Maybe this is the one place where we build something if we need to, but I would think about it as an overlay, an overwrite overlay.
Think about the agent template, the whole configuration there, and find the things you want to override, for example the sandbox permission or the cloud permission. You write an overlay for those. It is easy to apply on the frontend side: take the current agent configuration before sending the data to the backend and apply the overlay so that it overwrites some of the parameters. Define it that way here.
🤖 The AI agent says: Adopted as the core model. The kit is an agent-template overlay: a partial parameters.agent. On a kit-on run the frontend deep-merges object fields and identity-merges list fields onto a throwaway copy of the config, and on commit it merges nothing, so the overlay never enters the committed tree. See "The model: a build-kit overlay" and "Applying the overlay."
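The merge described in this reply can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code. It is written in Python to match the SDK snippets quoted in this thread, while the frontend would implement the same steps in TypeScript. The names `apply_overlay` and `_identity` are hypothetical, list identity is assumed to be an entry's canonical JSON, and the `permissions` object shape is assumed for illustration only.

```python
import copy
import json


def _identity(entry):
    # Assumption: two list entries are the same item when their canonical JSON matches.
    return json.dumps(entry, sort_keys=True)


def apply_overlay(base, overlay):
    """Return a new tree with overlay merged onto base. Neither input is mutated."""
    if isinstance(base, dict) and isinstance(overlay, dict):
        # Object fields: deep-merge key by key.
        merged = {key: copy.deepcopy(value) for key, value in base.items()}
        for key, value in overlay.items():
            if key in base:
                merged[key] = apply_overlay(base[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    if isinstance(base, list) and isinstance(overlay, list):
        # List fields: identity-merge, appending only entries not already present.
        merged = copy.deepcopy(base)
        seen = {_identity(entry) for entry in base}
        for entry in overlay:
            if _identity(entry) not in seen:
                seen.add(_identity(entry))
                merged.append(copy.deepcopy(entry))
        return merged
    # Scalars and mismatched types: the overlay value wins.
    return copy.deepcopy(overlay)


# Hypothetical user config and overlay, for illustration only.
user_agent = {
    "tools": [{"type": "platform", "op": "query_workflows"}],
    "permissions": {"write_files": False},
}
overlay = {
    "tools": [
        {"type": "platform", "op": "query_workflows"},
        {"type": "platform", "op": "commit_revision"},
    ],
    "permissions": {"write_files": True, "execute_code": True},
}

run_agent = apply_overlay(user_agent, overlay)  # kit-on run: merged throwaway copy
commit_agent = user_agent  # commit: nothing is merged
```

Because the merge returns a fresh tree, the config the playground edits and commits is never touched, which is what keeps the overlay out of the committed revision.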
| } | ||
| ``` | ||
|
|
||
| ### Why each row carries `config` |
As I always tell you: once we have edited the design, there is no need to talk about the pivot in the body of the document. The document describes its own current state. If you want to record something from the past, a pivot or whatever, add notes at the end saying, "Previously we tried this and this, and we decided that." Don't make the core of the design doc about the changes; design history belongs in the comments or in an appendix at the end. The core is about the current state of things. So for "why each row carries config," I don't give a shit about it from the perspective of what has changed. I care that it explains the current behavior, and that's all. Basically, keep your thinking tracks outside of the document. And please add a memory for this.
🤖 The AI agent says: Done. The body is current-state only; the two dropped approaches are a short "Appendix: prior approaches" at the end. The memory was added.
|
|
||
| - The committed Permissions group, and the Execution environment group, configure the agent's | ||
| own permissions and sandbox. The user edits them, they commit, they ship. | ||
| - The build kit's PERMISSIONS group is a read-only reflection of the permissions the kit grants |
Yeah, I've explained this part before. The solution should be general, and I think the best solution is an overlay. You have an overlay that overrides the configuration, and then what it contains doesn't matter: that becomes a backend issue, not a frontend issue. The frontend just takes the overlay and applies it, and the overlay could even carry things that have nothing to do with permissions.
You can think of the whole thing as an agent template overlay, and I think that's the best way to think about it: not as a build kit but as a build-kit overlay. It simplifies the frontend logic to simply applying it.
🤖 The AI agent says: Adopted. The kit is one agent-template overlay, not a bespoke build-kit shape. The backend declares the overlay and the frontend just applies it, so the overlay can carry anything the agent template defines (permissions or otherwise) and the logic stays simple on both sides. See "The model: a build-kit overlay" and "Applying the overlay."
|
🤖 The AI agent says: Pushed iteration 3, re-derived from first principles per your inline review.
What changed
Open questions for you
|
| "tools": [ | ||
| { "type": "platform", "op": "find_capabilities" }, | ||
| { "type": "platform", "op": "query_workflows" }, | ||
| { "type": "platform", "op": "commit_revision" } |
We need an example of an embed tool, like we discussed elsewhere, for things that require the frontend (client tools).
🤖 The AI agent says: Added. The overlay's tools now carries { "type": "reference", "slug": "__ag__request_connection" }, a non-runnable reference (workflow-as-tool) entry embedded from a reserved slug, not a platform op. The frontend handles the call; #4920 owns its primary definition. A reader now sees a client/embed tool sitting beside the platform ops in the same tools array.
| ``` | ||
|
|
||
| The backend assembles this from sources the SDK already owns: `tools` by iterating | ||
| `PLATFORM_OPS` (`/home/mahmoud/code/agenta/sdks/python/agenta/sdk/agents/platform/op_catalog.py`), |
Some tools are platform workflows, like the platform skills that live in the reserved slugs, so we also need a way to iterate over them and add them.
🤖 The AI agent says: Done. The assembly now runs two parallel iterations. It walks PLATFORM_OPS for the platform ops, and it enumerates the reserved-slug platform workflows from the static workflow catalog (StaticWorkflowCatalog over _STATIC_WORKFLOWS) and adds each as a reference-tool entry. Those reserved workflows are the same __ag__* family as the platform skills, so a client tool like __ag__request_connection joins the kit through the same iteration, no bespoke path.
| { | ||
| "count": 1, | ||
| "application": { /* ... the agent, including data.parameters (the user's config) ... */ }, | ||
| "inspect_context": { |
I don't like inspect_context here, maybe additional_context or something better
🤖 The AI agent says: Renamed to additional_context across design.md, README.md, status.md, the JSON examples, and the PR description. The path is now additional_context.playground_build_kit.agent_template_overlay.
| "count": 1, | ||
| "application": { /* ... the agent, including data.parameters (the user's config) ... */ }, | ||
| "inspect_context": { | ||
| "build_kit": { |
maybe a better descriptive name
🤖 The AI agent says: Renamed build_kit to playground_build_kit for explicitness. It now reads additional_context.playground_build_kit.agent_template_overlay, which ties the container to the playground where the kit is shown and used but never committed.
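A consumer of that path might read it defensively, in case a response carries no build kit. A minimal sketch, again in Python for consistency with the SDK snippets in this thread: the function name `read_overlay` is hypothetical, and treating a missing container as an empty overlay is an assumption, not documented behavior.

```python
OVERLAY_PATH = ("additional_context", "playground_build_kit", "agent_template_overlay")


def read_overlay(inspect_response):
    """Walk the inspect envelope down to the overlay; return {} if any level is missing."""
    node = inspect_response
    for key in OVERLAY_PATH:
        if not isinstance(node, dict) or key not in node:
            return {}
        node = node[key]
    return node if isinstance(node, dict) else {}


# Hypothetical inspect envelopes, trimmed to the fields that matter here.
with_kit = {
    "count": 1,
    "application": {},
    "additional_context": {
        "playground_build_kit": {"agent_template_overlay": {"tools": []}}
    },
}
without_kit = {"count": 1, "application": {}}

overlay_present = read_overlay(with_kit)
overlay_absent = read_overlay(without_kit)
```

Returning an empty overlay lets the caller apply the same merge step unconditionally, since merging an empty overlay changes nothing.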
…ample, reserved-slug iteration) Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
…override tooltip Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
|
🤖 The AI agent says: Targeted fix round on this design (commit Three corrections
{ "@ag.embed": { "@ag.references": { "workflow": { "slug": "__ag__request_connection" } } } }
The embed resolver inlines it into a
Folded decisions (moved out of open questions into a Decided section)
One open question remains: confirm the published default goes bare (touches the skills project). |
| { "type": "platform", "op": "query_workflows" }, | ||
| { "type": "platform", "op": "commit_revision" }, | ||
| // a client tool the embed resolver inlines: an @ag.embed of a reserved-slug workflow, the same shape as a skill embed (#4920) | ||
| { "@ag.embed": { "@ag.references": { "workflow": { "slug": "__ag__request_connection" } } } } |
🤖 The AI agent says: This is the load-bearing fix. The client tool now embeds via @ag.embed (was { "type": "reference", "slug": ... }), identical to the skill embed two lines down, only the slug differs. Verified against _ToolEmbedRefSchema / _SkillEmbedRefSchema in sdks/python/agenta/sdk/utils/types.py.
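With the embed shape settled, the two-iteration assembly described earlier in this thread could look roughly like the sketch below. Everything here is a stand-in: the real `PLATFORM_OPS` and static workflow catalog live in the SDK and may be shaped differently, the `"skill"` / `"tool"` tagging of reserved slugs is an assumption made so the example can route entries, and `embed_ref` and `build_overlay` are hypothetical names. Build permissions are left out because their wire shape is not shown in this thread.

```python
# Hypothetical stand-ins for the SDK sources named in the thread.
PLATFORM_OPS = ("find_capabilities", "query_workflows", "commit_revision")
RESERVED_WORKFLOWS = {
    "__ag__getting_started_with_agenta": "skill",
    "__ag__request_connection": "tool",
}


def embed_ref(slug):
    # The embed shape quoted in the diff above; only the slug varies.
    return {"@ag.embed": {"@ag.references": {"workflow": {"slug": slug}}}}


def build_overlay(platform_ops, reserved_workflows):
    """Assemble a partial agent template from the two sources."""
    # First iteration: one platform-tool entry per op.
    tools = [{"type": "platform", "op": op} for op in platform_ops]
    skills = []
    # Second iteration: reserved-slug workflows join as embed references.
    for slug, kind in reserved_workflows.items():
        target = tools if kind == "tool" else skills
        target.append(embed_ref(slug))
    return {"tools": tools, "skills": skills}


overlay = build_overlay(PLATFORM_OPS, RESERVED_WORKFLOWS)
```

Since both iterations read from catalogs, a new platform op or reserved workflow would show up in the overlay without editing the builder.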
| because the two ideas sit close together and a user who conflates them could believe the agent | ||
| ships with write-files and execute-code permission. | ||
|
|
||
| ### Show an override hint on an overridden user control |
🤖 The AI agent says: New override-hint requirement (Q3). When the kit overrides a user setting, the user's own drawer control flags it with a tooltip: the value is overridden in the playground by the build kit; toggle the kit off to match the published agent. The concrete case is a permission the user disabled that the kit re-enables. The merge-precedence discussion ("Applying the overlay") references this.
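One way to decide which controls need the hint is to compare the user's config against the overlay and collect the paths where the overlay replaces a different user-set value. The sketch below is an assumption-laden illustration in Python (the frontend would do this in TypeScript): `overridden_paths` is a hypothetical name, the dotted-path format is invented for the example, and list fields are skipped on the assumption that an identity-merged list only adds entries.

```python
def overridden_paths(user_config, overlay, prefix=""):
    """List dotted paths where the overlay replaces a different value the user set."""
    paths = []
    for key, kit_value in overlay.items():
        if key not in user_config:
            continue  # the kit adds a value; nothing of the user's is overridden
        if isinstance(kit_value, list):
            continue  # assumed: lists are identity-merged, so entries are only added
        path = f"{prefix}{key}"
        user_value = user_config[key]
        if isinstance(kit_value, dict) and isinstance(user_value, dict):
            paths.extend(overridden_paths(user_value, kit_value, f"{path}."))
        elif kit_value != user_value:
            paths.append(path)
    return paths


# Hypothetical config: the user disabled write_files, the kit re-enables it.
user_config = {
    "tools": [],
    "permissions": {"write_files": False, "execute_code": True},
}
overlay = {
    "tools": [{"type": "platform", "op": "commit_revision"}],
    "permissions": {"write_files": True, "execute_code": True},
}

hints = overridden_paths(user_config, overlay)
```

Each returned path would map to one user control that shows the tooltip while the kit toggle is on.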
What this designs
A new agent on Agenta arrives near-bare. While the user builds it in the playground, the assistant needs authoring scaffolding: the platform tools (find capabilities, query workflows, commit a revision), the Agenta authoring skill, and elevated sandbox permissions (write files, execute code). That scaffolding is a build aid, not the user's agent. So it is shown and used in the playground, never committed, and never reaches a deployed run.
This doc decides who shows that scaffolding, who applies it, and who keeps it out of the commit.
Iteration 3: re-derived from first principles as an overlay
This is the third pass. The first had the agent service inject the kit at run time behind a run flag. The second moved the logic to the frontend but modeled the kit as a bespoke descriptor with typed groups and per-row display fields. This pass re-derives the whole thing from scratch as a single overlay, the model Mahmoud asked for in review.
The model: a build-kit overlay
The build kit is an agent-template overlay. It is a partial agent template, the same shape as `parameters.agent`. Three actors, three jobs:
- The backend declares the overlay and serves it read-only on `/inspect`. It never acts on it.
- The agent service runs the `parameters.agent` it receives and does not know the overlay exists. There is no run-prep merge and no run flag. The platform tools, the skill, and the build permissions reach a run only because the frontend already merged them into the template the service was handed.
- The frontend applies the overlay. On a kit-on run, it merges the overlay onto a throwaway copy of `parameters.agent`. On commit, it does not merge, so the committed config holds only the user's own template.

"Shown but not committed" follows from this. The overlay never enters `parameters.agent`, the tree the playground edits and commits, so there is no strip step and a deployed agent can never run with it.
Where the overlay lives in the inspect response
The overlay rides a dedicated read-only container, `additional_context`, a sibling of `application` on the inspect envelope:
```json
{
  "application": { /* the agent, including data.parameters (user config) */ },
  "additional_context": {
    "playground_build_kit": {
      "agent_template_overlay": { /* the partial parameters.agent */ }
    }
  }
}
```
The placement follows ownership and lifecycle. User config the revision persists lives in `application.data.parameters`. User metadata that round-trips lives in `application.meta`. Platform information the backend derives read-only for one response lives in `additional_context`. Placing the overlay in `revision.data` was rejected (it is `extra="forbid"` and flows into commit), as was `application.meta` (user-owned and persisted). This placement was chosen after consulting Codex on the whole inspect schema, so future read-only hints add members beside `playground_build_kit` rather than overloading a user-owned field.
Skills and the client tool are embed references
A skill in the overlay is an ordinary `@ag.embed` reference, the platform's existing mechanism for embedding a variant. The reference (the slug, plus a version when pinned) already identifies the skill, which carries its own name and description. The overlay adds no parallel `key`, `name`, or `description`. The hard-coded client tool `__ag__request_connection` embeds the same way, through the identical `@ag.embed` shape (only the slug differs); the embed resolver inlines it into a `client` tool the frontend handles. The drawer renders each item with the existing config-item controls.
The drawer UI, folded in
The inspect contract and the advanced-drawer design live in one doc. The drawer recomposes existing parts:
- The advanced sections become collapsible.
- A top "Playground build kit" section carries a `Removed on commit` tag, an enable toggle, and read-only rows that reuse the existing read-only skill pattern.

The toggle is session state (`buildKitEnabled`, default on). It sets no run flag and writes nothing into the config. The doc flags the one real UI risk: the drawer now holds two ideas that both say "permissions" (the committed agent permissions and the kit's read-only build permissions), and they must not read as one setting. When the kit is on and overrides one of the user's own settings, the user's own control shows an override hint: the value is overridden in the playground by the build kit, and toggling the kit off makes the playground match the published agent. The load-bearing case is a permission the user disabled in their config that the kit re-enables for the playground run.
Scope and risk
Doc only, no code. The authoring skill content is owned by the skills project (#4918). The builder-tools project (#4919) adds more platform ops, which the overlay builder picks up from `PLATFORM_OPS` automatically. Out of scope: per-item edit or delete of kit items, and a picker to add platform tools to the published agent.
The body is current-state only. The two earlier models are recorded in a short appendix, not in the design narrative. The overlay model ripples to the sibling docs (#4918, #4919, #4920); the orchestrator propagates it after this design is approved.
Decided this round
Open questions for Mahmoud
Related PRs
Part of the "agent builds an app" initiative. Read the map first: #4921.
https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc