docs(agent): agent-creation skills + custom-tools design note #4863
mmabrouk wants to merge 2 commits into
📝 Walkthrough
The PR adds documentation for agent creation workflows, a create-and-invoke script, runtime and tool references, a custom-tools design note, and self-hosting guidance with companion reference material.
Changes
Agent creation workflow docs
Custom tool surface design
Self-hosting docs
🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
@coderabbitai review
This is a docs-only draft (skills + a design note). Please focus on: (1) the agent-creation API calls in the create-agenta-agent skill and reference are internally consistent and correct, (2) the custom-tools design note's tool boundaries make sense (self-scoped update, recursion/budget guards), (3) any factual drift in the self-host trust model. Behavior changes / implementation are out of scope for this PR.
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail

echo "== Locate relevant files =="
fd -p 'README.md|build-notes.md|custom-tools-design.md|SKILL.md|reference.md|create_agent.py' docs/design/agent-workflows/projects/agent-creation-skills || true

echo
echo "== Tree =="
tree -a -L 4 docs/design/agent-workflows/projects/agent-creation-skills || true

echo
echo "== Read top-level docs =="
for f in \
  docs/design/agent-workflows/projects/agent-creation-skills/README.md \
  docs/design/agent-workflows/projects/agent-creation-skills/build-notes.md \
  docs/design/agent-workflows/projects/agent-creation-skills/custom-tools-design.md
do
  echo
  echo "----- FILE: $f -----"
  sed -n '1,260p' "$f"
done

echo
echo "== Read create-agenta-agent skill docs =="
for f in \
  docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/SKILL.md \
  docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/reference.md \
  docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/create_agent.py
do
  echo
  echo "----- FILE: $f -----"
  sed -n '1,320p' "$f"
done

echo
echo "== Read self-host-agenta skill docs =="
for f in \
  docs/design/agent-workflows/projects/agent-creation-skills/skills/self-host-agenta/SKILL.md \
  docs/design/agent-workflows/projects/agent-creation-skills/skills/self-host-agenta/reference.md
do
  echo
  echo "----- FILE: $f -----"
  sed -n '1,320p' "$f"
done

echo
echo "== Search for key terms across project =="
rg -n -C2 --glob '*.md' --glob '*.py' \
  'parameters\.agent|ag_config|data\.uri|data\.url|workflow_variant|workflow_id|variant|artifact_id|invoke_workflow|update_own_workflow|update_workflow|recursion|budget|trust|sandbox-agent|Daytona|OAuth|subscription|Composio|search_tools|secrets|archive' \
  docs/design/agent-workflows/projects/agent-creation-skills
```
Length of output: 149378
Overall, this is in good shape. The create-agent flow is internally consistent across
On the design side, I think the tool boundaries are sensible:
I only found two non-blocking clarifications worth tightening:
No blocking factual drift jumped out to me beyond those wording clarifications.
✅ Action performed
Review finished.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/create_agent.py (1)
32-163: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
LGTM with two notes.
The script correctly implements the full loop with proper payload shapes. Two issues to address:
- Resource leak on partial failure: If an intermediate step fails (e.g., variant creation fails after workflow creation), the created workflow is orphaned because `archive` only runs on full success. Consider a `try/except` or `contextlib.ExitStack` pattern to track created resources and clean up on failure.
- Coding guidelines compliance: As per coding guidelines, Python files require `ruff format` then `ruff check --fix` before committing.
Source: Coding guidelines
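A minimal sketch of the `contextlib.ExitStack` suggestion above, not the script's actual code. The three callables (`create_workflow`, `create_variant`, `archive_workflow`) and the `keep` flag are hypothetical stand-ins for the HTTP calls in `create_agent.py`; the point is only that cleanup is registered as soon as the workflow exists, so a later failure still archives it.

```python
from contextlib import ExitStack


def run_loop(create_workflow, create_variant, archive_workflow, keep=False):
    """Create a workflow and a variant, archiving the workflow if a later step fails.

    All three callables are hypothetical stand-ins for the script's HTTP calls.
    """
    with ExitStack() as stack:
        workflow_id = create_workflow()
        # Register cleanup immediately, before any step that might fail.
        stack.callback(archive_workflow, workflow_id)
        variant_id = create_variant(workflow_id)
        if keep:
            # Success and the caller wants to keep the agent: drop the cleanup.
            stack.pop_all()
        return workflow_id, variant_id
```

With `keep=False` the workflow is archived on both success and failure, which matches an archive-after-run mode; with `keep=True` it is archived only when a step raises.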
♻️ Duplicate comments (15)
docs/design/agent-workflows/projects/agent-creation-skills/README.md (4)
Each: 📐 Maintainability & Code Quality | 🟡 Minor | 💤 Low value. Add `json` language specifier to fenced code block.
- 68-79: block starting `POST /api/workflows/variants/`, `{ "workflow_variant": {`
- 86-109: block starting `POST /api/workflows/revisions/commit`, `{ "workflow_revision": {`
- 113-123: block starting `POST /services/agent/v0/invoke`, `Accept: application/json`
- 205-215: block starting `POST /api/vault/v1/secrets/`, `{ "header": { "name": "OpenAI", "description": "..." },`

docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/SKILL.md (6)
Each: 📐 Maintainability & Code Quality | 🟡 Minor | 💤 Low value. Add `bash` language specifier to fenced code block.
- 66-72: block starting `curl -sS -X POST "$HOST/api/workflows/variants/?project_id=$PROJECT" \`
- 80-97: block starting `curl -sS -X POST "$HOST/api/workflows/revisions/commit?project_id=$PROJECT" \`
- 121-129: block starting `curl -sS -X POST "$HOST/services/agent/v0/invoke?project_id=$PROJECT" \`
- 145-151: block starting `curl -sS -X POST "$HOST/api/vault/v1/secrets/?project_id=$PROJECT" \`
- 159-165: block starting `curl -sS -X POST "$HOST/api/tools/connections/?project_id=$PROJECT" \`
- 182-189: block starting `curl -sS -X POST "$HOST/api/workflows/revisions/commit?project_id=$PROJECT" \`

docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/reference.md (5)
Each: 📐 Maintainability & Code Quality | 🟡 Minor | 💤 Low value. Add `json` language specifier to fenced code block.
- 49-68: block starting `// builtin: a harness-native tool (read, write, bash, ...)`
- 74-79: block starting `{ "name": "filesystem", "transport": "stdio",`
- 88-93: block starting `{ "name": "release-notes",`
- 154-158: block starting `{ "header": { "name": "OpenAI", "description": "..." },`
- 165-171: block starting `{ "header": { "name": "MyAzure" },`

🧹 Nitpick comments (4)
docs/design/agent-workflows/projects/agent-creation-skills/custom-tools-design.md (1)
77-79: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Clarify how `invoke_workflow` resolves ambiguous workflow refs.
The current description says "resolve the revision if a ref is given" without specifying the resolution rule when multiple branches or commits exist. To make the design unambiguous, specify whether resolution uses:
- the latest committed revision across all variants,
- the default variant's latest revision, or
- an explicit variant + revision selector.
This was previously identified as a needed clarification and remains unaddressed in the current draft.

docs/design/agent-workflows/projects/agent-creation-skills/skills/self-host-agenta/SKILL.md (1)
63-65: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Add a language tag to the fenced code block.
The env-var snippet at line 63 triggers markdownlint `MD040`. Tag it as `bash` or `shell` so the linter stays clean.
````diff
-```
+```bash
 AGENTA_AGENT_RUNNER_URL=http://sandbox-agent:8765   # services -> sidecar over the compose net
````

docs/design/agent-workflows/projects/agent-creation-skills/skills/self-host-agenta/reference.md (2)
41-42: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Clarify "backend" to avoid conflating the abstraction with the sidecar container.
This sentence is technically correct in code terms (`select_backend` always returns `SandboxAgentBackend`), but readers may read "backend" as "the sidecar container" and wonder how that squares with the local-spawn path that has no sidecar. Rephrase to make clear that the abstraction is constant while the transport (HTTP to sidecar vs local CLI spawn) varies.
```diff
-The backend is always the sandbox-agent backend; only the transport (`RUNNER_URL`) and the
-sandbox axis vary.
+The backend abstraction is always `SandboxAgentBackend`; only the transport (`RUNNER_URL`) and
+the sandbox axis vary. When `RUNNER_URL` is set, `services` POSTs to the sidecar container.
+When unset, the backend spawns the runner CLI locally from `AGENTA_AGENT_RUNNER_DIR`.
```

10-16: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Optionally tag the ASCII diagram as `text` to silence markdownlint.
The topology block triggers `MD040`. Adding `text` satisfies the linter without affecting rendering.
````diff
-```
+```text
 client -> traefik -> api (/api/...)   management: workflows, vault, tools
````
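To illustrate the transport point in the 41-42 comment, here is a small sketch of the "constant abstraction, variable transport" rule. The environment variable names come from the comments above; `select_transport` and its return shape are hypothetical and are not the real `select_backend`.

```python
import os


def select_transport(env=None):
    """Return ("http", url) when a runner URL is set, else ("local", runner_dir).

    Hypothetical helper: it only mirrors the rule described in the review comment.
    """
    env = os.environ if env is None else env
    url = env.get("AGENTA_AGENT_RUNNER_URL", "").strip()
    if url:
        # Sidecar path: the services container POSTs to the runner over HTTP.
        return ("http", url)
    # Local path: no sidecar, the runner CLI is spawned from this directory.
    return ("local", env.get("AGENTA_AGENT_RUNNER_DIR", ""))
```

Passing a plain dict as `env` keeps the rule easy to check without touching the process environment.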
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 1f1b944e-c2f4-4504-9c15-22056a6754ce
📒 Files selected for processing (8)
- docs/design/agent-workflows/projects/agent-creation-skills/README.md
- docs/design/agent-workflows/projects/agent-creation-skills/build-notes.md
- docs/design/agent-workflows/projects/agent-creation-skills/custom-tools-design.md
- docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/SKILL.md
- docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/create_agent.py
- docs/design/agent-workflows/projects/agent-creation-skills/skills/create-agenta-agent/reference.md
- docs/design/agent-workflows/projects/agent-creation-skills/skills/self-host-agenta/SKILL.md
- docs/design/agent-workflows/projects/agent-creation-skills/skills/self-host-agenta/reference.md
```
POST /api/workflows/
{
  "workflow": {
    "slug": "my-agent-<unique>",
    "name": "My Agent",
    "description": "Agent via API",
    "flags": { "is_custom": false }
  }
}
-> 200 { "workflow": { "id": "<workflow_id>", "slug": "...", ... } }
```
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Add json language specifier to fenced code block.
+```json
POST /api/workflows/
{
"workflow": {
Improves syntax highlighting and readability. Same applies to the other JSON blocks below.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 50-50: Fenced code blocks should have a language specified (MD040, fenced-code-language)
77be3c3 to e87eb9a
> - Inputs: `name`, `slug?`, `agent_config` (the `parameters.agent` object), `variant_slug?`,
>   `message?`.
> - Behavior: `POST /api/workflows/` -> `POST /api/workflows/variants/` ->
I mean, this would be in the tools in the API. It does not make sense to have calls between two API endpoints. We would be doing it with direct service calls, right?
>   optional `stream`.
> - Behavior: resolve the revision if a ref is given, then `POST /services/agent/v0/invoke`.
>   Returns the assistant output and the `trace_id`.
> - Guardrail: depth/recursion limit so a builder agent cannot fork-bomb itself; a per-run
I wonder how you plan to do that?
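One possible shape for such a guard, as a sketch only: a depth counter that grows with each nested invoke, plus a budget object shared by the whole run tree. `RunGuard`, `Budget`, `GuardError`, and the default limits are hypothetical names and values; how the depth would be carried across the HTTP invoke boundary is not decided in the design note and is left open here.

```python
from dataclasses import dataclass


class GuardError(RuntimeError):
    """Raised when a nested invoke would exceed the depth or budget limit."""


@dataclass
class Budget:
    limit: float
    spent: float = 0.0

    def charge(self, cost):
        # Refuse the charge up front so the cap is never overshot.
        if self.spent + cost > self.limit:
            raise GuardError("per-run budget exceeded")
        self.spent += cost


@dataclass
class RunGuard:
    budget: Budget
    max_depth: int = 2
    depth: int = 0

    def enter_child(self):
        """Return the guard for a nested invoke, or raise if it would be too deep."""
        if self.depth + 1 > self.max_depth:
            raise GuardError("invoke depth limit reached")
        # The child shares the same Budget object, so the cap applies per root run.
        return RunGuard(budget=self.budget, max_depth=self.max_depth, depth=self.depth + 1)
```

Sharing one `Budget` across children is the design choice that makes the cap "per run" and not "per invoke": a sub-agent cannot reset the meter by spawning another sub-agent.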
>   Returns the assistant output and the `trace_id`.
> - Guardrail: depth/recursion limit so a builder agent cannot fork-bomb itself; a per-run
>   budget cap. Cheap-model default for self-tests.
> - Permission: gateable; read-ish but it spends model budget.
Here it would also make sense to be able to spin off a subagent yourself, although that sometimes happens in the harness itself. Nothing actionable here. We're just going to have to keep it in mind.
> test what it created.
>
> - Inputs: `workflow_ref` (id or slug) or inline `agent_config`, plus `messages` / `inputs`,
>   optional `stream`.
What happens if we invoke with streaming? Will we be able to stream the results of the sub-agent from the agent's response, or is this just irrelevant?
> Run a workflow (by id/slug, or an inline config) and get the output. Lets a builder agent
> test what it created.
>
> - Inputs: `workflow_ref` (id or slug) or inline `agent_config`, plus `messages` / `inputs`,
It's very nice that we'll be able to have an inline agent config. This means you can spin off a version of yourself, so to speak, and test something: for example a skill, or a change to a skill and how that affects you. That is probably something that makes sense to add to what we want to have in the skills.
> the running agent's own workflow so it cannot rewrite arbitrary agents.
>
> - Inputs: `agent_config` (the new config), `message`.
> - Behavior: resolve the running agent's own `workflow_variant_id` from the run context, then
This means the agent needs access to some run context that is propagated somehow, no? How does it get that context? Does it have a built-in tool to fetch it, something implemented on the sidecar or in the SDK? In this case app.py would presumably provide the context to the sidecar, but it would not be injected as-is into agent.md or similar. It would be exposed as a tool, and then the question is where the boundary lies when that tool is called. Do you go back to app.py to get the information, or is it held in the sidecar, which returns it directly? I don't know which is the case.
What is this context when running from the SDK with a local backend? There we would have another route: instead of running in the sidecar, you pull the configuration and use the SDK to run it. The SDK then does the job of app.py. It puts the variant, the trace ID, and so on into context, and sets up the tools the same way the sidecar setup does, through an MCP or whatever makes sense. I guess that is also where it would add a tool that exposes the context. What would the type of that tool be? We haven't thought much about the local backend or how tool injection would work there. With Claude Code, a gateway tool (an endpoint) is not a problem: we need an MCP, and that MCP makes the call. A code tool can also go in the MCP. In the case of Pi, we would put it in the folder, like all the other tools.
So my big question is how you provide context. Obviously you could have an endpoint, but that would be a bit silly, so if you have better ideas, please let me know.
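One option, sketched under assumptions and not taken from the design note: whichever host starts the run (app.py or the SDK) holds the run context in process and binds it into the tool before registering it, so the model never supplies or even sees `workflow_variant_id`. `RunContext`, `make_update_own_workflow`, `run_with_context`, and the `commit` callable are all hypothetical names.

```python
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    workflow_variant_id: str
    trace_id: str


# Set by the host for the duration of one run; unset outside a run.
_current = ContextVar("run_context")


def make_update_own_workflow(commit):
    """Build a self-scoped tool; `commit` is a hypothetical callable doing the real update."""

    def update_own_workflow(agent_config, message):
        ctx = _current.get()  # raises LookupError when called outside a run
        # The target variant comes from the host's context, never from tool inputs.
        return commit(ctx.workflow_variant_id, agent_config, message)

    return update_own_workflow


def run_with_context(ctx, fn, *args):
    """Run `fn` with `ctx` installed as the current run context."""
    token = _current.set(ctx)
    try:
        return fn(*args)
    finally:
        _current.reset(token)
```

The same binding step could live in the sidecar or in the SDK's local backend, which is why it avoids a separate context endpoint; whether the tool is then surfaced through an MCP or a tools folder is a transport detail this sketch does not cover.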
There seems to be a misunderstanding. These skills are supposed to be the skills within Agenta, not the skills used to develop Agenta, which are in the claude folder or wherever. These would be the skills within custom workflows, as in constant custom workflows.
e87eb9a to cbff4ef
🤖 The AI agent says: Added a SUPERSEDED banner to |
…l (Codex) interfaces.md: the concrete interface delta (call descriptor, runContext, type:platform config, platform-op catalog entry, reference config, dispatch body-assembly) with the file:line seams, so review + implementation share one reference. plan.md: the endpoint-wrapper set to expose (from #4863, reframed as existing endpoints; plain + self/bind variants incl. the annotate pair) + catalog implementation per the Codex catalog review (typed catalog in SDK platform/op_catalog.py, reuse PlatformConnection, stable tools.agenta.* namespace, per-op permission defaults; execution stays direct, find_capabilities migrates to direct).
Add a draft workspace for skills that let a person or an agent harness create agents on Agenta over the API, plus a design note for the gateway tools that would let a harness build agents.
- create-agenta-agent skill (SKILL.md + reference.md + runnable create_agent.py): discover tools, the agent config schema, the create/variant/commit/invoke loop, harness selection, secrets. Every call verified live on the dev stack.
- self-host-agenta skill: run the stack with/without the sandbox-agent sidecar, Claude via API key vs subscription (OAuth) login, Daytona sandbox, trust model.
- custom-tools-design.md: search_tools, create_workflow, invoke_workflow, update_own_workflow, add_trace_annotation + companions; design only.
- README.md (verified API reference) + build-notes.md (decisions).
Skills live under the project workspace as drafts; they move to .agents/skills/ when approved.
Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
The logic-wrapping tools framing (create_workflow/update_own_workflow/add_trace_annotation/...) was the rejected first version. Platform tools are a thin wrapper over existing endpoints; see projects/direct-call-tools/. Banner points readers to the current design.
85387a4 to d40ea9c
🤖 The AI agent says: Reconciling the custom-tools note here with what just shipped. Your line-67 point — no calls A platform tool is a thin wrapper over an existing Agenta endpoint, declared as So platform tools = the catalog. First three ops: |
What
DRAFT for morning review. A new workspace of skills that let a person, or an agent harness, create agents on Agenta over the API, plus a design note for the gateway tools that would let a harness build agents itself.
Lives under `docs/design/agent-workflows/projects/agent-creation-skills/`. The skill files are drafts under `skills/`; when approved they move to `.agents/skills/<name>/` (symlinked into `.claude/skills/`), per repo convention. Nothing wired into runtime behavior.
Why
We want users (and harnesses) to be able to build agents on Agenta without reverse-engineering the API. These skills capture the create -> commit -> invoke loop, the agent config schema, tool discovery, harness selection, and secrets, all verified against the live stack.
Contents
- `skills/create-agenta-agent/`: SKILL.md (the procedure) + reference.md (full field tables, tools catalog, secrets schema, harness capability map) + `create_agent.py` (a runnable `uv run` script that does the whole loop with `--archive` cleanup).
- `skills/self-host-agenta/`: SKILL.md + reference.md, covering how to run the stack with/without the `sandbox-agent` sidecar, Claude via API key vs subscription (OAuth) login, the Daytona sandbox, and the trust model.
- `custom-tools-design.md`: design note for the agent-self-creation tools: `search_tools` (Composio has ~1000 integrations, search is essential), `create_workflow`, `invoke_workflow`, `update_own_workflow`, `add_trace_annotation`, plus companions. Design only; each maps to a verified endpoint.
- `README.md`: the verified API reference.
- `build-notes.md`: decisions and judgment calls.
Verified live
Every API call was run against the dev stack on 2026-06-26 (project `hotel-agent`, cheap models). The full create workflow -> create variant -> commit revision (`parameters.agent`) -> invoke loop returned correct answers ("...Paris.", "...Tokyo."); update appended a version; secrets create/delete and tools search/resolve confirmed. The bundled script ran end to end and cleaned up after itself. Test workflows archived; fake secret deleted.
Key corrections to circulating lore, documented as gotchas: the config payload key is `parameters.agent` (not `ag_config`); the builtin URI goes in `data.uri` (not `data.url`, which validates as HTTP); the variant request wrapper is `workflow_variant` / `workflow_id` (the old `.http` files use `variant` / `artifact_id` and now 422).
Review asks
- `update_own_workflow` (self-scoped) vs a general `update_workflow`, and the `invoke_workflow` recursion/budget guard? See the "Open questions for review" in the design note.
- `.agents/skills/` now, or stay drafts until the custom tools land?
Design only / docs only. Not for merge yet.
https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
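The key corrections listed under "Verified live" could be enforced with a small payload check. This is a sketch, not part of the PR: `LEGACY_KEYS` and `find_legacy_keys` are hypothetical names, the key pairs come only from the corrections stated above, and the check flags a `url` key anywhere in the payload, which is coarser than the `data.url` case the description names.

```python
# Legacy key -> replacement, taken from the corrections in the PR description.
LEGACY_KEYS = {
    "ag_config": "parameters.agent",
    "variant": "workflow_variant",
    "artifact_id": "workflow_id",
    "url": "uri",  # assumption: flagged anywhere, not only under `data`
}


def find_legacy_keys(payload, path=""):
    """Return (dotted_path, suggested_replacement) for every legacy key found."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else key
            if key in LEGACY_KEYS:
                hits.append((here, LEGACY_KEYS[key]))
            # Keep walking so nested legacy keys are reported too.
            hits.extend(find_legacy_keys(value, here))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_legacy_keys(item, f"{path}[{index}]"))
    return hits
```

Running this over a request body before sending it would surface the old `.http`-style shapes locally, before the API answers with a 422.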