docs(agent): direct-call tools design (Workstream A)#4886
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🔸 Decision needed — this is a design decision, not a code review. When do we remove the Reference tools execute today through
My recommendation is Option 1. Once you pick, the orchestrator can sequence the implementation (One lower-priority open item, no decision needed now: forwarding the trace context to the |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
docs/design/agent-workflows/projects/direct-call-tools/context.md (1)
41-42: 📐 Maintainability & Code Quality | 🔵 TrivialFix extra whitespace in "Agenta base".
The static analysis flagged extra spaces between "Agenta" and "base" on line 42. Clean up the spacing for readability.
- base. Off-Agenta targets are a deliberate later decision. + base. Off-Agenta targets are a deliberate later decision.Source: Linters/SAST tools
docs/design/agent-workflows/projects/direct-call-tools/design.md (2)
12-18: 📐 Maintainability & Code Quality | 🔵 TrivialAdd language identifier to fenced code block.
The
calldescriptor schema block lacks a language specifier. Addjsonortsfor proper syntax highlighting.+```json
call: {
method: "POST" | "GET",<!-- cr-comment:v1:346e88542f0bafce7105f59c --> _Source: Linters/SAST tools_ --- `125-128`: _📐 Maintainability & Code Quality_ | _🔵 Trivial_ **Add language identifier to fenced code block.** The platform-op catalog schema block lacks a language specifier. ```diff+```json
op -> { method, path, input_schema_ref, default_permission, default_needs_approval }Source: Linters/SAST tools
docs/design/agent-workflows/projects/direct-call-tools/status.md (1)
52-53: 📐 Maintainability & Code Quality | 🔵 TrivialExplain coordination jargon or link to conventions.
References to "Row claimed on
scratch/agent-coordination.md", "butwrite", and "BUT-LOCK" use team-specific jargon that may not be clear to all readers. Consider adding a brief explanatory note or linking to the team's coordination conventions documentation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 60ef4e5d-a6f7-4909-83e3-8001c1c671ea
📒 Files selected for processing (6)
docs/design/agent-workflows/projects/direct-call-tools/README.mddocs/design/agent-workflows/projects/direct-call-tools/context.mddocs/design/agent-workflows/projects/direct-call-tools/design.mddocs/design/agent-workflows/projects/direct-call-tools/plan.mddocs/design/agent-workflows/projects/direct-call-tools/research.mddocs/design/agent-workflows/projects/direct-call-tools/status.md
f607f03 to
1482e8a
Compare
|
🤖 The AI agent says: ## Round 2 — changes addressing your comments (for a skim) Second version pushed (design-only, no code). Each of your comments mapped to what changed, then 1. Merge / ownership ( 2. Permissions ( 3. Input/messages + output schemas ( 4. Catalog to mirror ( 5. CodeRabbit (schema sourcing) — Fixed Decisions to be aware of (the only net-new ones)
Everything else is alignment/cleanup. Nothing needs a decision from you — ready for your second |
mmabrouk
left a comment
There was a problem hiding this comment.
Round 2: inline pointers to each change, mapped to your comments (context is in the PR summary comment too).
| custom-tools design note (#4863) is waiting on the execution model. Settling the routing now | ||
| lets both land on the right shape instead of on `/tools/call` string-routing. | ||
|
|
||
| ## Merge, review, and ownership |
There was a problem hiding this comment.
🤖 The AI agent says: Per your context.md:50 comment. Added this section: you're aligned with all decisions, merge order/organization is the orchestrator's call, and reviewability before merge is the only requirement. I also reframed the old "Open decisions for Mahmoud" as "Sequencing notes (orchestrator's call)" in plan.md + status.md, so nothing is waiting on a decision from you.
| scope here. The requirement this design holds is narrower: permission handling stays consistent | ||
| across all tool types, so the refactor can lift it uniformly later. | ||
|
|
||
| ## The platform-op catalog |
There was a problem hiding this comment.
🤖 The AI agent says: Per your design.md:123 comment (researched). Changed from "mirror platform_catalog.py" to mirroring the reserved tools.agenta.* pattern from PR #4884 (find_capabilities) + the evaluators catalog shape, reusing CATALOG_TYPES. Confirmed platform_catalog.py is NOT being removed (your suspicion) — it's current but heavier (for skills), so we mirror its spirit not its machinery. find_capabilities is the first platform tool.
There was a problem hiding this comment.
I was actually thinking here of mirrorring patterns in the sdk
mmabrouk
left a comment
There was a problem hiding this comment.
🤖 The AI agent says: Run-context design added (Codex-reviewed). Two inline pointers below; the A-vs-C decision is a separate comment.
| | C hybrid | clean | +1 | strong | optional | strong | | ||
| | D propagate | trace-only | none | n/a | no | partial | | ||
|
|
||
| ## Recommendation (revised after the Codex review) |
There was a problem hiding this comment.
🤖 The AI agent says: New file answering "how does run context reach the auto-editing tools." I had it Codex-reviewed (xhigh). Headline: I recommend Option C (run-level runContext channel + server-side binding of protected fields), revised away from the static get_run_context tool (Option A) you leaned toward — Codex argues A is a boundary category-error and that B/C actually keep the sidecar's canonical-primitive set smaller. Security findings + a real gap (the annotations endpoint mints a new trace today, it doesn't attach to the active one) are in the "Security and guardrails" section. The A-vs-C call is yours — see the 🔸 Decision-needed comment.
| scope here. The requirement this design holds is narrower: permission handling stays consistent | ||
| across all tool types, so the refactor can lift it uniformly later. | ||
|
|
||
| ## Run context |
There was a problem hiding this comment.
🤖 The AI agent says: Added from the run-context work: this run-context pointer plus an SSRF guardrail on the call descriptor (method allowlist, relative-path regex, /api/ prefix, origin binding), which Codex flagged because the call makes the sidecar a structured HTTP dispatcher. Detail and options in run-context.md.
|
🤖 The AI agent says: 🔸 Decision needed — design decision, not a code review. How should run context (current trace, running workflow/variant + draft-state + latest revision, session_id) reach the auto-editing tools? Full options + tradeoffs in the new
My recommendation is C. Surfaced regardless of A/C: own-variant binding + a revision-precondition on commit, and that Once you pick, the orchestrator folds it into the implementation. Leaning C unless you say otherwise. |
mmabrouk
left a comment
There was a problem hiding this comment.
🤖 The AI agent says: Corrected the platform-tools framing per your note + re-ran Codex. Inline pointers below; the one decision is a separate comment.
| Status: options for review. Re-asked Codex with the corrected framing (see the bottom). Open | ||
| question; Mahmoud is weighing options. | ||
|
|
||
| ## What platform tools actually are (correcting a misframe) |
There was a problem hiding this comment.
🤖 The AI agent says: Rewrote this file after your correction. Platform tools are now framed as thin wrappers over existing endpoints — no new endpoints, no logic-wrapping tools, no update_own_workflow / add_trace_annotation. The real question is just how the harness gets its own context. I also marked #4863's custom-tools-design.md superseded, since the logic-tool names came from there.
| | 5 context file | yes | none | yes (rewrite) | strong (folders) | out-of-band | | ||
| | 6 system prompt | yes | none | weak | medium | simple but static | | ||
|
|
||
| ## Recommendation (after the Codex rev-2 review) |
There was a problem hiding this comment.
🤖 The AI agent says: Re-asked Codex (xhigh) with the corrected framing — verdict: pass with conditions. Headline: split run context by trust — server-bind the agent's own variant/trace into call.body (hidden from the model) for self-targeting ops; deliver general model-readable context as run-level data (MCP resource + file fallback, NOT a get_context canonical primitive); refresh per turn. The one decision for you is the 🔸 comment.
| "update myself" is just committing a revision to the agent's own variant. The raw endpoints are | ||
| exposed; a skill teaches composition. | ||
|
|
||
| For a **self-targeting** op, the agent's own identity (its variant / its trace) is **server-bound** |
There was a problem hiding this comment.
🤖 The AI agent says: The security reconciliation Codex flagged: for a self-targeting op the own variant/trace is server-bound (locked in call.body, hidden from the model), because project-scoped auth does NOT stop within-project lateral movement (a model retargeting another variant/trace in the same project). Still a thin wrapper, just a fixed field. This adjusts the "own is convention" framing — see the decision comment.
|
🤖 The AI agent says: 🔸 Decision needed — design decision, not a code review. Do we server-bind the agent's own identity for self-targeting platform tools, or treat "own" as a convention the model follows?
My recommendation: server-bind the self-identity; deliver general (non-authority) context the model needs to read as run-level data via an MCP resource (+ context-file fallback), not a Confirm server-bind, or say why convention is enough. |
mmabrouk
left a comment
There was a problem hiding this comment.
🤖 The AI agent says: Added the context-binding mechanics (the 'other part' of the design). Inline pointer + summary below.
|
|
||
| `callRef` stays for gateway. A spec has either `call` (direct) or `callRef` (gateway), not both. | ||
|
|
||
| ### Context binding (how a catalog entry fills a field from run context) |
There was a problem hiding this comment.
🤖 The AI agent says: This is the "how does the catalog encode the context transformation" mechanic you asked about. A catalog entry declares a bind map (endpoint field ← run-context key); the service strips bound fields from the model schema and emits call.context; the runner fills them at dispatch from a per-turn runContext blob on /run. Codex (xhigh) validated it with refinements, folded in: hybrid resolution (immutable identity baked at resolve time, mutable filled at dispatch), path-aware schema stripping, and merge hardening (path-conflict rejection, strict path parsing, prototype-pollution-safe deep-set, post-merge validation). runContext is its own /run field. Implementation checklist is in the subsection.
|
🤖 The AI agent says: Added the context-binding mechanics (the "other part" of the run-context design): how a platform-op catalog entry declares a Codex (xhigh) reviewed it — the mechanic is sound, and it correctly noted none of it is built yet (expected: this is a design PR, so that is the build plan, not a flaw). Folded its refinements: hybrid binding (immutable@resolve / mutable@dispatch), path-aware stripping, merge hardening (conflict rejection + safe deep-set + post-merge validation), and Still open — the one decision from the previous round: server-bind the self-identity vs "own is convention" (lean server-bind). Everything else in the design is settled. |
mmabrouk
left a comment
There was a problem hiding this comment.
🤖 The AI agent says: Recorded your decision and did the cleanup round. Inline pointer + summary below.
| Date: 2026-06-27 (rev 3 — DECIDED) | ||
| Status: **decided** — this is the design to implement. Options weighed and dropped are at the end. | ||
|
|
||
| ## Decision |
There was a problem hiding this comment.
🤖 The AI agent says: Recorded your decision and cleaned the doc to match. Run context rides the run/session (runContext on /run, refreshed per turn); tool definitions declare a bind map and the runner fills bound fields server-side at dispatch, hidden from the model. Self-targeting tools bind their own variant/trace this way. No MCP resource, no get_context tool, no context file, no prompt injection — those are now in "Considered but not used" so the implementation agent isn't confused. This closes the last open decision; the design is complete.
|
🤖 The AI agent says: Decision recorded + cleanup round done ( Run-context delivery is settled:
|
mmabrouk
left a comment
There was a problem hiding this comment.
🤖 The AI agent says: Added the interface reference + the endpoint-wrapper set + the catalog impl (Codex). Inline pointer + summary below.
| @@ -0,0 +1,97 @@ | |||
| # Interface reference — direct-call tools | |||
There was a problem hiding this comment.
🤖 The AI agent says: New interface reference — the concrete contract delta (the call descriptor, runContext, the type:"platform" config, the platform-op catalog entry, the reference config, and dispatch body-assembly), each with its file:line seam, so review and implementation point at the same spec. This is the references-that-support-the-review you asked for.
|
🤖 The AI agent says: Addressed your four asks:
Two Codex endpoint-set fixes folded: "invoke" is the reference-tool path (not a duplicate); "inspect harnesses" is the services Committed |
Design docs for making resolved agent tools carry their own call target, so the sidecar calls reference/platform tools directly and only gateway (Composio) tools route through /tools/call. Design + plan only; implementation dispatched separately.
Address Mahmoud's round-2 comments + CodeRabbit: - reference input_schema from the workflow's revision.schemas.inputs (messages via x-ag-type-ref); output schemas deferred - permissions uniform across all tool types (not special-cased on the call path) - platform-op catalog mirrors the reserved tools.agenta.* pattern (#4884 find_capabilities) + evaluators catalog, not platform_catalog.py wholesale - schema sourcing standardized on the in-process CATALOG_TYPES catalog (fixes the openapi.json inconsistency) - merge order/organization framed as the orchestrator's call
New run-context.md: how the run's context (trace_id, running workflow/variant + draft-state + latest revision, session_id) reaches the auto-editing tools. Four options (static get_run_context tool / runner injection / hybrid / traceparent), system boundaries, the canonical-primitive framing, tradeoffs. Codex-reviewed: revised recommendation = Option C (run-level runContext channel + server-side binding of protected fields), with security findings (confused-deputy on variant, revision-precondition, trace integrity, annotations-mints-new-trace gap) and call- descriptor SSRF guardrails. design.md: added the SSRF guardrail + a run-context pointer.
Platform tools are a THIN WRAPPER over EXISTING endpoints (no new endpoints, no logic-wrapping tools). Removed the fake update_own_workflow / add_trace_annotation tools; expose raw endpoints, compose multi-step via skills. run-context.md rewritten + Codex-reviewed (rev 2): run context is run-level data not a tool; protected self-identity (own variant/trace) is server-bound into call.body and hidden from the model (within-project lateral-movement risk); model-readable context via MCP resource + file fallback; refresh per turn. design.md/plan.md reconciled.
…ewed) How a platform-op catalog entry declares a context bind (endpoint field <- run-context key): the service strips bound fields from the model schema + emits call.context; the runner fills them at dispatch from a per-turn runContext blob on /run. Codex binding review folded in: hybrid resolution (immutable identity baked at resolve time, mutable filled at dispatch), path-aware schema stripping, merge hardening (path-conflict rejection, strict path parsing, prototype-pollution-safe deep-set, post-merge validation), runContext as its own /run field. Added the implementation checklist.
Decision (Mahmoud): run context rides the run/session (runContext on /run, refreshed per turn); tool definitions declare a bind map and the runner fills bound fields server-side at dispatch, hidden from the model. Self-targeting tools bind their own variant/trace this way. NO MCP resource, NO get_context tool, NO context file, NO prompt injection. Cleaned run-context.md to a decision doc (+ 'Considered but not used'); reconciled design.md (Run context + Context binding) and plan.md; status.md marks the design complete (last open decision closed).
…l (Codex) interfaces.md: the concrete interface delta (call descriptor, runContext, type:platform config, platform-op catalog entry, reference config, dispatch body-assembly) with the file:line seams, so review + implementation share one reference. plan.md: the endpoint-wrapper set to expose (from #4863, reframed as existing endpoints; plain + self/bind variants incl. the annotate pair) + catalog implementation per the Codex catalog review (typed catalog in SDK platform/op_catalog.py, reuse PlatformConnection, stable tools.agenta.* namespace, per-op permission defaults; execution stays direct, find_capabilities migrates to direct).
73a2b13 to
d0df7f6
Compare
What
Design docs (no code) for direct-call tools: make each resolved agent tool carry its own
call target so the sidecar calls reference and platform tools directly, and only
gateway (Composio) tools route through the central
/tools/call.Lives under
docs/design/agent-workflows/projects/direct-call-tools/. This is the Workstream Adesign, up for review. Implementation is dispatched separately (the workstream split is in
plan.md).Why
Today every callback tool POSTs to one endpoint,
/tools/call, which re-parses a string andre-dispatches. For a reference (a stored workflow used as a tool) or a platform tool (an
Agenta operation like create-workflow or annotate), the target is just an Agenta endpoint and the
sidecar already holds the caller credential, so the hop adds a step and no value. Gateway is
the one real exception: only the server can read the Composio secret. Seeded by the line-67
comment on #4863.
What's in it
context.md— the decision, goals, non-goals.research.md— the actual code seams (config/resolved/dispatch, the reference-invoke path,the platform-op catalog precedent, the in-process schema catalog) with file:line.
design.md— thecalldescriptor, the per-tool-type table (config -> resolved -> dispatch),the platform-op catalog, the reference resolution.
plan.md— the two-workstream split (this is A; B drops the@ag.referencemarker andrebuilds the FE on
type:"reference"on [feat] Reference a workflow as an agent tool (type:"reference") #4860 / feat(frontend): reference a workflow as an agent tool (type:"reference") #4877) plus the A execution phases.status.md,README.md.Decisions settled in the design
@ag.referencemarker (keeptype:"reference"); keep embed, hidden in the UI fornow. [Workstream B]
always invoked fresh by the service, so it stays current).
Decision needed from you
See the 🔸 Decision needed comment: when to remove the
/tools/callworkflow.*routing(my recommendation: in Workstream A, after B lands).
How to review
This is a design review, not a code review. Read
design.mdfirst (the shape), thenplan.md(the split + phases). No code in this PR.Related: #4860 / #4877 (the reference tool, Workstream B), #4837 (embedref design), #4863
(custom-tools note, where this started).
https://claude.ai/code/session_01YXbQbNQvPn6GU3P8TvLMEw