[feat] Direct-call tools — Phase 3b: platform-op catalog + find_capabilities + self-update#4893
…capabilities + self-update
Add the type:"platform" tool config and the platform-op catalog so an agent can call
existing Agenta endpoints directly via the `call` descriptor (Phases 1-3a), no /tools/call hop.
- New typed catalog platform/op_catalog.py mapping op -> {description, method, path,
input_schema/_ref, bind, default permission/approval}, mirroring the evaluators catalog +
the reserved tools.agenta.* pattern.
- AgentaPlatformToolResolver resolves each op to a CallbackToolSpec carrying a direct `call`
(reusing PlatformConnection); strips bind fields from the model-visible schema and emits
them as call.context.
- First ops: find_capabilities (POST /api/tools/discover — now agent-usable end to end),
query_workflows (read), commit_revision (self-update; binds the running variant id from
$ctx.workflow.variant.id, defaults to approval).
- PlatformToolConfig arm on the ToolConfig union surfaces in the agent-config schema.
- Vendors #4890's expand_type_refs as platform/_schema.py for x-ag-type-ref input schemas.
- Docs synced: documentation/tools.md + the interface inventory.
Runner unchanged: it dispatches `call` opaquely (the reference/platform branch already exists).
Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
🤖 The AI agent says:
Changes made (Phase 3b)
This adds the producer side of direct-call tools:
Tests: 440 SDK agent tests pass;
What I need from you
A code review of three things, plus a green-light to run the end-to-end:
Kept DRAFT (mid-feature, pre-e2e). CodeRabbit not triggered yet (it skips drafts). The
| _CTX_TOKEN_PREFIX = "$ctx." | ||
| class PlatformOp(BaseModel): |
🤖 The AI agent says: The catalog model. PlatformOp is the typed catalog entry (Codex-review item §4): op -> {description, method, path, input_schema|input_schema_ref, bind, default permission/approval}. It validates at import — exactly one schema source, a single absolute /api/... path, and $ctx. bind tokens — so a bad entry fails on load, not at run time. Review target #1: is this the right contract, and is the typed-model-vs-loose-dict call right?
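A minimal sketch of the fail-on-load contract described above, assuming a plain frozen dataclass as a stand-in for the PR's pydantic `PlatformOp`. The class name `PlatformOpSketch` and the exact error messages are hypothetical; only the three checks (exactly one schema source, an absolute `/api/...` path, `$ctx.` bind tokens) come from the comment.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

CTX_TOKEN_PREFIX = "$ctx."
ALLOWED_METHODS = {"GET", "POST"}  # the two methods the PR's Literal currently allows


@dataclass(frozen=True)
class PlatformOpSketch:
    """Hypothetical stand-in for the typed catalog entry; validates on construction."""

    op: str
    description: str
    method: str
    path: str
    input_schema: Optional[Dict[str, Any]] = None
    input_schema_ref: Optional[str] = None
    bind: Dict[str, str] = field(default_factory=dict)
    default_needs_approval: bool = False

    def __post_init__(self) -> None:
        if self.method not in ALLOWED_METHODS:
            raise ValueError(f"{self.op}: unsupported method {self.method!r}")
        # Exactly one schema source: inline schema XOR a reference key.
        if (self.input_schema is None) == (self.input_schema_ref is None):
            raise ValueError(f"{self.op}: set exactly one of input_schema / input_schema_ref")
        # A single absolute /api/... path.
        if not self.path.startswith("/api/") or " " in self.path:
            raise ValueError(f"{self.op}: path must be a single absolute /api/... path")
        # Every bind value must be a non-empty $ctx.<key> token.
        for body_path, token in self.bind.items():
            if not token.startswith(CTX_TOKEN_PREFIX) or token == CTX_TOKEN_PREFIX:
                raise ValueError(f"{self.op}: bind for {body_path!r} must be a $ctx.<key> token")
```

Because the catalog table is built at module import, constructing a bad entry raises immediately, which is the "fails on load, not at run time" property under review.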
| op.op: op | ||
| for op in ( | ||
| PlatformOp( | ||
| op="find_capabilities", |
🤖 The AI agent says: find_capabilities emission (THE deferred item). This catalog entry is what makes find_capabilities agent-usable: it resolves to a direct call to POST /api/tools/discover (migrating off the server-side /tools/call tools.agenta.* dispatch, which is left in place for now). Its description + input schema are duplicated from api/.../discovery.py because the SDK must not import the API — a drift risk to watch.
The drift risk is problematic. We need to solve this at some point. Let's add a GitHub issue for it.
| method="POST", | ||
| path="/api/workflows/revisions/commit", | ||
| input_schema=_COMMIT_REVISION_INPUT_SCHEMA, | ||
| bind={"workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id"}, |
🤖 The AI agent says: The self-update bind. commit_revision binds workflow_revision.workflow_variant_id ← $ctx.workflow.variant.id (#4892's run-context token). resolved_input_schema() strips this field from the model-visible schema; to_call() emits it as call.context, so the runner fills it from runContext at dispatch and the model can never target a different variant. Paired with default_needs_approval=True below → gated + self-only. Review target #2.
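The strip-and-emit step can be sketched as follows. This is an illustration under assumptions, not the PR's code: `strip_bound_fields` and `to_call_sketch` are hypothetical names, the schema is assumed to be a nested JSON-schema object, and the `{"method", "path", "context"}` shape of the `call` descriptor is a guess at the wire format.

```python
import copy
from typing import Any, Dict


def strip_bound_fields(schema: Dict[str, Any], bind: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of a JSON-schema object with every bound dotted path removed."""
    stripped = copy.deepcopy(schema)
    for dotted_path in bind:
        *parents, leaf = dotted_path.split(".")
        node = stripped
        for name in parents:
            node = node.get("properties", {}).get(name)
            if not isinstance(node, dict):
                break  # path not present in this schema; nothing to strip
        else:
            node.get("properties", {}).pop(leaf, None)
            if "required" in node:
                node["required"] = [r for r in node["required"] if r != leaf]
    return stripped


def to_call_sketch(method: str, path: str, bind: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical call descriptor: bound fields travel as context, not as model input."""
    return {"method": method, "path": path, "context": dict(bind)}


schema = {
    "type": "object",
    "properties": {
        "workflow_revision": {
            "type": "object",
            "properties": {
                "workflow_variant_id": {"type": "string"},
                "data": {"type": "object"},
            },
            "required": ["workflow_variant_id", "data"],
        }
    },
    "required": ["workflow_revision"],
}
bind = {"workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id"}

visible = strip_bound_fields(schema, bind)
call = to_call_sketch("POST", "/api/workflows/revisions/commit", bind)
```

In this sketch the model-visible schema no longer mentions `workflow_variant_id`, so the model has no field through which to name another variant; the binding survives only in `call["context"]`.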
| name=op.op, | ||
| description=op.description, | ||
| input_schema=op.resolved_input_schema(), | ||
| call=op.to_call(), |
🤖 The AI agent says: The resolver. Each op becomes a CallbackToolSpec carrying a direct call (op.to_call()) with the bound fields stripped from op.resolved_input_schema(). It reuses PlatformConnection for the shared ToolCallback (the origin the runner resolves the relative path against) — no new HTTP plumbing. The config's needs_approval/permission override the catalog default; unset falls back to the per-op default (above).
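The override rule in the last sentence amounts to a `None`-means-default fallback. A minimal sketch, assuming the config field is an optional boolean; the helper name `effective_needs_approval` is hypothetical:

```python
from typing import Optional


def effective_needs_approval(config_value: Optional[bool], catalog_default: bool) -> bool:
    """An explicit config value wins; None (unset) falls back to the per-op default."""
    return catalog_default if config_value is None else config_value


# commit_revision defaults to approval; a config may override either way.
unset = effective_needs_approval(None, catalog_default=True)
overridden = effective_needs_approval(False, catalog_default=True)
```

Testing against `None` rather than truthiness matters here: an explicit `False` is a deliberate override and must not be mistaken for "unset".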
| description: str = Field( | ||
| min_length=1, description="Model-facing description (SDK-owned)." | ||
| ) | ||
| method: Literal["GET", "POST"] |
What about PUT, DELETE, and other methods?
| model_config = ConfigDict(extra="forbid", frozen=True) | ||
| op: str = Field( |
Let's avoid abbreviations wherever possible unless they are widely used (add this as a memory).
| # Self-targeting fields the runner fills server-side from run context: a dotted body path on the | ||
| # endpoint's request -> a ``$ctx.<key>`` token. These are stripped from the model-visible schema | ||
| # and emitted as ``call.context`` so the model supplies only the payload and can never retarget. | ||
| bind: Dict[str, str] = Field(default_factory=dict) |
The variable naming here is not good; the names should be clear.
|
🤖 The AI agent says: Triaged your four review comments. One is a contained fix (PR #4906); three are
Note: CodeRabbit never ran on #4893 (it skips non-default base branches), so there were no automated findings to address. I triggered it on the follow-up #4906. PR #4906 is do-not-merge / needs-review. |
[refactor] PR #4893 follow-up: rename platform-op `bind` to `context_bindings`
Context
Phases 1–3a gave a resolved tool a `call` descriptor: the runner can call an Agenta endpoint directly instead of routing through `/tools/call`. Nothing emitted a `call` yet, and the reserved `find_capabilities` discovery tool (PR #4884) was server-wired but not agent-usable: no agent config could declare it, which blocked the discover-and-wire-tools skill end to end.

This phase adds the producer side: the `type:"platform"` tool config and a typed platform-op catalog that maps an op to an existing Agenta endpoint and emits the `call`. `find_capabilities` becomes the first platform tool, so an agent can finally discover tools end to end. It also adds a self-update op so an agent can commit a revision to its own variant.

Builds on #4892 (run-context delivery + `bind`); uses its `$ctx.workflow.variant.id` / `trace.*` run-context tokens.
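To make the run-context tokens concrete, here is a rough sketch of how a dispatcher could fill bound fields at call time. It is written in Python for illustration only (the actual runner is TypeScript and is not changed by this PR), and it assumes the run context is a nested mapping and that `context` maps dotted body paths to `$ctx.` tokens; `resolve_ctx_token` and `apply_context` are hypothetical names.

```python
import copy
from typing import Any, Dict

CTX_TOKEN_PREFIX = "$ctx."


def resolve_ctx_token(token: str, run_context: Dict[str, Any]) -> Any:
    """Look up a $ctx.<dotted.key> token in a nested run-context mapping."""
    if not token.startswith(CTX_TOKEN_PREFIX):
        raise ValueError(f"not a run-context token: {token!r}")
    value: Any = run_context
    for key in token[len(CTX_TOKEN_PREFIX):].split("."):
        value = value[key]  # a missing key raises KeyError rather than sending a bad request
    return value


def apply_context(
    body: Dict[str, Any], context: Dict[str, str], run_context: Dict[str, Any]
) -> Dict[str, Any]:
    """Write each resolved token at its dotted body path, overwriting any model-supplied value."""
    filled = copy.deepcopy(body)
    for dotted_path, token in context.items():
        *parents, leaf = dotted_path.split(".")
        node = filled
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = resolve_ctx_token(token, run_context)
    return filled


run_context = {"workflow": {"variant": {"id": "variant-123"}}}
model_body = {"workflow_revision": {"workflow_variant_id": "someone-else", "data": {}}}
request_body = apply_context(
    model_body,
    {"workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id"},
    run_context,
)
```

In this sketch the context value always wins, so even if a model smuggles in a `workflow_variant_id`, the request targets the running variant.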
Scope / risk
In scope (SDK + docs + tests only; no runner change, confirmed: the runner dispatches `call` opaquely, and the reference/platform branch already exists in `relay.ts`):
- `platform/op_catalog.py`: a typed `PlatformOp` model (validated at import) + the op table; `AgentaPlatformToolResolver` bridges it to a `CallbackToolSpec` with a direct `call`, reusing `PlatformConnection`. `PlatformToolConfig` joins the `ToolConfig` union, so it surfaces in the agent-config schema automatically.
- `find_capabilities` (read), `query_workflows` (read), `commit_revision` (self-update). The set is intentionally small; more ops are a data add to the catalog.
- `find_capabilities` is now agent-usable: it emits a direct `call` to `POST /api/tools/discover`.
- `commit_revision` is gated: it binds the running variant id from run context and defaults to `needs_approval: true`, so an agent can only update itself and only with approval.

Risk is contained:
- The `/tools/call` `tools.agenta.*` dispatch is left in place (removed in a later phase), so nothing that relied on it breaks.
- `find_capabilities`' description + input schema are duplicated SDK-side (the SDK must not import the API). Documented; a drift risk to watch.
- Vendors `expand_type_refs` as `platform/_schema.py` (identical content) since #4890 ([fix] Expand x-ag-type-ref in reference-tool input schemas so harnesses can invoke them) is not in this stack. If #4890 lands first, the identical add/add merges cleanly.

Not touched: the runner/TS, the gateway path, reference-tool execution, `/tools/call`.

How to QA
Prerequisites: `cd sdks/python && uv sync --locked`.
- `agents/platform/test_op_catalog.py` (catalog validation, `find_capabilities` direct-call emission, the `commit_revision` bind/strip, defaults + override, error paths) plus platform arms added to `test_models.py`, `test_parsing.py`, `test_resolver.py`, `test_skill_config_catalog.py`.
- `call` wire):
- `ruff format --check` and `ruff check` over `agenta/sdk/agents`: clean.

End-to-end (orchestrator, after the stack deploys): declare `{type:"platform", op:"find_capabilities"}` on an agent config and confirm the model calls discover and gets tools back; declare `commit_revision` and confirm the approval gate fires and the commit targets the agent's own variant.

Edge cases covered by tests: an unknown `op` raises `UnknownPlatformOpError`; a missing API base raises a typed error; a duplicate platform tool is rejected; an `input_schema_ref` must be a known `CATALOG_TYPES` key.

Interface reference
Config arm (`ToolConfig` union, `tools/models.py`): `{ "type": "platform", "op": "find_capabilities", "needs_approval": null, "permission": null }` (`needs_approval` is optional: `null` = use the catalog's per-op default.)

Catalog entry (`platform/op_catalog.py`, `PlatformOp`):

Resolved spec, a `CallbackToolSpec` with a direct `call` (no `call_ref`):

The 3 ops:

| Op | Endpoint |
| --- | --- |
| `find_capabilities` | `POST /api/tools/discover` |
| `query_workflows` | `POST /api/workflows/query` |
| `commit_revision` | `POST /api/workflows/revisions/commit` |

Seams: `platform/op_catalog.py` (new), `platform/platform_tools.py` (new), `platform/_schema.py` (new), `tools/models.py` (`PlatformToolConfig`), `tools/resolver.py` + `platform/resolve.py` (wiring), docs in `documentation/tools.md` + the interface inventory.

🤖 The AI agent says: This is mid-feature (pre-e2e), kept as a DRAFT. CodeRabbit not yet triggered.
https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc