[feat] Direct-call tools — Phase 3b: platform-op catalog + find_capabilities + self-update by mmabrouk · Pull Request #4893 · Agenta-AI/agenta

mmabrouk · 2026-06-28T00:09:50Z

Context

Phases 1–3a gave a resolved tool a call descriptor: the runner can call an Agenta endpoint
directly instead of routing through /tools/call. Nothing emitted a call yet, and the reserved
find_capabilities discovery tool (PR #4884) was server-wired but not agent-usable — no agent
config could declare it, which blocked the discover-and-wire-tools skill end to end.

This phase adds the producer side: the type:"platform" tool config and a typed platform-op
catalog that maps an op to an existing Agenta endpoint and emits the call. find_capabilities
becomes the first platform tool, so an agent can finally discover tools end to end. It also adds a
self-update op so an agent can commit a revision to its own variant.

Builds on #4892 (run-context delivery + bind); uses its $ctx.workflow.variant.id / trace.*
run-context tokens.

Scope / risk

In scope (SDK + docs + tests only — no runner change, confirmed: the runner dispatches call
opaquely, and the reference/platform branch already exists in relay.ts):

- Catalog infra: platform/op_catalog.py holds a typed PlatformOp model (validated at import) plus
  the op table; AgentaPlatformToolResolver bridges it to a CallbackToolSpec with a direct
  call, reusing PlatformConnection. PlatformToolConfig joins the ToolConfig union, so it
  surfaces in the agent-config schema automatically.
- 3 first ops: find_capabilities (read), query_workflows (read), commit_revision
  (self-update). The set is intentionally small; more ops are a data add to the catalog.
- find_capabilities is now agent-usable: it emits a direct call to POST /api/tools/discover.
- commit_revision is gated: it binds the running variant id from run context and defaults
  to needs_approval: true, so an agent can only update itself and only with approval.

Risk is contained:

- The legacy server-side /tools/call tools.agenta.* dispatch is left in place (removed in a
  later phase), so nothing that relied on it breaks.
- find_capabilities' description + input schema are duplicated SDK-side (the SDK must not import
  the API). Documented; a drift risk to watch.
- Vendors #4890's expand_type_refs as platform/_schema.py (identical content) since #4890
  ([fix] Expand x-ag-type-ref in reference-tool input schemas so harnesses can invoke them) is
  not in this stack. If #4890 lands first, the identical add/add merges cleanly.

Not touched: the runner/TS, the gateway path, reference-tool execution, /tools/call.

How to QA

Prerequisites: cd sdks/python && uv sync --locked.

Run the SDK agent tests:
```
uv run --no-sync python -m pytest oss/tests/pytest/unit/agents oss/tests/pytest/unit/test_skill_config_catalog.py -p no:cacheprovider -q
```
Expected: all pass (440 here). New coverage: agents/platform/test_op_catalog.py (catalog
validation, find_capabilities direct-call emission, the commit_revision bind/strip, defaults +
override, error paths) plus platform arms added to test_models.py, test_parsing.py,
test_resolver.py, test_skill_config_catalog.py.

Wire contract unchanged (platform tools reuse the existing call wire):

uv run --no-sync python -m pytest oss/tests/pytest/unit/agents/test_wire_contract.py -p no:cacheprovider -q

Lint: ruff format --check and ruff check over agenta/sdk/agents — clean.

End-to-end (orchestrator, after the stack deploys): declare {type:"platform", op:"find_capabilities"} on an agent config and confirm the model calls discover and gets tools
back; declare commit_revision and confirm the approval gate fires and the commit targets the
agent's own variant.

Edge cases covered by tests: an unknown op raises UnknownPlatformOpError; a missing API base
raises a typed error; a duplicate platform tool is rejected; an input_schema_ref must be a known
CATALOG_TYPES key.
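As a rough illustration of the first and third edge cases, the sketch below shows a catalog lookup that raises on an unknown op and a resolution loop that rejects duplicates. It is not the SDK code: the catalog is reduced to a plain dict built from the ops table later in this description, the base class of UnknownPlatformOpError and the ValueError used for duplicates are assumptions, and resolve_platform_tools is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


class UnknownPlatformOpError(LookupError):
    """Stand-in for the SDK error of the same name; the base class is an assumption."""


# op -> "METHOD path", copied from the ops table in this PR description.
CATALOG: Dict[str, str] = {
    "find_capabilities": "POST /api/tools/discover",
    "query_workflows": "POST /api/workflows/query",
    "commit_revision": "POST /api/workflows/revisions/commit",
}


def get_platform_op(op: str) -> str:
    """Look up one op; unknown names raise a typed error instead of a bare KeyError."""
    try:
        return CATALOG[op]
    except KeyError:
        raise UnknownPlatformOpError(f"unknown platform op: {op!r}") from None


def resolve_platform_tools(ops: Iterable[str]) -> List[str]:
    """Hypothetical helper: resolve declared ops in order, rejecting duplicates."""
    seen = set()
    resolved = []
    for op in ops:
        if op in seen:
            # The real error type for duplicates is not stated in the PR; ValueError is assumed.
            raise ValueError(f"duplicate platform tool: {op!r}")
        seen.add(op)
        resolved.append(get_platform_op(op))
    return resolved
```

Failing at resolution time keeps a misdeclared agent config from reaching the runner at all.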

Interface reference

Config arm (ToolConfig union, tools/models.py):

{ "type": "platform", "op": "find_capabilities", "needs_approval": null, "permission": null }
// needs_approval is optional: null = use the catalog's per-op default.

Catalog entry (platform/op_catalog.py, PlatformOp):

op -> { description, method, path, input_schema | input_schema_ref,
        bind: { "<endpoint.body.path>": "$ctx.<key>" },
        default_permission, default_needs_approval }
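The entry shape above, together with the validation rules this PR describes (exactly one schema source, a relative /api/... path, well-formed $ctx bind tokens), can be sketched with a stdlib dataclass. The PR's PlatformOp is a Pydantic BaseModel, so this is only an approximation: PlatformOpSketch, the exact checks, the error messages, and the placeholder schema are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

CTX_TOKEN_PREFIX = "$ctx."
ALLOWED_METHODS = ("GET", "POST")


@dataclass(frozen=True)
class PlatformOpSketch:
    """Stand-in for PlatformOp; field names follow the catalog entry above."""

    op: str
    description: str
    method: str
    path: str
    default_permission: str
    default_needs_approval: bool
    input_schema: Optional[Dict[str, Any]] = None
    input_schema_ref: Optional[str] = None
    bind: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.description:
            raise ValueError(f"{self.op}: description must not be empty")
        if self.method not in ALLOWED_METHODS:
            raise ValueError(f"{self.op}: unsupported method {self.method!r}")
        if not self.path.startswith("/api/"):
            raise ValueError(f"{self.op}: path must be a relative /api/... path")
        # Exactly one schema source: an inline schema XOR a reference key.
        if (self.input_schema is None) == (self.input_schema_ref is None):
            raise ValueError(f"{self.op}: set exactly one of input_schema / input_schema_ref")
        for body_path, token in self.bind.items():
            key = token[len(CTX_TOKEN_PREFIX):] if token.startswith(CTX_TOKEN_PREFIX) else ""
            if not body_path or not key:
                raise ValueError(f"{self.op}: malformed bind {body_path!r} -> {token!r}")


# Building the entry at module level means a bad entry fails on import.
COMMIT_REVISION = PlatformOpSketch(
    op="commit_revision",
    description="Commit a revision to the running agent's own variant.",  # hypothetical wording
    method="POST",
    path="/api/workflows/revisions/commit",
    default_permission="ask",
    default_needs_approval=True,
    input_schema={"type": "object"},  # placeholder, not the real input schema
    bind={"workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id"},
)
```

Because the table is constructed at import, a malformed op surfaces when the module loads rather than when an agent first calls the tool.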

Resolved spec — a CallbackToolSpec with a direct call (no call_ref):

// find_capabilities
{ "kind": "callback", "name": "find_capabilities",
  "call": { "method": "POST", "path": "/api/tools/discover" } }

// commit_revision (self-update): the bound field is stripped from the model schema and
// rides call.context; the runner fills it from runContext at dispatch.
{ "kind": "callback", "name": "commit_revision", "needsApproval": true, "permission": "ask",
  "call": { "method": "POST", "path": "/api/workflows/revisions/commit",
            "context": { "workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id" } } }

The 3 ops:

op	endpoint	gate
`find_capabilities`	`POST /api/tools/discover`	read (auto-allow)
`query_workflows`	`POST /api/workflows/query`	read (auto-allow)
`commit_revision`	`POST /api/workflows/revisions/commit`	mutating (approval) + self-bind
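The self-bind on commit_revision relies on two transformations described above: the bound field is removed from the schema the model sees, and the same binding is emitted as call.context. A minimal sketch of both follows, assuming bind keys are dotted paths through nested JSON Schema "properties". The function names mirror the roles of resolved_input_schema() and to_call() but are standalone stand-ins, and the real input schema is not reproduced here.

```python
import copy
from typing import Any, Dict


def strip_bound_fields(schema: Dict[str, Any], bind: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `schema` without the properties named by the dotted bind paths."""
    result = copy.deepcopy(schema)
    for dotted_path in bind:
        *parents, leaf = dotted_path.split(".")
        node = result
        for name in parents:
            node = node.get("properties", {}).get(name)
            if not isinstance(node, dict):
                break  # path not present in this schema; nothing to strip
        else:
            node.get("properties", {}).pop(leaf, None)
            if leaf in node.get("required", []):
                node["required"] = [r for r in node["required"] if r != leaf]
    return result


def to_call(method: str, path: str, bind: Dict[str, str]) -> Dict[str, Any]:
    """Build the direct-call descriptor; bound fields ride along as `context`."""
    call: Dict[str, Any] = {"method": method, "path": path}
    if bind:
        call["context"] = dict(bind)
    return call
```

With the commit_revision binding, to_call("POST", "/api/workflows/revisions/commit", bind) yields the same shape as the resolved spec shown earlier, while an op without bindings (such as find_capabilities) gets no context key.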

Seams: platform/op_catalog.py (new), platform/platform_tools.py (new), platform/_schema.py
(new), tools/models.py (PlatformToolConfig), tools/resolver.py + platform/resolve.py
(wiring), docs in documentation/tools.md + the interface inventory.

🤖 The AI agent says: This is mid-feature (pre-e2e) — kept as a DRAFT. CodeRabbit not yet triggered.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…capabilities + self-update

Add the type:"platform" tool config and the platform-op catalog so an agent can call existing Agenta endpoints directly via the `call` descriptor (Phases 1-3a), no /tools/call hop.

- New typed catalog platform/op_catalog.py mapping op -> {description, method, path, input_schema/_ref, bind, default permission/approval}, mirroring the evaluators catalog + the reserved tools.agenta.* pattern.
- AgentaPlatformToolResolver resolves each op to a CallbackToolSpec carrying a direct `call` (reusing PlatformConnection); strips bind fields from the model-visible schema and emits them as call.context.
- First ops: find_capabilities (POST /api/tools/discover, now agent-usable end to end), query_workflows (read), commit_revision (self-update; binds the running variant id from $ctx.workflow.variant.id, defaults to approval).
- PlatformToolConfig arm on the ToolConfig union surfaces in the agent-config schema.
- Vendors #4890's expand_type_refs as platform/_schema.py for x-ag-type-ref input schemas.
- Docs synced: documentation/tools.md + the interface inventory.

Runner unchanged: it dispatches `call` opaquely (the reference/platform branch already exists).

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

vercel · 2026-06-28T00:09:56Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 28, 2026 12:10am

coderabbitai · 2026-06-28T00:09:58Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


mmabrouk · 2026-06-28T00:10:41Z

🤖 The AI agent says:

Changes made (Phase 3b)

This adds the producer side of direct-call tools: type:"platform" tools and a typed platform-op
catalog. It builds on #4892 (run-context + bind).

- Catalog model: platform/op_catalog.py holds a typed PlatformOp (validated at import: exactly
  one schema source, a relative /api/... path, well-formed $ctx bind tokens) plus the op
  table and get_platform_op. Descriptions live in the SDK; input schemas resolve via
  expand_type_refs/CATALOG_TYPES. Mirrors the evaluators catalog + the reserved tools.agenta.*
  pattern.
- Resolver: platform/platform_tools.py (AgentaPlatformToolResolver) turns each op into a
  CallbackToolSpec with a direct call (reusing PlatformConnection for the origin + auth). It
  strips the op's bind fields from the model-visible schema and emits them as call.context.
  Wired into tools/resolver.py + platform/resolve.py as a new injected resolver.
- Config arm: PlatformToolConfig joins the ToolConfig union, so it surfaces in the
  agent-config schema automatically (no separate schema edit).
- 3 first ops: find_capabilities (now agent-usable: a direct call to
  POST /api/tools/discover), query_workflows (read), commit_revision (self-update: binds the
  running variant id, defaults to approval).
- Docs synced: documentation/tools.md + the interface inventory (tool-models, agent-config,
  runner-to-tool-callback, the index).
- Runner: no change (confirmed). It dispatches call opaquely; the reference/platform branch
  already exists in relay.ts.

Tests: 440 SDK agent tests pass; ruff format --check and ruff check clean.

What I need from you

A code review of three things, plus a green-light to run the end-to-end:

1. The catalog shape and the first op set: is PlatformOp the right typed contract, and are
   find_capabilities / query_workflows / commit_revision the right first three? (More ops are
   a data add.)
2. The self-update default: commit_revision defaults to needs_approval: true and binds the
   agent's own variant. Is "gated + self-only by default" the policy you want?
3. The expand_type_refs overlap: I vendored #4890's helper as platform/_schema.py (identical
   content) because #4890 is on a different base. Confirm you're OK with the temporary duplicate
   (clean add/add merge), or tell me to stack on #4890 instead.

Kept DRAFT (mid-feature, pre-e2e). CodeRabbit not triggered yet (it skips drafts). The
orchestrator runs the live find_capabilities + commit_revision e2e next.

mmabrouk · 2026-06-28T00:11:00Z

+_CTX_TOKEN_PREFIX = "$ctx."
+
+
+class PlatformOp(BaseModel):


🤖 The AI agent says: The catalog model. PlatformOp is the typed catalog entry (Codex-review item §4): op -> {description, method, path, input_schema|input_schema_ref, bind, default permission/approval}. It validates at import (exactly one schema source, a single relative /api/... path that the runner resolves against the platform origin, and well-formed $ctx. bind tokens), so a bad entry fails on load, not at run time. Review target #1: is this the right contract, and is the typed-model-vs-loose-dict call right?

mmabrouk · 2026-06-28T00:11:01Z

+    op.op: op
+    for op in (
+        PlatformOp(
+            op="find_capabilities",


🤖 The AI agent says: find_capabilities emission (THE deferred item). This catalog entry is what makes find_capabilities agent-usable: it resolves to a direct call to POST /api/tools/discover (migrating off the server-side /tools/call tools.agenta.* dispatch, which is left in place for now). Its description + input schema are duplicated from api/.../discovery.py because the SDK must not import the API — a drift risk to watch.

the drift risk is problematic. we need to solve this at some point. let's add an issue for this in github
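One possible way to catch this drift is a check that fails when the SDK copy and the API copy diverge. The sketch below compares two plain dicts field by field. Since the SDK must not import the API, it assumes the check runs somewhere that can load both copies (for example a repository-level test); drifted_keys and the dict layout are hypothetical, not part of this PR.

```python
from typing import Any, Dict, List

_MISSING = object()


def drifted_keys(sdk_copy: Dict[str, Any], api_copy: Dict[str, Any]) -> List[str]:
    """Return the top-level keys whose values differ between the two copies."""
    keys = sorted(set(sdk_copy) | set(api_copy))
    return [k for k in keys if sdk_copy.get(k, _MISSING) != api_copy.get(k, _MISSING)]


def assert_no_drift(sdk_copy: Dict[str, Any], api_copy: Dict[str, Any]) -> None:
    """Raise with the offending keys so the failure names what to resync."""
    drift = drifted_keys(sdk_copy, api_copy)
    if drift:
        raise AssertionError(f"find_capabilities definition drifted in: {drift}")
```

Python compares dicts without regard to key order, so reordering fields in either copy does not count as drift; only a changed, added, or removed value does.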

mmabrouk · 2026-06-28T00:11:01Z

+            method="POST",
+            path="/api/workflows/revisions/commit",
+            input_schema=_COMMIT_REVISION_INPUT_SCHEMA,
+            bind={"workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id"},


🤖 The AI agent says: The self-update bind. commit_revision binds workflow_revision.workflow_variant_id ← $ctx.workflow.variant.id (#4892's run-context token). resolved_input_schema() strips this field from the model-visible schema; to_call() emits it as call.context, so the runner fills it from runContext at dispatch and the model can never target a different variant. Paired with default_needs_approval=True below → gated + self-only. Review target #2.
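To illustrate why the model cannot retarget the commit, here is a sketch of the dispatch-time fill in Python. The actual runner is TypeScript (relay.ts) and is not part of this PR, so everything below is an assumption: the nested run-context shape, the function names, and the rule that a context value overwrites whatever the model supplied at the same body path.

```python
import copy
from typing import Any, Dict

CTX_TOKEN_PREFIX = "$ctx."


def resolve_ctx_token(token: str, run_context: Dict[str, Any]) -> Any:
    """Resolve a `$ctx.a.b.c` token by walking nested run-context keys (assumed shape)."""
    if not token.startswith(CTX_TOKEN_PREFIX):
        raise ValueError(f"not a run-context token: {token!r}")
    value: Any = run_context
    for key in token[len(CTX_TOKEN_PREFIX):].split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"run context has no value for {token!r}")
        value = value[key]
    return value


def apply_context(
    body: Dict[str, Any], context: Dict[str, str], run_context: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of the model-supplied body with each context binding filled in."""
    filled = copy.deepcopy(body)
    for dotted_path, token in context.items():
        *parents, leaf = dotted_path.split(".")
        node = filled
        for name in parents:
            child = node.get(name)
            if not isinstance(child, dict):
                child = {}
                node[name] = child
            node = child
        # The bound value always wins over anything already at this path.
        node[leaf] = resolve_ctx_token(token, run_context)
    return filled
```

Under these assumptions, even a body that smuggles in a different workflow_variant_id ends up carrying the running variant's id after the fill.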

mmabrouk · 2026-06-28T00:11:01Z

+                    name=op.op,
+                    description=op.description,
+                    input_schema=op.resolved_input_schema(),
+                    call=op.to_call(),


🤖 The AI agent says: The resolver. Each op becomes a CallbackToolSpec carrying a direct call (op.to_call()) with the bound fields stripped from op.resolved_input_schema(). It reuses PlatformConnection for the shared ToolCallback (the origin the runner resolves the relative path against) — no new HTTP plumbing. The config's needs_approval/permission override the catalog default; unset falls back to the per-op default (above).
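The override rule in the last sentence can be sketched as a small pure function. This is an illustration only: effective_gate is a hypothetical name, and the permission value "allow" used in the usage note is an assumed counterpart to the "ask" value shown in the resolved commit_revision spec.

```python
from typing import Optional, Tuple


def effective_gate(
    config_needs_approval: Optional[bool],
    config_permission: Optional[str],
    default_needs_approval: bool,
    default_permission: str,
) -> Tuple[bool, str]:
    """Config values win when set; None (JSON null) falls back to the per-op default."""
    needs_approval = (
        default_needs_approval if config_needs_approval is None else config_needs_approval
    )
    permission = default_permission if config_permission is None else config_permission
    return needs_approval, permission
```

For example, effective_gate(None, None, True, "ask") keeps the commit_revision defaults, while effective_gate(False, "allow", True, "ask") applies an explicit override. The check is `is None` rather than truthiness so that an explicit needs_approval: false is honored instead of being mistaken for "unset".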

mmabrouk · 2026-06-28T11:54:16Z

+    description: str = Field(
+        min_length=1, description="Model-facing description (SDK-owned)."
+    )
+    method: Literal["GET", "POST"]


what about put delete and other methods

mmabrouk · 2026-06-28T11:55:03Z

+
+    model_config = ConfigDict(extra="forbid", frozen=True)
+
+    op: str = Field(


let's not use abbreviations whenever is possible unless widely used (add it as a memory)

mmabrouk · 2026-06-28T11:56:22Z

+    # Self-targeting fields the runner fills server-side from run context: a dotted body path on the
+    # endpoint's request -> a ``$ctx.<key>`` token. These are stripped from the model-visible schema
+    # and emitted as ``call.context`` so the model supplies only the payload and can never retarget.
+    bind: Dict[str, str] = Field(default_factory=dict)


the variable naming here is not good, the names should be clear

mmabrouk · 2026-06-28T12:43:39Z

🤖 The AI agent says:

Triaged your four review comments. One is a contained fix (PR #4906); three are
interface/design changes filed as issues.

Your comment	Outcome
`bind` / "the variable naming here is not good, the names should be clear" (on the catalog fields)	Fixed in #4906 — renamed `PlatformOp.bind` -> `context_bindings` (SDK-internal only; the wire field stays `context`).
`op` / "let's not use abbreviations whenever is possible unless widely used (add it as a memory)" (here)	Added to agent memory (prefer full words unless widely used). The `op` rename itself is public-edge (config arm + ~60 refs), so it is tracked with `args_into` in #4909 for one deliberate `design-interfaces` pass — and #4909 asks you to confirm whether `op` even clears the "widely used" bar before we touch it.
`method` / "what about put delete and other methods" (here)	Filed as #4907. GET/POST-only is a deliberate, documented constraint at the wire level too ("constrained dispatcher, never an arbitrary HTTP client"), so widening is a design + security decision, not a quick widen. The issue lays out the decision points.
"the drift risk is problematic … let's add an issue for this in github" (here)	Filed as #4908 — `find_capabilities` description + schema duplicated SDK-side vs `api/.../discovery.py`, with proposed directions (shared catalog type / build-time gen / fails-on-drift test).

Note: CodeRabbit never ran on #4893 (it skips non-default base branches), so there were no automated findings to address. I triggered it on the follow-up #4906.

PR #4906 is do-not-merge / needs-review.

[refactor] PR #4893 follow-up: rename platform-op `bind` to `context_bindings`

mmabrouk added the needs-review label (Agent updated; awaiting Mahmoud's review) Jun 28, 2026

vercel Bot deployed to Preview June 28, 2026 00:10 View deployment


mmabrouk mentioned this pull request Jun 28, 2026

docs(agent): agent-creation skills + custom-tools design note #4863

Draft

mmabrouk marked this pull request as ready for review June 28, 2026 02:53

dosubot Bot added the size:XL (this PR changes 500-999 lines, ignoring generated files), documentation, enhancement, python, and tests labels Jun 28, 2026

mmabrouk changed the base branch from feat/direct-call-tools-runcontext to big-agents June 28, 2026 02:56

mmabrouk merged commit 7521444 into big-agents Jun 28, 2026
16 checks passed

mmabrouk mentioned this pull request Jun 28, 2026

[integration] big-agents #4791

Open

12 tasks


mmabrouk added a commit that referenced this pull request Jun 28, 2026

Merge pull request #4906 from Agenta-AI/fix/4893-followups

1e55134

[refactor] PR #4893 follow-up: rename platform-op `bind` to `context_bindings`

