[feat] Direct-call tools — Phase 3b: platform-op catalog + find_capabilities + self-update #4893

Merged: mmabrouk merged 1 commit into big-agents from feat/direct-call-tools-platform-ops on Jun 28, 2026

Conversation

@mmabrouk

Context

Phases 1–3a gave a resolved tool a call descriptor: the runner can call an Agenta endpoint
directly instead of routing through /tools/call. Nothing emitted a call yet, and the reserved
find_capabilities discovery tool (PR #4884) was server-wired but not agent-usable — no agent
config could declare it, which blocked the discover-and-wire-tools skill end to end.

This phase adds the producer side: the type:"platform" tool config and a typed platform-op
catalog that maps an op to an existing Agenta endpoint and emits the call. find_capabilities
becomes the first platform tool, so an agent can finally discover tools end to end. It also adds a
self-update op so an agent can commit a revision to its own variant.

Builds on #4892 (run-context delivery + bind); uses its $ctx.workflow.variant.id / trace.*
run-context tokens.

Scope / risk

In scope (SDK + docs + tests only — no runner change, confirmed: the runner dispatches call
opaquely, and the reference/platform branch already exists in relay.ts):

  • Catalog infra: platform/op_catalog.py holds a typed PlatformOp model (validated at import)
    plus the op table; AgentaPlatformToolResolver bridges it to a CallbackToolSpec with a direct
    call, reusing PlatformConnection. PlatformToolConfig joins the ToolConfig union, so it
    surfaces in the agent-config schema automatically.
  • 3 first ops: find_capabilities (read), query_workflows (read), commit_revision
    (self-update). The set is intentionally small; more ops are a data add to the catalog.
  • find_capabilities now agent-usable — emits a direct call to POST /api/tools/discover.
  • commit_revision is gated — it binds the running variant id from run context and defaults
    to needs_approval: true, so an agent can only update itself and only with approval.

Risk is contained:

Not touched: the runner/TS, the gateway path, reference-tool execution, /tools/call.

How to QA

Prerequisites: cd sdks/python && uv sync --locked.

  1. Run the SDK agent tests:
    uv run --no-sync python -m pytest oss/tests/pytest/unit/agents oss/tests/pytest/unit/test_skill_config_catalog.py -p no:cacheprovider -q
    
    Expected: all pass (440 here). New coverage: agents/platform/test_op_catalog.py (catalog
    validation, find_capabilities direct-call emission, the commit_revision bind/strip, defaults +
    override, error paths) plus platform arms added to test_models.py, test_parsing.py,
    test_resolver.py, test_skill_config_catalog.py.
  2. Wire contract unchanged (platform tools reuse the existing call wire):
    uv run --no-sync python -m pytest oss/tests/pytest/unit/agents/test_wire_contract.py -p no:cacheprovider -q
    
  3. Lint: ruff format --check and ruff check over agenta/sdk/agents — clean.

End-to-end (orchestrator, after the stack deploys): declare {type:"platform", op:"find_capabilities"} on an agent config and confirm the model calls discover and gets tools
back; declare commit_revision and confirm the approval gate fires and the commit targets the
agent's own variant.

Edge cases covered by tests: an unknown op raises UnknownPlatformOpError; a missing API base
raises a typed error; a duplicate platform tool is rejected; an input_schema_ref must be a known
CATALOG_TYPES key.
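The lookup and duplicate-rejection edge cases can be pictured with a minimal sketch. The names `get_platform_op` and `UnknownPlatformOpError` come from this PR, but the signature, the error's base class, the trimmed `OpEntry` type, and the `reject_duplicates` helper are assumptions made for illustration, not the SDK's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


class UnknownPlatformOpError(KeyError):
    """Raised when a config names an op that is not in the catalog (base class assumed)."""


@dataclass(frozen=True)
class OpEntry:
    # Hypothetical, trimmed stand-in for the PR's PlatformOp model.
    op: str
    method: str
    path: str


# The three ops and endpoints listed in this PR, keyed by op name.
_CATALOG: Dict[str, OpEntry] = {
    entry.op: entry
    for entry in (
        OpEntry("find_capabilities", "POST", "/api/tools/discover"),
        OpEntry("query_workflows", "POST", "/api/workflows/query"),
        OpEntry("commit_revision", "POST", "/api/workflows/revisions/commit"),
    )
}


def get_platform_op(op: str) -> OpEntry:
    """Look up an op, failing with a typed error instead of a bare KeyError."""
    try:
        return _CATALOG[op]
    except KeyError:
        raise UnknownPlatformOpError(op) from None


def reject_duplicates(ops: List[str]) -> List[str]:
    """Reject a tool list that declares the same platform op twice (hypothetical helper)."""
    seen: Set[str] = set()
    for op in ops:
        if op in seen:
            raise ValueError(f"duplicate platform tool: {op}")
        seen.add(op)
    return ops
```

Because the catalog is plain data keyed by op name, adding an op in this sketch is one more `OpEntry` row, which matches the "more ops are a data add" framing.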

Interface reference

Config arm (ToolConfig union, tools/models.py):

{ "type": "platform", "op": "find_capabilities", "needs_approval": null, "permission": null }
// needs_approval is optional: null = use the catalog's per-op default.
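A small sketch of how the `null` default might resolve. The defaults table is read off this PR's ops table (read ops auto-allow, commit_revision needs approval); the function name and the plain-dict representation are hypothetical.

```python
from typing import Optional

# Per-op approval defaults as implied by the ops table in this PR description.
# Treating "read (auto-allow)" as needs_approval=False is this sketch's reading.
DEFAULT_NEEDS_APPROVAL = {
    "find_capabilities": False,
    "query_workflows": False,
    "commit_revision": True,
}


def effective_needs_approval(op: str, configured: Optional[bool]) -> bool:
    """Return the config value when it is set, else the catalog's per-op default."""
    if configured is not None:
        return configured
    return DEFAULT_NEEDS_APPROVAL[op]
```

Checking `is not None` rather than truthiness matters here: an explicit `false` in the config must override a catalog default of `true`.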

Catalog entry (platform/op_catalog.py, PlatformOp):

op -> { description, method, path, input_schema | input_schema_ref,
        bind: { "<endpoint.body.path>": "$ctx.<key>" },
        default_permission, default_needs_approval }

Resolved spec — a CallbackToolSpec with a direct call (no call_ref):

// find_capabilities
{ "kind": "callback", "name": "find_capabilities",
  "call": { "method": "POST", "path": "/api/tools/discover" } }

// commit_revision (self-update): the bound field is stripped from the model schema and
// rides call.context; the runner fills it from runContext at dispatch.
{ "kind": "callback", "name": "commit_revision", "needsApproval": true, "permission": "ask",
  "call": { "method": "POST", "path": "/api/workflows/revisions/commit",
            "context": { "workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id" } } }

The 3 ops:

| op | endpoint | gate |
| --- | --- | --- |
| find_capabilities | POST /api/tools/discover | read (auto-allow) |
| query_workflows | POST /api/workflows/query | read (auto-allow) |
| commit_revision | POST /api/workflows/revisions/commit | mutating (approval) + self-bind |
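The self-bind on commit_revision relies on the runner filling `call.context` from run context at dispatch. The runner itself is TypeScript and is not changed by this PR, so the following Python is only an illustration of the idea: the nested-dict shape of the run context, the function names, and the `message` field are assumptions.

```python
import copy
from typing import Any, Dict

CTX_PREFIX = "$ctx."


def lookup(run_context: Dict[str, Any], dotted_key: str) -> Any:
    """Walk a dotted key such as "workflow.variant.id" through nested dicts."""
    value: Any = run_context
    for part in dotted_key.split("."):
        value = value[part]  # KeyError if the run context lacks the key
    return value


def set_path(body: Dict[str, Any], dotted_path: str, value: Any) -> None:
    """Assign value at a dotted body path, creating intermediate dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    target = body
    for part in parents:
        target = target.setdefault(part, {})
    target[leaf] = value


def fill_context(
    model_args: Dict[str, Any],
    call_context: Dict[str, str],
    run_context: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge server-side bindings into the model's arguments.

    Bindings are applied last, so a bound path always wins over model input.
    """
    body = copy.deepcopy(model_args)
    for body_path, token in call_context.items():
        if not token.startswith(CTX_PREFIX):
            raise ValueError(f"not a run-context token: {token!r}")
        set_path(body, body_path, lookup(run_context, token[len(CTX_PREFIX):]))
    return body
```

Applying the bindings after copying the model's arguments is what gives the "can never retarget" property in this sketch: even if the model smuggles in a value at the bound path, it is overwritten.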

Seams: platform/op_catalog.py (new), platform/platform_tools.py (new), platform/_schema.py
(new), tools/models.py (PlatformToolConfig), tools/resolver.py + platform/resolve.py
(wiring), docs in documentation/tools.md + the interface inventory.

🤖 The AI agent says: This is mid-feature (pre-e2e) — kept as a DRAFT. CodeRabbit not yet triggered.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…capabilities + self-update

Add the type:"platform" tool config and the platform-op catalog so an agent can call
existing Agenta endpoints directly via the `call` descriptor (Phases 1-3a), no /tools/call hop.

- New typed catalog platform/op_catalog.py mapping op -> {description, method, path,
  input_schema/_ref, bind, default permission/approval}, mirroring the evaluators catalog +
  the reserved tools.agenta.* pattern.
- AgentaPlatformToolResolver resolves each op to a CallbackToolSpec carrying a direct `call`
  (reusing PlatformConnection); strips bind fields from the model-visible schema and emits
  them as call.context.
- First ops: find_capabilities (POST /api/tools/discover — now agent-usable end to end),
  query_workflows (read), commit_revision (self-update; binds the running variant id from
  $ctx.workflow.variant.id, defaults to approval).
- PlatformToolConfig arm on the ToolConfig union surfaces in the agent-config schema.
- Vendors #4890's expand_type_refs as platform/_schema.py for x-ag-type-ref input schemas.
- Docs synced: documentation/tools.md + the interface inventory.

Runner unchanged: it dispatches `call` opaquely (the reference/platform branch already exists).

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@mmabrouk mmabrouk added the needs-review label (Agent updated; awaiting Mahmoud's review) on Jun 28, 2026
@mmabrouk

🤖 The AI agent says:

Changes made (Phase 3b)

This adds the producer side of direct-call tools: type:"platform" tools and a typed platform-op
catalog. It builds on #4892 (run-context + bind).

  • Catalog model: platform/op_catalog.py holds a typed PlatformOp (validated at import: exactly
    one schema source, a relative /api/... path, well-formed $ctx bind tokens) plus the op
    table and get_platform_op. Descriptions live in the SDK; input schemas resolve via
    expand_type_refs/CATALOG_TYPES. Mirrors the evaluators catalog + the reserved tools.agenta.*
    pattern.
  • Resolver: platform/platform_tools.py (AgentaPlatformToolResolver). Each op becomes a
    CallbackToolSpec with a direct call (reusing PlatformConnection for the origin + auth). It
    strips the op's bind fields from the model-visible schema and emits them as call.context.
    Wired into tools/resolver.py + platform/resolve.py as a new injected resolver.
  • Config arm: PlatformToolConfig joins the ToolConfig union, so it surfaces in the
    agent-config schema automatically (no separate schema edit).
  • 3 first ops: find_capabilities (now agent-usable: a direct call to
    POST /api/tools/discover), query_workflows (read), commit_revision (self-update: binds the
    running variant id, defaults to approval).
  • Docs synced: documentation/tools.md + the interface inventory (tool-models, agent-config,
    runner-to-tool-callback, the index).
  • Runner: no change (confirmed) — it dispatches call opaquely; the reference/platform branch
    already exists in relay.ts.

Tests: 440 SDK agent tests pass; ruff format --check and ruff check clean.

What I need from you

A code review of three things, plus a green-light to run the end-to-end:

  1. The catalog shape and the first op set — is PlatformOp the right typed contract, and are
    find_capabilities / query_workflows / commit_revision the right first three? (More ops are
    a data add.)
  2. The self-update default — commit_revision defaults to needs_approval: true and binds the
    agent's own variant. Is "gated + self-only by default" the policy you want?
  3. The expand_type_refs overlap: I vendored the helper from #4890 ("[fix] Expand x-ag-type-ref
    in reference-tool input schemas so harnesses can invoke them") as platform/_schema.py
    (identical content) because #4890 is on a different base. Confirm you're OK with the temporary
    duplicate (clean add/add merge), or tell me to stack on #4890 instead.

Kept DRAFT (mid-feature, pre-e2e). CodeRabbit not triggered yet (it skips drafts). The
orchestrator runs the live find_capabilities + commit_revision e2e next.

_CTX_TOKEN_PREFIX = "$ctx."


class PlatformOp(BaseModel):

🤖 The AI agent says: The catalog model. PlatformOp is the typed catalog entry (Codex-review item §4): op -> {description, method, path, input_schema|input_schema_ref, bind, default permission/approval}. It validates at import (exactly one schema source, a single origin-relative /api/... path, and well-formed $ctx. bind tokens), so a bad entry fails on load, not at run time. Review target #1: is this the right contract, and is the typed-model-vs-loose-dict call right?
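To make the import-time checks concrete, here is a minimal sketch that uses a standard-library dataclass as a stand-in for the PR's pydantic model. The class name `PlatformOpSketch` and the exact error messages are hypothetical; only the three rules (one schema source, an /api/... path, `$ctx.` bind tokens) and the GET/POST restriction are taken from this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

_CTX_TOKEN_PREFIX = "$ctx."


@dataclass(frozen=True)
class PlatformOpSketch:
    """Stdlib stand-in for the typed catalog entry; not the SDK's actual model."""

    op: str
    description: str
    method: str
    path: str
    input_schema: Optional[Dict[str, Any]] = None
    input_schema_ref: Optional[str] = None
    bind: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Exactly one schema source: inline schema XOR catalog type reference.
        if (self.input_schema is None) == (self.input_schema_ref is None):
            raise ValueError("set exactly one of input_schema / input_schema_ref")
        if self.method not in ("GET", "POST"):
            raise ValueError(f"unsupported method: {self.method}")
        # The path is resolved against the platform origin, so it must not carry a host.
        if not self.path.startswith("/api/"):
            raise ValueError(f"path must start with /api/: {self.path}")
        for body_path, token in self.bind.items():
            if not token.startswith(_CTX_TOKEN_PREFIX) or token == _CTX_TOKEN_PREFIX:
                raise ValueError(f"bad bind token for {body_path}: {token}")
```

Since catalog entries are constructed when the module is imported, a check like this turns a malformed entry into an import failure rather than a surprise at dispatch.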

    op.op: op
    for op in (
        PlatformOp(
            op="find_capabilities",

🤖 The AI agent says: find_capabilities emission (THE deferred item). This catalog entry is what makes find_capabilities agent-usable: it resolves to a direct call to POST /api/tools/discover (migrating off the server-side /tools/call tools.agenta.* dispatch, which is left in place for now). Its description + input schema are duplicated from api/.../discovery.py because the SDK must not import the API — a drift risk to watch.

the drift risk is problematic. we need to solve this at some point. let's add an issue for this in github

method="POST",
path="/api/workflows/revisions/commit",
input_schema=_COMMIT_REVISION_INPUT_SCHEMA,
bind={"workflow_revision.workflow_variant_id": "$ctx.workflow.variant.id"},

🤖 The AI agent says: The self-update bind. commit_revision binds workflow_revision.workflow_variant_id → $ctx.workflow.variant.id (#4892's run-context token). resolved_input_schema() strips this field from the model-visible schema; to_call() emits it as call.context, so the runner fills it from runContext at dispatch and the model can never target a different variant. Paired with default_needs_approval=True below → gated + self-only. Review target #2.
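The schema-stripping half of this can be sketched as follows. This is not the body of `resolved_input_schema()`, which is not shown in the thread; the helper name, the assumption that bound paths map onto nested JSON Schema `properties`, and the `message` field in the example are all illustrative.

```python
import copy
from typing import Any, Dict, Iterable


def strip_bound_fields(schema: Dict[str, Any], bound_paths: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of a JSON Schema with each dotted body path removed.

    The field is dropped from "properties" and from "required" on its parent
    object, so the model neither sees it nor is asked to supply it.
    """
    stripped = copy.deepcopy(schema)
    for dotted in bound_paths:
        *parents, leaf = dotted.split(".")
        node = stripped
        for part in parents:
            node = node.get("properties", {}).get(part)
            if node is None:
                break  # path not present in this schema; nothing to strip
        else:
            node.get("properties", {}).pop(leaf, None)
            if leaf in node.get("required", []):
                node["required"] = [name for name in node["required"] if name != leaf]
    return stripped


commit_schema = {
    "type": "object",
    "properties": {
        "workflow_revision": {
            "type": "object",
            "properties": {
                "workflow_variant_id": {"type": "string"},
                "message": {"type": "string"},  # hypothetical payload field
            },
            "required": ["workflow_variant_id"],
        }
    },
    "required": ["workflow_revision"],
}
model_visible = strip_bound_fields(commit_schema, ["workflow_revision.workflow_variant_id"])
```

Removing the field from `required` as well as `properties` matters: otherwise the model-visible schema would demand a value the model is no longer allowed to describe.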

name=op.op,
description=op.description,
input_schema=op.resolved_input_schema(),
call=op.to_call(),

🤖 The AI agent says: The resolver. Each op becomes a CallbackToolSpec carrying a direct call (op.to_call()) with the bound fields stripped from op.resolved_input_schema(). It reuses PlatformConnection for the shared ToolCallback (the origin the runner resolves the relative path against) — no new HTTP plumbing. The config's needs_approval/permission override the catalog default; unset falls back to the per-op default (above).
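As a rough illustration of resolving a catalog path against the shared origin: the real resolution happens in the runner, which this PR does not change, so the function name and the example host below are hypothetical. The sketch uses only `urllib.parse.urljoin` from the standard library.

```python
from urllib.parse import urljoin


def resolve_call_url(origin: str, path: str) -> str:
    """Join an /api/... catalog path onto the platform origin.

    Refusing anything that does not start with /api/ keeps a catalog entry
    from smuggling in a full URL that would point at a different host.
    """
    if not path.startswith("/api/"):
        raise ValueError(f"expected an /api/... path, got {path!r}")
    return urljoin(origin, path)


url = resolve_call_url("https://agenta.example.test", "/api/tools/discover")
```

One property of `urljoin` worth knowing: a path that begins with `/` replaces any path already on the origin, so an origin with its own prefix would need the prefix applied separately.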

@mmabrouk mmabrouk marked this pull request as ready for review June 28, 2026 02:53
@dosubot dosubot Bot added the labels size:XL (this PR changes 500-999 lines, ignoring generated files), documentation, enhancement, python, and tests on Jun 28, 2026
@mmabrouk mmabrouk changed the base branch from feat/direct-call-tools-runcontext to big-agents June 28, 2026 02:56
@mmabrouk mmabrouk merged commit 7521444 into big-agents Jun 28, 2026
16 checks passed
@mmabrouk mmabrouk mentioned this pull request Jun 28, 2026
12 tasks
description: str = Field(
    min_length=1, description="Model-facing description (SDK-owned)."
)
method: Literal["GET", "POST"]

what about put delete and other methods


model_config = ConfigDict(extra="forbid", frozen=True)

op: str = Field(

let's not use abbreviations whenever is possible unless widely used (add it as a memory)

# Self-targeting fields the runner fills server-side from run context: a dotted body path on the
# endpoint's request -> a ``$ctx.<key>`` token. These are stripped from the model-visible schema
# and emitted as ``call.context`` so the model supplies only the payload and can never retarget.
bind: Dict[str, str] = Field(default_factory=dict)

the variable naming here is not good, the names should be clear

@mmabrouk

🤖 The AI agent says:

Triaged your four review comments. One is a contained fix (PR #4906); three are
interface/design changes filed as issues.

| Your comment | Outcome |
| --- | --- |
| bind: "the variable naming here is not good, the names should be clear" (on the catalog fields) | Fixed in #4906: renamed PlatformOp.bind -> context_bindings (SDK-internal only; the wire field stays context). |
| op: "let's not use abbreviations whenever is possible unless widely used (add it as a memory)" | Added to agent memory (prefer full words unless widely used). The op rename itself is public-edge (config arm + ~60 refs), so it is tracked with args_into in #4909 for one deliberate design-interfaces pass, and #4909 asks you to confirm whether op even clears the "widely used" bar before we touch it. |
| method: "what about put delete and other methods" | Filed as #4907. GET/POST-only is a deliberate, documented constraint at the wire level too ("constrained dispatcher, never an arbitrary HTTP client"), so widening is a design + security decision, not a quick widen. The issue lays out the decision points. |
| "the drift risk is problematic … let's add an issue for this in github" | Filed as #4908: find_capabilities description + schema duplicated SDK-side vs api/.../discovery.py, with proposed directions (shared catalog type / build-time gen / fails-on-drift test). |

Note: CodeRabbit never ran on #4893 (it skips non-default base branches), so there were no automated findings to address. I triggered it on the follow-up #4906.

PR #4906 is do-not-merge / needs-review.
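Of the directions proposed for #4908, the "fails-on-drift test" is the easiest to picture. The sketch below is one hypothetical shape for it, not something this PR or the issue specifies: it fingerprints the two copies of a tool's description and input schema and fails when they differ. How a test would obtain the API-side copy without the SDK importing the API is left open.

```python
import hashlib
import json
from typing import Any, Dict


def fingerprint(description: str, input_schema: Dict[str, Any]) -> str:
    """Hash a description + schema pair in a key-order-independent way."""
    canonical = json.dumps(
        {"description": description, "input_schema": input_schema},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_no_drift(sdk_copy: Dict[str, Any], api_copy: Dict[str, Any]) -> None:
    """Raise AssertionError when the SDK-side and API-side copies have diverged."""
    sdk_hash = fingerprint(sdk_copy["description"], sdk_copy["input_schema"])
    api_hash = fingerprint(api_copy["description"], api_copy["input_schema"])
    if sdk_hash != api_hash:
        raise AssertionError("find_capabilities definition drifted between SDK and API")
```

Serializing with `sort_keys=True` keeps the check insensitive to dictionary ordering, so only real content changes trip it.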

mmabrouk added a commit that referenced this pull request Jun 28, 2026
[refactor] PR #4893 follow-up: rename platform-op `bind` to `context_bindings`