Skip to content

feat(agent): client-tool round-trip backend (#4920)#4925

Draft
mmabrouk wants to merge 1 commit into
big-agentsfrom
feat/client-tool-roundtrip-4920
Draft

feat(agent): client-tool round-trip backend (#4920)#4925
mmabrouk wants to merge 1 commit into
big-agentsfrom
feat/client-tool-roundtrip-4920

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 28, 2026

Copy link
Copy Markdown
Member

Context

Some tools the agent uses require user interaction before they can complete. The OAuth connection flow is the main case: the agent asks the user to connect an integration (GitHub, Slack, etc.), the user finishes sign-in in the browser, and the run must resume with the result. The existing runner handled only approved/denied permission decisions and had no path for structured output from a client-side tool call.

This PR wires the backend side of that round-trip. The frontend slice (message-part dispatcher, connect widget, resume predicate, result UX, config refresh on commit) is a separate deliverable, coordinated with @ardaerzin.

Design: docs/design/agent-workflows/projects/agent-fe-roundtrip/design.md

Changes

TS runner (services/agent/):

  • responder.tsResponder gains onClientTool(req). ApprovalDecisions now holds { output: unknown } in addition to permission decisions. parkedCallKey replaces approvalKey everywhere (the old name implied approval-only semantics).
  • permissions.ts — Before dispatching a tool, checks spec.kind === "client". If so, emits interaction_request(kind: "client_tool") and parks via responder.onClientTool. The tool call stays parked until the frontend sends a result.
  • dispatch.ts — Client tools no longer throw. They return a {type: "client_tool_pending"} signal; the permission responder owns the park.

SDK (sdks/python/):

  • adapters/vercel/messages.pyTOOL_OUTPUT_AVAILABLE_TYPES is now a set containing both "tool-output-available" and "TOOL_OUTPUT_AVAILABLE". The previous check was case-exact and dropped uppercase variants from some clients.
  • agents/platform/workflow.py — Registers __ag__request_connection as a hard-coded client tool that resolves to ClientToolSpec(kind="client", render={"kind": "connect"}). Agents that include this workflow as a tool get the connect affordance without any backend call.

API:

  • core/workflows/static_catalog.py — Registers __ag__request_connection under URI client:tool:request_connection:v0 alongside the getting-started skill.
  • apis/fastapi/tools/router.py and workflows/router.py — Both commit_revision paths now emit a one-way data-committed-revision event {variantId, revisionId, version} after a successful commit. The frontend uses this to refresh the build-kit state without a separate poll.

Scope / risk

This is the backend slice only. The interaction loop is incomplete until the frontend dispatches client_tool interaction events and sends results back. Until that lands, a kind: "client" tool call will emit the event and park but never resume. The agent will appear to hang at that point.

The _emit_data_event helper in both API routers resolves request.state.emit with a fallback to emit_event. If neither attribute exists (e.g., outside a streaming context), it exits silently. Reviewers should confirm the emit attribute is reliably present on request.state in the tool-call and workflow-commit code paths, or that silent failure is the right behavior.

The parkedCallKey rename is purely cosmetic. It does not change behavior. The test files were updated to match.

Tests

  • Vitest: responder.test.ts, sandbox-agent-orchestration.test.ts, sandbox-agent-permissions.test.ts updated for parkedCallKey rename and expanded Responder interface.
  • Codex-reported: 44 TS tests across 3 files, all passing. ruff format+check clean on API + SDK. tsc --noEmit clean.

How to QA

Prerequisites: local dev stack with the agent sidecar running. The frontend slice is not yet merged, so the round-trip cannot be exercised end-to-end. This PR can be verified at the unit-test level only.

Automated tests:

cd services/agent && npx vitest run tests/unit/responder.test.ts tests/unit/sandbox-agent-permissions.test.ts
cd api && uv run python -m pytest oss/tests/pytest/ -k "commit_revision" -v

What to verify: Run the TS tests and confirm the parkedCallKey rename did not break any assertion. Confirm the API router tests still pass after the _emit_data_event addition.

Edge cases: The _emit_data_event call in the tools router reads outputs as a dict and exits if variantId/revisionId/version are absent. Confirm a commit_revision call that returns a non-dict output does not raise.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

@vercel

vercel Bot commented Jun 28, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 29, 2026 12:08pm

Request Review

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 03fe9b8e-5ee4-432b-abfb-5699fafb4d63

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.74% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title concisely captures the main backend change: client-tool round-trip support in the agent stack.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description clearly matches the backend client-tool round-trip, workflow, SDK, and event-emission changes in the diff.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/client-tool-roundtrip-4920

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mmabrouk mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jun 28, 2026
@mmabrouk

Copy link
Copy Markdown
Member Author

@ardaerzin — this is your counterpart for the FE round-trip design.

The backend is done (see PR description). Your frontend slice needs:

  1. Message-part dispatcher — when the runner emits an interaction_request with kind: "client_tool", surface it to the frontend message stream so the browser can handle it.
  2. Connect widget — render a UI element for render: {kind: "connect"} tool requests (the request_connection tool).
  3. Resume predicate — isClientToolResult(part) to detect when the user has fulfilled a client tool call and the runner should resume.
  4. Result UX — display the fulfilled tool output inline in the chat.
  5. Config/build-kit refresh — listen for data-committed-revision events (backend now emits them on every successful commit_revision) to refresh the active variant/revision without a full reload.

The design file has the full detail: docs/design/agent-workflows/projects/agent-fe-roundtrip/design.md (section "Work split -> Arda (FE)").

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
services/agent/tests/unit/sandbox-agent-permissions.test.ts (1)

36-38: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add direct coverage for the client-tool branch.

These stubs keep the expanded Responder interface satisfied, but the new attachPermissionResponder path should also assert a request with toolCall.spec.kind === "client" emits kind: "client_tool", parks without replying, and maps deny/output outcomes through clientToolReply().

Also applies to: 108-110, 147-149


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8737fc66-7334-4508-82bf-da22b81679d9

📥 Commits

Reviewing files that changed from the base of the PR and between ebc4ec1 and be5bd45.

📒 Files selected for processing (11)
  • api/oss/src/apis/fastapi/tools/router.py
  • api/oss/src/apis/fastapi/workflows/router.py
  • api/oss/src/core/workflows/static_catalog.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
  • sdks/python/agenta/sdk/agents/platform/workflow.py
  • services/agent/src/engines/sandbox_agent/permissions.ts
  • services/agent/src/responder.ts
  • services/agent/src/tools/dispatch.ts
  • services/agent/tests/unit/responder.test.ts
  • services/agent/tests/unit/sandbox-agent-orchestration.test.ts
  • services/agent/tests/unit/sandbox-agent-permissions.test.ts

Comment on lines +1372 to +1386
async def _emit_data_event(
*,
request: Request,
name: str,
data: dict,
) -> None:
emit = getattr(request.state, "emit", None) or getattr(
request.state, "emit_event", None
)
if not callable(emit):
return

result = emit({"type": "data", "name": name, "data": data})
if isawaitable(result):
await result

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Don’t let UI event delivery fail a successful tool call.

If the request-state emitter throws here, _call_workflow_tool() returns an error even though the workflow already ran and produced outputs. This helper should be best-effort for the same reason as the workflow router variant.

Comment on lines +1282 to +1285
await _emit_committed_revision_data_event(
request=request,
workflow_revision=workflow_revision,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Emit the same event from the normal commit endpoint.

This only covers create_workflow_revision(). The regular commit path at Line 1500 still returns without a committed-revision event, so follow-up commits will not trigger the frontend revision refresh that this PR is adding.

Minimal follow-up
# In commit_workflow_revision(), after invalidate_cache(...)
await _emit_committed_revision_data_event(
    request=request,
    workflow_revision=workflow_revision,
)

Comment on lines +1932 to +1946
async def _emit_data_event(
*,
request: Request,
name: str,
data: dict,
) -> None:
emit = getattr(request.state, "emit", None) or getattr(
request.state, "emit_event", None
)
if not callable(emit):
return

result = emit({"type": "data", "name": name, "data": data})
if isawaitable(result):
await result

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make data-event emission best-effort.

If request.state.emit fails here after the revision was already committed, the API turns a successful mutation into a 5xx and invites duplicate retries. Swallow/log emitter failures instead of letting them escape the request.

Comment on lines 97 to +105
if (
tool_name is None
and ptype.startswith("tool-")
and ptype != TOOL_OUTPUT_AVAILABLE
and ptype not in TOOL_OUTPUT_AVAILABLE_TYPES
):
tool_name = ptype[len("tool-") :]

blocks: List[ContentBlock] = []
if ptype != TOOL_OUTPUT_AVAILABLE or "input" in part:
if ptype not in TOOL_OUTPUT_AVAILABLE_TYPES or "input" in part:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Don’t synthesize tool names for generic result parts.

Line 100 only exempts tool-output-available. tool-output-error and tool-output-denied still fall through the tool- prefix path, so parts emitted by sdks/python/agenta/sdk/agents/adapters/vercel/stream.py:137-157 without toolName get parsed as tool_name="output-error" / "output-denied" and also produce a bogus tool_call at Line 105. That breaks the documented neutral tool_result mapping for non-success tool results.

Suggested fix
-TOOL_OUTPUT_AVAILABLE_TYPES = {TOOL_OUTPUT_AVAILABLE, "TOOL_OUTPUT_AVAILABLE"}
+TOOL_OUTPUT_AVAILABLE_TYPES = {TOOL_OUTPUT_AVAILABLE, "TOOL_OUTPUT_AVAILABLE"}
+GENERIC_TOOL_RESULT_TYPES = {
+    *TOOL_OUTPUT_AVAILABLE_TYPES,
+    "tool-output-error",
+    "tool-output-denied",
+}
@@
-    if (
-        tool_name is None
-        and ptype.startswith("tool-")
-        and ptype not in TOOL_OUTPUT_AVAILABLE_TYPES
-    ):
+    if (
+        tool_name is None
+        and ptype.startswith("tool-")
+        and ptype not in GENERIC_TOOL_RESULT_TYPES
+    ):
         tool_name = ptype[len("tool-") :]
@@
-    if ptype not in TOOL_OUTPUT_AVAILABLE_TYPES or "input" in part:
+    if ptype not in GENERIC_TOOL_RESULT_TYPES or "input" in part:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (
tool_name is None
and ptype.startswith("tool-")
and ptype != TOOL_OUTPUT_AVAILABLE
and ptype not in TOOL_OUTPUT_AVAILABLE_TYPES
):
tool_name = ptype[len("tool-") :]
blocks: List[ContentBlock] = []
if ptype != TOOL_OUTPUT_AVAILABLE or "input" in part:
if ptype not in TOOL_OUTPUT_AVAILABLE_TYPES or "input" in part:
if (
tool_name is None
and ptype.startswith("tool-")
and ptype not in GENERIC_TOOL_RESULT_TYPES
):
tool_name = ptype[len("tool-") :]
blocks: List[ContentBlock] = []
if ptype not in GENERIC_TOOL_RESULT_TYPES or "input" in part:

Comment on lines +115 to +124
def _is_request_connection_workflow(tool_config: ReferenceToolConfig) -> bool:
workflow = getattr(tool_config, "workflow", None)
if getattr(workflow, "slug", None) == REQUEST_CONNECTION_WORKFLOW_SLUG:
return True

call_ref = tool_config.call_ref
return (
call_ref == f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}"
or call_ref.startswith(f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}.")
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Handle environment-scoped request-connection references too.

Line 116 is checking tool_config.workflow.slug, but ReferenceToolConfig exposes slug directly. As written, the special case only matches workflow.variant... call refs, so ref_by="environment" tools targeting __ag__request_connection resolve as ordinary callback tools and bypass the client-tool flow.

Suggested fix
 def _is_request_connection_workflow(tool_config: ReferenceToolConfig) -> bool:
-    workflow = getattr(tool_config, "workflow", None)
-    if getattr(workflow, "slug", None) == REQUEST_CONNECTION_WORKFLOW_SLUG:
+    if tool_config.slug == REQUEST_CONNECTION_WORKFLOW_SLUG:
         return True

     call_ref = tool_config.call_ref
     return (
         call_ref == f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}"
         or call_ref.startswith(f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}.")
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _is_request_connection_workflow(tool_config: ReferenceToolConfig) -> bool:
workflow = getattr(tool_config, "workflow", None)
if getattr(workflow, "slug", None) == REQUEST_CONNECTION_WORKFLOW_SLUG:
return True
call_ref = tool_config.call_ref
return (
call_ref == f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}"
or call_ref.startswith(f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}.")
)
def _is_request_connection_workflow(tool_config: ReferenceToolConfig) -> bool:
if tool_config.slug == REQUEST_CONNECTION_WORKFLOW_SLUG:
return True
call_ref = tool_config.call_ref
return (
call_ref == f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}"
or call_ref.startswith(f"workflow.variant.{REQUEST_CONNECTION_WORKFLOW_SLUG}.")
)

Comment on lines +223 to +228
private lookupClientTool(request: ClientToolRequest): { found: boolean; output?: unknown } {
for (const key of clientToolRequestKeys(request)) {
if (this.decisions.has(key)) {
const output = this.decisions.get(key);
if (!isPermissionDecision(output)) return { found: true, output };
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Disambiguate permission decisions from client-tool outputs.

ApprovalDecisions now stores arbitrary client-tool outputs, but parkedCallResultOf() coerces any { approved: boolean } output into "allow"/"deny", and lookupClientTool() then drops stored "allow"/"deny" values as permission decisions. A valid client-tool output like { approved: false } or "deny" will not resume correctly. Store tagged entries, e.g. permission vs client-tool result, instead of using raw sentinel values.

Also applies to: 345-367

@mmabrouk

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@mmabrouk

Copy link
Copy Markdown
Member Author

🤖 The AI agent says:

What to review on this PR:

  1. _emit_data_event silent-failure in the tools and workflows routers. The helper exits silently if request.state.emit is absent. Confirm emit is present on request.state in every code path that calls commit_revision (both the tools router and the workflows router), or document the expected silent-failure contract.

  2. Park-and-resume gap. Until the frontend slice lands, a kind: "client" tool call will park and never resume. The agent will hang silently. The PR body calls this out, but reviewers should confirm there is a timeout or error path in the runner that eventually unparks (or errors) a stalled client-tool call rather than blocking the run indefinitely.

  3. TOOL_OUTPUT_AVAILABLE_TYPES set (messages.py). The change from a string equality check to a set membership check is small but widens what qualifies as a tool-output message. Confirm no other message type name could accidentally match one of these two strings.

  4. _is_request_connection_workflow in workflow.py. The function checks both workflow.slug attribute and two call_ref string patterns. Confirm the call_ref patterns are exhaustive and won't miss a variant-pinned reference.

RUNNER: add onClientTool to Responder; detect client specs in
permissions.ts and emit interaction_request(kind=client_tool) with park;
remove the throw in dispatch.ts (return pending signal instead);
rename approvalKey -> parkedCallKey everywhere; generalize
extractApprovalDecisions to handle both {approved} and structured output.

SDK: widen tool-output-available parsing in messages.py; add
__ag__request_connection static workflow resolving to ClientToolSpec
(kind=client, render={kind:connect}) in platform/workflow.py.

API: register __ag__request_connection in static_catalog; emit
data-committed-revision event after successful commit_revision in both
tools/router.py and workflows/router.py.

All checks: ruff format+check (API+SDK), tsc --noEmit, vitest (44 tests pass).

Frontend (message-part dispatcher, connect widget, resume predicate,
result UX, config refresh) is NOT included — owned by @ardaerzin.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-review Agent updated; awaiting Mahmoud's review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant