feat(agent): enforce sidecar trust + disable unenforceable sandbox boundaries by mmabrouk · Pull Request #4831 · Agenta-AI/agenta

mmabrouk · 2026-06-24T19:24:13Z

Context

The agent runner sidecar (services/agent/) exposed two trust gaps that the design doc on this PR analyzed. First, POST /run carries plaintext provider secrets and reusable bearer tokens, yet the sidecar accepted any caller on any interface with no authentication. Second, the SandboxPermission boundary declared capabilities the runner does not actually enforce: a network policy on the local sandbox (which has no egress control), a filesystem jail (applied by no backend), and stdio MCP servers that run unconfined on the runner host (the same execution bypass that already had code execution removed). The boundary looked stronger than it was, and a run could proceed believing a jail held when it did not.

This PR is now the implementation of the decided near-term scope from the design doc (steps 1 and 2 of Part 1, plus the Part 2 not-implemented decisions). It stays inside the runner (services/agent/src/**). mTLS, scoped tokens, and payload encryption remain deferred.

Changes

1. Network isolation + optional /run token (server.ts).
The server now binds to AGENTA_AGENT_RUNNER_HOST (default 127.0.0.1, never 0.0.0.0), so by default the sidecar is reachable only from the same host as the co-located Python service. Set it to the private pod/internal interface in Kubernetes/Compose and never publish the port. An optional shared token gates /run: when AGENTA_AGENT_RUNNER_TOKEN is set, a request must present it (as Authorization: Bearer <token> or X-Agenta-Runner-Token: <token>, constant-time compare) or get a 401. The check is default-off, so loopback/co-located deployments are unaffected. /health stays open for liveness probes.

2. Error-on-unimplemented sandbox_permission (run-plan.ts).
A declared boundary the runner cannot enforce now fails loudly the way code tools do (tools/code.ts), instead of being silently accepted under best_effort:

A restricted network policy on the local sandbox returns LOCAL_NETWORK_UNSUPPORTED_MESSAGE (the local sidecar has no per-run egress control), regardless of enforcement. Daytona still enforces it.
Any filesystem value (on any backend) returns FILESYSTEM_UNSUPPORTED_MESSAGE, since no backend applies a filesystem jail.

best_effort is no longer the escape hatch for these two. The Daytona strict-mode guard against runner-host (gateway/code) tools is kept.

3. stdio MCP disabled, the way code execution was (tools/mcp-bridge.ts, tools/mcp-server.ts, engines/sandbox_agent/mcp.ts, run-plan.ts).
A stdio MCP server runs an arbitrary process on the runner host, outside the sandbox boundary. Until the security is fixed, the wire shapes (McpServerConfig, the mcpServers request field) stay, but delivery is gated by a single named constant:

MCP_UNSUPPORTED_MESSAGE = "MCP servers are not supported by the sidecar."

buildToolMcpServers and toAcpMcpServers throw it, mcp-server.ts is reduced to a refusing stub, and run-plan.ts rejects any run that carries a stdio MCP server up front.

What disabling MCP removes: non-Pi harnesses (e.g. Claude) take custom tools only over MCP, so they can no longer receive custom tools at all, and user-declared stdio MCP servers are refused. Pi is unaffected. Pi tools are delivered natively through the Pi extension, not MCP, and gateway/callback tools still POST back through /tools/call.

protocol.ts is not edited. The gates are runtime behavior, not new wire fields, and the file is owned by the harness-rename agent this cycle. Gateway/callback tools are unchanged; they are governed by Layer-3 tool-permission, not the Layer-2 sandbox_permission boundary.

Scope / risk

MCP support for non-Pi harnesses (e.g. Claude) is disabled in this PR. A Claude agent that previously worked with custom tools will fail with the MCP-unsupported error. Pi-based agents with gateway/callback tools are unaffected. The loopback-only bind is the most visible deployment change: any operator who was testing by hitting the runner port directly from another host will lose access unless they set AGENTA_AGENT_RUNNER_HOST.

Tests / notes

pnpm test (168 passing) and pnpm run typecheck in services/agent, prettier clean.
New coverage: the /run token gate (default-off accept, 401 without/with a wrong token, accept via both header forms, /health open); the local-network and filesystem not-implemented errors (both enforcement values); stdio MCP refused at plan time and at delivery; a non-Pi run carrying a custom tool now surfaces the MCP-unsupported error.
Docs synced in the same PR: the project README.md/status.md (decisions marked implemented), the runner README.md MCP/tools section. The interface inventory pages (runner-to-mcp-server.md, sandbox-permission.md) are owned by a sibling lane; a hand-off to fold the disabled/error state into the inventory is recorded on the coordination board.

How to QA

Prerequisites: Local dev stack with the agent sidecar running (services/agent/ Node runner). No AGENTA_AGENT_RUNNER_TOKEN set (default-off).

Steps (network isolation):

Confirm the runner only listens on 127.0.0.1:

ss -tlnp | grep 3001
# Expected: 127.0.0.1:3001

Confirm /health responds without a token:

curl http://127.0.0.1:3001/health
# Expected: 200

Steps (token gate):
3. Set AGENTA_AGENT_RUNNER_TOKEN=test-secret and restart the runner.
4. Send a /run request without the token:

curl -X POST http://127.0.0.1:3001/run \
  -H "Content-Type: application/json" -d '{}' -v
# Expected: 401

Send the same request with the token:

curl -X POST http://127.0.0.1:3001/run \
  -H "Authorization: Bearer test-secret" \
  -H "Content-Type: application/json" -d '{}' -v
# Expected: not 401 (likely 400 from missing fields, but the auth gate passed)

Steps (sandbox_permission fail-fast):
6. Send a /run request with a restricted network policy on the local backend:

curl -X POST http://127.0.0.1:3001/run \
  -H "Content-Type: application/json" \
  -d '{"harness":"pi_core","sandbox_permission":{"network":"restricted"},"messages":[]}' -v
# Expected: error response mentioning local network unsupported

Steps (stdio MCP disabled):
7. Send a /run request that includes an stdio MCP server:

curl -X POST http://127.0.0.1:3001/run \
  -H "Content-Type: application/json" \
  -d '{"harness":"pi_core","mcp_servers":[{"name":"test","transport":"stdio","command":"echo"}],"messages":[]}' -v
# Expected: error response mentioning MCP servers are not supported

Expected result: Token gate blocks unauthorized requests and passes authorized ones. Local-network and filesystem policies fail loudly. stdio MCP servers are refused at plan time.

Automated tests:

cd services/agent && pnpm test

Edge cases: A Pi-based agent using gateway tools must still work (gateway tools are not MCP and are unaffected). A Claude agent using custom tools will now fail; confirm the error message is clear, not a silent hang.

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

vercel · 2026-06-24T19:24:19Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 25, 2026 6:29pm

coderabbitai · 2026-06-24T19:24:22Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 33d91be0-3a00-4dd2-b900-40cf5d4d9645

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

The PR updates agent configuration and harness option naming, publishes harness-filtered model metadata, normalizes /inspect responses, and changes the agent sidecar to validate runner URI overrides, reject unsupported MCP and sandbox paths, and require optional /run authorization.

Changes

Harness-aware agent config and model picker

Layer / File(s)	Summary
AgentConfig and run-selection contract `sdks/python/agenta/sdk/agents/dtos.py`, `sdks/python/agenta/sdk/utils/types.py`, `services/oss/src/agent/app.py`, `web/packages/agenta-playground/src/state/execution/agentRequest.ts`, `docs/design/agent-workflows/documentation/`, `docs/design/agent-workflows/interfaces/`, `docs/design/agent-workflows/projects/sidecar-uri-config/*`	`AgentConfig` now carries `harness`, `uri`, and `permission_policy`, `harness_kwargs` replaces `harness_options`, and the related docs and request-shaping code now describe one config object instead of a separate `RunSelection`.
Inspect capabilities and picker UI `sdks/python/agenta/sdk/agents/capabilities.py`, `web/packages/agenta-entities/src/workflow/`, `web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/`, `sdks/python/oss/tests/pytest/unit/agents/*`, `web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts`	The SDK publishes per-harness `models`, the workflow state exposes inspect metadata, and the agent config UI uses that metadata and vault secrets to render the unified provider/model/authentication controls.
Inspect docs and status `docs/design/agent-workflows/projects/agent-model-picker/*`, `docs/design/agent-workflows/documentation/agent-configuration.md`, `docs/design/agent-workflows/documentation/ports-and-adapters.md`	The design docs and narrative walkthroughs are updated to describe the inspect-driven model picker, the authentication toggle, and the new `harness_kwargs` naming.

Wire contract and inspect response

Layer / File(s)	Summary
/run wire schema export `sdks/python/agenta/sdk/agents/wire_models.py`, `sdks/python/agenta/sdk/utils/types.py`, `services/oss/src/agent/schemas.py`, `sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py`, `docs/design/agent-workflows/projects/wire-contract-schema/*`	`WireRunRequest` and `WireRunResult` define the `/run` wire shape, the SDK catalog exports their schemas, the agent outputs schema becomes keyed by `invoke` and `messages`, and the added tests/docs keep those schemas aligned.
Canonical `/inspect` response `sdks/python/agenta/sdk/models/workflows.py`, `sdks/python/agenta/sdk/decorators/routing.py`, `sdks/python/agenta/sdk/decorators/running.py`, `sdks/python/agenta/sdk/engines/running/utils.py`, `web/packages/agenta-entities/src/workflow/*`, `sdks/python/oss/tests/pytest/unit/test_inspect_response.py`, `services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py`	`WorkflowInspectResponse` now carries the resolved revision at `revision.schemas`, the router normalizes inspect responses to that shape, inspect metadata is registered by URI, and the frontend selectors and tests read that canonical structure.

Sidecar routing, authorization, and sandbox gates

Layer / File(s)	Summary
Runner URI routing and auth `services/oss/src/agent/config.py`, `services/oss/src/agent/app.py`, `services/agent/src/server.ts`, `services/agent/tests/unit/server.test.ts`, `hosting/docker-compose/*`	`/run` routing now prefers an allowlist-gated `uri` override, the sidecar can require a bearer token, and the compose files bind the runner host for cross-container access.
MCP and sandbox rejection `services/agent/src/tools/mcp-bridge.ts`, `services/agent/src/tools/mcp-server.ts`, `services/agent/src/engines/sandbox_agent/`, `services/agent/tests/unit/mcp*.test.ts`, `services/agent/tests/unit/sandbox-agent-run-plan.test.ts`, `services/agent/tests/unit/sandbox-agent-orchestration.test.ts`	stdio MCP delivery now throws `MCP_UNSUPPORTED_MESSAGE`, local filesystem and restricted network sandbox policies fail early, and the orchestration and unit tests reflect the unsupported paths.
Sidecar trust and URI docs `docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/`, `docs/design/agent-workflows/projects/sidecar-uri-config/`, `services/agent/README.md`	The design docs and service README now describe the allowlist-gated URI override, the updated trust model, and the revised enforcement matrix.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.66% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately captures the main changes: sidecar trust hardening and disabling unsupported sandbox boundaries.
Description check	✅ Passed	The description is on-topic and clearly matches the implemented runner security, sandbox enforcement, and MCP changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/sidecar-trust-and-sandbox-enforcement

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

mmabrouk · 2026-06-24T19:35:42Z

+
+Ordered roughly cheapest-to-strongest. They compose; this is not either/or.
+
+1. **Network isolation / localhost binding (make the implicit assumption real).**


Kept as decided near-term step 1; endorsement noted inline. (adcd126)

mmabrouk · 2026-06-24T19:37:45Z

+5. **Payload encryption of `secrets`** — held in reserve; only if a requirement forbids
+   plaintext secrets even under TLS.
+
+Rationale: steps 1–2 are config-only, ship today, and make the implicit assumption real and


I agree let's only implement 1 and 2 now

Scoped near-term work to steps 1 & 2 only; moved mTLS / short-lived scoped tokens / payload encryption to an explicit 'Later / deferred hardening' subsection. (adcd126)

mmabrouk · 2026-06-24T19:38:46Z

+  egress boundary is applied by the Daytona platform at sandbox creation. This is real
+  enforcement at the sandbox boundary.
+
+- **Local sandbox network egress — NOT enforceable.**


that is alright. it might make sense to return an error the way we do when a capability is not implemented if that variable is set for local

Documented: local sandbox errors when network is set (not-implemented-style, regardless of enforcement), mirroring the code.ts CODE_TOOL_UNSUPPORTED_MESSAGE gate / the typed *Unsupported*Errors in connections/errors.py. Code change is a separate task. (adcd126)

mmabrouk · 2026-06-24T19:39:20Z

+  restricted-network run on the local sandbox when `enforcement: "strict"` (fail loud), and
+  proceeds with a warning under `best_effort`.
+
+- **Filesystem boundary — DECLARED, enforced NOWHERE.**


again, should be an error if specified currently since it is not implemented

Documented: specifying filesystem errors (not implemented on any backend), same not-implemented pattern. (adcd126)

mmabrouk · 2026-06-24T19:41:44Z

+  on Daytona.**
+  Resolved `code` tools, gateway/callback tools, and stdio MCP servers do **not** run inside
+  the sandbox. They run in the **runner host process** via the relay
+  (`services/agent/src/tools/relay.ts`: `runCodeTool`, `callAgentaTool`, and the stdio MCP


we have remove code execution from the code. we should also disable the same way mcp servers implementation until we fix security issues. We simply put them as non implemented and remove the implemenattion in the sandbox sidecar like we did for tool calls.

for gateway it's alright. no need to do anything. that is not part of sandbox persmission, that is part of the permission (the other parameter that is about the sidecar and obviously there we should deal with these correctly

Verified code exec is already removed (runCodeTool throws). Documented the decision to disable + remove the stdio MCP sidecar impl (mcp-bridge.ts/mcp-server.ts/stdio plumbing) the same way, until security is fixed. Clarified gateway = Layer-3 tool-permission, NOT Layer-2 sandbox_permission, so no action there. Code change is a separate task. (adcd126)

mmabrouk · 2026-06-24T19:42:06Z

+  the opt-out that accepts the boundary is not a hard guarantee.
+
+- **Legacy in-process engine bypass.**
+  `backend: "pi"` (the legacy in-process `engines/pi.ts`) is not the sandbox-agent path and


we have i think another agent removing that. should be removed by the time you start working on this

Verified: engines/pi.ts is gone (only sandbox_agent.ts/sandbox_agent//skills.ts remain). Doc now presents the legacy in-process engine as removed/historical only. (adcd126)

…view Address the 6 author inline comments on the proposal: - Scope near-term work to steps 1-2 only (network isolation + optional /run token); defer mTLS, short-lived scoped tokens, and payload encryption to an explicit Later/deferred subsection. - Document the decided not-implemented error behavior: the local sandbox errors when a network policy is set, and any specified filesystem policy errors, both mirroring the existing code-tool unsupported gate (code.ts CODE_TOOL_UNSUPPORTED_MESSAGE) and the typed connection *Unsupported*Errors. - Document the decision to disable stdio MCP-server execution in the sidecar the same way code execution was removed (not-implemented gate + remove mcp-bridge.ts/mcp-server.ts/stdio plumbing), and clarify that gateway tools are Layer-3 tool-permission, not Layer-2 sandbox_permission, so they need no change. - Update the legacy in-process pi engine to removed/historical (engines/pi.ts is gone). Verified code state: code execution already removed, stdio MCP still present, pi.ts removed, protocol.ts network comment already corrected by A3. Docs-only; no runner code changed. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

mmabrouk · 2026-06-24T19:53:27Z

Revised the proposal per your 6 inline comments (commit adcd126, docs-only — no runner code changed). Mapping each comment to the change:

Network isolation / loopback binding (~L88) — "agree." Kept as the decided near-term step 1 and noted your endorsement inline.
Steps 1-2 only (~L168) — "only implement 1 and 2 now." Scoped the recommendation to step 1 (network isolation / loopback-or-in-cluster binding) and step 2 (optional shared /run token). Moved mTLS, short-lived scoped tokens, and payload encryption into an explicit "Later / deferred hardening" subsection — out of near-term scope.
Local network not enforceable (~L201) — "return an error the way we do when a capability is not implemented." Documented the decided behavior: when sandbox_permission.network is set on the local sandbox (which cannot enforce egress), the runner should error with a not-implemented-style message, regardless of enforcement (no longer a best_effort warn). Cited the existing pattern to mirror: services/agent/src/tools/code.ts's CODE_TOOL_UNSUPPORTED_MESSAGE throwing gate, and the typed *Unsupported*Errors in sdks/python/agenta/sdk/agents/connections/errors.py.
Filesystem declared, enforced nowhere (~L208) — "should be an error if specified." Same decision: a specified sandbox_permission.filesystem value should error (not implemented on any backend), same not-implemented pattern.
Code/gateway/MCP on the runner host (~L221).
- Code execution — VERIFIED already removed. runCodeTool (code.ts) throws CODE_TOOL_UNSUPPORTED_MESSAGE; interface kept, every path funnels through the throwing gate.
- MCP — decision documented. stdio MCP-server execution is still present in the sidecar (tools/mcp-bridge.ts + tools/mcp-server.ts). Documented the plan to disable it the same way code execution was: not-implemented gate + remove the sidecar implementation (mcp-bridge.ts / mcp-server.ts / stdio plumbing) until the security issues are fixed. (Code change is a separate task — another agent is editing the sidecar now; this PR documents only.)
- Layering clarified. Gateway tools need no action: they live in Layer-3 tool-permission (the sidecar tool-permission / HITL parameter, relay.ts resolvePermission), not Layer-2 sandbox_permission. The doc now states this distinction so gateway is not lumped into the sandbox boundary.
Legacy in-process pi engine (~L229) — "another agent removing that." Verified removed: services/agent/src/engines/pi.ts no longer exists (only sandbox_agent.ts / sandbox_agent/ / skills.ts remain; pi.ts shows deleted in the working tree). The doc now presents it as removed/historical only.

Also: the enforcement matrix + one-line summary were rewritten to reflect all the above, and the protocol.ts comment section was updated — A3 already applied the network half of the correction; two follow-up comment edits (filesystem→error, local-network→error) are recorded for whoever lands that behavior. status.md updated with a "Review decisions (PR #4831)" section and the decided-but-separate code tasks.

…view Address the 6 author inline comments on the proposal: - Scope near-term work to steps 1-2 only (network isolation + optional /run token); defer mTLS, short-lived scoped tokens, and payload encryption to an explicit Later/deferred subsection. - Document the decided not-implemented error behavior: the local sandbox errors when a network policy is set, and any specified filesystem policy errors, both mirroring the existing code-tool unsupported gate (code.ts CODE_TOOL_UNSUPPORTED_MESSAGE) and the typed connection *Unsupported*Errors. - Document the decision to disable stdio MCP-server execution in the sidecar the same way code execution was removed (not-implemented gate + remove mcp-bridge.ts/mcp-server.ts/stdio plumbing), and clarify that gateway tools are Layer-3 tool-permission, not Layer-2 sandbox_permission, so they need no change. - Update the legacy in-process pi engine to removed/historical (engines/pi.ts is gone). Verified code state: code execution already removed, stdio MCP still present, pi.ts removed, protocol.ts network comment already corrected by A3. Docs-only; no runner code changed. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

mmabrouk · 2026-06-24T21:23:06Z

Implementation landed (commit 227de27, runner-only, services/agent/src/**). This PR is now the design doc + its implementation. Stacked on feat/agent-contract-versioning-docs (#4829), so the diff is isolated to this work.

What changed, per the three decided scope items:

Network isolation + optional /run token (server.ts): binds to AGENTA_AGENT_RUNNER_HOST (default 127.0.0.1, never 0.0.0.0); optional AGENTA_AGENT_RUNNER_TOKEN gates /run (default OFF, Authorization: Bearer or X-Agenta-Runner-Token, constant-time compare, 401 on mismatch). /health stays open. mTLS / scoped tokens / payload encryption deferred.
Error-on-unimplemented sandbox_permission (run-plan.ts): a restricted network policy on the LOCAL sandbox → LOCAL_NETWORK_UNSUPPORTED_MESSAGE; any filesystem value (any backend) → FILESYSTEM_UNSUPPORTED_MESSAGE. Both unconditional (no longer a best_effort escape), mirroring the code.ts not-implemented gate. Daytona strict runner-host-tool guard kept.
stdio MCP disabled (tools/mcp-bridge.ts, tools/mcp-server.ts, engines/sandbox_agent/mcp.ts, run-plan.ts): MCP_UNSUPPORTED_MESSAGE is thrown by buildToolMcpServers + toAcpMcpServers, mcp-server.ts is a refusing stub, and run-plan.ts rejects any run carrying a stdio MCP server. Wire shapes (McpServerConfig, mcpServers) unchanged.

What disabling MCP removes: non-Pi harnesses (Claude) take tools only over MCP, so they can no longer receive custom tools, and user-declared stdio MCP servers are refused. Pi tools (native via the extension) and gateway/callback tools (/tools/call) are unaffected.

Out of scope, deliberately: protocol.ts not edited (gates are runtime behavior, not new wire fields; A3 owns the file). Gateway tools untouched (Layer-3 tool-permission). The GET /capabilities probe deferred. No sdks/python/** or web/** changes.

Tests: pnpm test 168 passing + pnpm run typecheck clean in services/agent, prettier clean. New coverage for the token gate, the two not-implemented errors, and MCP-disabled (plan-time + delivery).

Docs: project README.md/status.md mark the decisions implemented; runner README.md MCP/tools section synced. Interface inventory pages (runner-to-mcp-server.md, sandbox-permission.md) are a sibling lane's surface; a hand-off to fold the disabled/error state into the inventory is on the coordination board.

mmabrouk · 2026-06-24T21:25:02Z

@coderabbitai review

coderabbitai · 2026-06-24T21:25:08Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

github-actions · 2026-06-24T21:32:06Z

Railway Preview Environment


Status	Destroyed (PR closed)

Updated at 2026-06-25T18:37:44.419Z

…view Address the 6 author inline comments on the proposal: - Scope near-term work to steps 1-2 only (network isolation + optional /run token); defer mTLS, short-lived scoped tokens, and payload encryption to an explicit Later/deferred subsection. - Document the decided not-implemented error behavior: the local sandbox errors when a network policy is set, and any specified filesystem policy errors, both mirroring the existing code-tool unsupported gate (code.ts CODE_TOOL_UNSUPPORTED_MESSAGE) and the typed connection *Unsupported*Errors. - Document the decision to disable stdio MCP-server execution in the sidecar the same way code execution was removed (not-implemented gate + remove mcp-bridge.ts/mcp-server.ts/stdio plumbing), and clarify that gateway tools are Layer-3 tool-permission, not Layer-2 sandbox_permission, so they need no change. - Update the legacy in-process pi engine to removed/historical (engines/pi.ts is gone). Verified code state: code execution already removed, stdio MCP still present, pi.ts removed, protocol.ts network comment already corrected by A3. Docs-only; no runner code changed. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

mmabrouk · 2026-06-25T10:44:13Z

Added: bind the agent sidecar on 0.0.0.0 in compose (stream-error fix)

This PR's trust model defaults the sidecar runner bind to 127.0.0.1 (same-host only). In docker-compose the Python services container and the sandbox-agent sidecar are separate containers, so 127.0.0.1 makes the sidecar unreachable from services. Every agent run failed with All connection attempts failed, and the playground chat showed a "Stream error" — the user could not exercise the agent at all. The sidecar's healthcheck hits 127.0.0.1 inside its own container, so it passed and masked the break.

Fix: set AGENTA_AGENT_RUNNER_HOST=0.0.0.0 (overridable) on the sandbox-agent service in the EE dev, OSS dev, and OSS gh compose files, so the co-located services container reaches it at sandbox-agent:8765.

Committed to this lane (docs/sidecar-trust-and-sandbox-enforcement) because this PR introduced the 127.0.0.1 default without the compose counterpart. Verified live on the dev stack: after recreating the sidecar with the override, the agent chat streams end to end again ("Hello, dear friend" with usage). Files: hosting/docker-compose/{ee/docker-compose.dev.yml, oss/docker-compose.dev.yml, oss/docker-compose.gh.yml}.

mmabrouk · 2026-06-25T10:44:15Z

@coderabbitai review

coderabbitai · 2026-06-25T10:44:23Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

services/oss/src/agent/app.py (1)
219-225: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Validate the runner URI before resolving tools or credentials.

select_backend(agent_config) is the allowlist gate, but it currently runs after tool/MCP resolution and possible model credential resolution. A disallowed agent.uri should fail before those external/secret-adjacent operations.
Proposed fix
     params = parameters or {}
 
     agent_config = AgentConfig.from_params(params, defaults=_default_agent_config())
+    backend = select_backend(agent_config)
 
     msgs = to_messages(messages or (inputs or {}).get("messages") or [])
@@
-    harness = make_harness(
-        agent_config.harness, Environment(select_backend(agent_config))
-    )
+    harness = make_harness(agent_config.harness, Environment(backend))
Also applies to: 258-260
docs/design/agent-workflows/interfaces/public-edge/workflow-invoke.md (1)
26-39: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Stale sandbox run-selection reference; should be uri.

This page still lists the agent run-selection fields as harness, sandbox, permission_policy, but the rest of this PR removes the per-run sandbox selector and replaces it with uri (see agent-config-schema.md: "uri replaced the old sandbox selector" and agent-service-handler.md lists harness/uri/permission_policy). select_backend now sends sandbox only as a constant _DEFAULT_SANDBOX, not as an authored selection. Update the example comment and prose to keep the doc set consistent.
📝 Suggested doc fix
-    "parameters": { "agent": { /* config, incl. harness, sandbox, permission_policy */ } }
+    "parameters": { "agent": { /* config, incl. harness, uri, permission_policy */ } }
-`parameters` carries the agent config under `agent`, and the run-selection fields (`harness`,
-`sandbox`, `permission_policy`) live inside that same `agent` object. The handler reads the
+`parameters` carries the agent config under `agent`, and the run-selection fields (`harness`,
+`uri`, `permission_policy`) live inside that same `agent` object. The handler reads the

🧹 Nitpick comments (1)

sdks/python/agenta/sdk/engines/running/utils.py (1)

544-566: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

Shallow copy only protects top-level keys.

dict(meta) copies only the top level, so nested values (e.g. the harness_capabilities document) remain shared between the caller, the registry, and every retrieve_meta consumer. The docstring's "later mutation of the caller's dict cannot leak into the registry" guarantee therefore doesn't hold for nested mutations. Current callers treat meta as read-only, so this is non-blocking, but copy.deepcopy on both store and read would make the isolation guarantee real.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 32e73fdd-96c7-4369-ac42-916d90ae4e7d

📥 Commits

Reviewing files that changed from the base of the PR and between 9354ea4 and f0225ca.

📒 Files selected for processing (87)

docs/design/agent-workflows/documentation/adapters/agenta.md
docs/design/agent-workflows/documentation/adapters/pi.md
docs/design/agent-workflows/documentation/agent-configuration.md
docs/design/agent-workflows/documentation/agent-template.md
docs/design/agent-workflows/documentation/architecture.md
docs/design/agent-workflows/documentation/ground-truth.md
docs/design/agent-workflows/documentation/ports-and-adapters.md
docs/design/agent-workflows/documentation/running-the-agent.md
docs/design/agent-workflows/interfaces/cross-service/service-to-agent-runner.md
docs/design/agent-workflows/interfaces/in-service/agent-service-handler.md
docs/design/agent-workflows/interfaces/in-service/harness-adapters.md
docs/design/agent-workflows/interfaces/in-service/neutral-runtime-dtos.md
docs/design/agent-workflows/interfaces/public-edge/agent-config-schema.md
docs/design/agent-workflows/interfaces/public-edge/agent-messages.md
docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md
docs/design/agent-workflows/interfaces/public-edge/workflow-invoke.md
docs/design/agent-workflows/projects/agent-config-structure-cleanup/status.md
docs/design/agent-workflows/projects/agent-model-picker/README.md
docs/design/agent-workflows/projects/agent-model-picker/context.md
docs/design/agent-workflows/projects/agent-model-picker/plan.md
docs/design/agent-workflows/projects/agent-model-picker/research.md
docs/design/agent-workflows/projects/agent-model-picker/status.md
docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/README.md
docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/status.md
docs/design/agent-workflows/projects/sidecar-uri-config/README.md
docs/design/agent-workflows/projects/sidecar-uri-config/context.md
docs/design/agent-workflows/projects/sidecar-uri-config/plan.md
docs/design/agent-workflows/projects/sidecar-uri-config/research.md
docs/design/agent-workflows/projects/sidecar-uri-config/security.md
docs/design/agent-workflows/projects/sidecar-uri-config/status.md
docs/design/agent-workflows/projects/wire-contract-schema/README.md
docs/design/agent-workflows/projects/wire-contract-schema/status.md
hosting/docker-compose/ee/docker-compose.dev.yml
hosting/docker-compose/oss/docker-compose.dev.yml
hosting/docker-compose/oss/docker-compose.gh.yml
sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/claude_settings.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/capabilities.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/agents/wire_models.py
sdks/python/agenta/sdk/decorators/routing.py
sdks/python/agenta/sdk/decorators/running.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/utils/types.py
sdks/python/oss/tests/pytest/unit/agents/adapters/test_claude_settings.py
sdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py
sdks/python/oss/tests/pytest/unit/test_inspect_response.py
services/agent/README.md
services/agent/src/engines/sandbox_agent/mcp.ts
services/agent/src/engines/sandbox_agent/run-plan.ts
services/agent/src/server.ts
services/agent/src/tools/mcp-bridge.ts
services/agent/src/tools/mcp-server.ts
services/agent/tests/unit/mcp-servers.test.ts
services/agent/tests/unit/sandbox-agent-orchestration.test.ts
services/agent/tests/unit/sandbox-agent-run-plan.test.ts
services/agent/tests/unit/server.test.ts
services/agent/tests/unit/tool-bridge.test.ts
services/oss/src/agent/app.py
services/oss/src/agent/config.py
services/oss/src/agent/schemas.py
services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py
services/oss/tests/pytest/unit/agent/test_default_agent_config.py
services/oss/tests/pytest/unit/agent/test_select_backend.py
web/oss/src/components/AgentChatSlice/assets/transport.ts
web/packages/agenta-entities/src/workflow/api/api.ts
web/packages/agenta-entities/src/workflow/index.ts
web/packages/agenta-entities/src/workflow/state/index.ts
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts
web/packages/agenta-entities/src/workflow/state/store.ts
web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/ClaudePermissionsControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts
web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts
web/packages/agenta-playground/src/state/execution/agentRequest.ts
web/packages/agenta-playground/src/state/execution/selectors.ts
web/packages/agenta-playground/tests/unit/agentMode.test.ts
web/packages/agenta-playground/tests/unit/agentRequest.test.ts

💤 Files with no reviewable changes (1)

sdks/python/agenta/init.py

✅ Files skipped from review due to trivial changes (24)

docs/design/agent-workflows/projects/agent-model-picker/README.md
docs/design/agent-workflows/interfaces/public-edge/agent-messages.md
docs/design/agent-workflows/documentation/ground-truth.md
sdks/python/oss/tests/pytest/unit/agents/adapters/test_claude_settings.py
services/agent/src/tools/mcp-server.ts
web/packages/agenta-entities/src/workflow/index.ts
docs/design/agent-workflows/projects/agent-config-structure-cleanup/status.md
docs/design/agent-workflows/interfaces/cross-service/service-to-agent-runner.md
sdks/python/agenta/sdk/agents/utils/wire.py
docs/design/agent-workflows/projects/sidecar-uri-config/README.md
docs/design/agent-workflows/interfaces/in-service/neutral-runtime-dtos.md
docs/design/agent-workflows/projects/sidecar-uri-config/context.md
docs/design/agent-workflows/documentation/adapters/agenta.md
docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md
docs/design/agent-workflows/documentation/agent-template.md
docs/design/agent-workflows/interfaces/in-service/harness-adapters.md
docs/design/agent-workflows/projects/sidecar-uri-config/status.md
docs/design/agent-workflows/projects/sidecar-uri-config/plan.md
services/agent/README.md
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/ClaudePermissionsControl.tsx
docs/design/agent-workflows/projects/agent-model-picker/context.md
docs/design/agent-workflows/projects/agent-model-picker/plan.md
docs/design/agent-workflows/projects/agent-model-picker/status.md
docs/design/agent-workflows/projects/wire-contract-schema/status.md

🚧 Files skipped from review as they are similar to previous changes (20)

web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts
web/packages/agenta-entities/src/workflow/state/index.ts
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts
sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py
services/agent/tests/unit/sandbox-agent-orchestration.test.ts
sdks/python/oss/tests/pytest/unit/test_inspect_response.py
sdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.py
services/oss/src/agent/schemas.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/agents/capabilities.py
services/agent/src/engines/sandbox_agent/mcp.ts
web/packages/agenta-entities/src/workflow/api/api.ts
services/agent/tests/unit/sandbox-agent-run-plan.test.ts
sdks/python/agenta/sdk/decorators/routing.py
services/agent/tests/unit/server.test.ts
services/agent/src/tools/mcp-bridge.ts
services/agent/src/engines/sandbox_agent/run-plan.ts
sdks/python/agenta/sdk/agents/wire_models.py
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts
web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts

coderabbitai · 2026-06-25T10:57:31Z

        environment:
            PORT: "8765"
+            # Bind on all interfaces: the co-located Python `services` container reaches the
+            # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model
+            # default of `127.0.0.1` (same-host only) is unreachable cross-container.
+            AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Expose runner token env in compose for /run auth.

This service now binds to 0.0.0.0, but AGENTA_AGENT_RUNNER_TOKEN is not plumbed in this environment block. Because this container intentionally has no env_file, operators cannot enable the server’s token gate without editing compose again. Add an explicit token env entry so authorization can be turned on safely.

Suggested patch

environment: PORT: "8765" # Bind on all interfaces: the co-located Python `services` container reaches the # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model # default of `127.0.0.1` (same-host only) is unreachable cross-container. AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0} + AGENTA_AGENT_RUNNER_TOKEN: ${AGENTA_AGENT_RUNNER_TOKEN:-} PI_CODING_AGENT_DIR: /pi-agent

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

environment:

PORT: "8765"

# Bind on all interfaces: the co-located Python `services` container reaches the

# sidecar over the compose network at `sandbox-agent:8765`, so the trust-model

# default of `127.0.0.1` (same-host only) is unreachable cross-container.

AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}

environment:

PORT: "8765"

# Bind on all interfaces: the co-located Python `services` container reaches the

# sidecar over the compose network at `sandbox-agent:8765`, so the trust-model

# default of `127.0.0.1` (same-host only) is unreachable cross-container.

AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}

AGENTA_AGENT_RUNNER_TOKEN: ${AGENTA_AGENT_RUNNER_TOKEN:-}

coderabbitai · 2026-06-25T10:57:31Z

        environment:
            PORT: "8765"
+            # Bind on all interfaces: the co-located Python `services` container reaches the
+            # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model
+            # default of `127.0.0.1` (same-host only) is unreachable cross-container.
+            AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Add token env passthrough for optional /run authorization.

AGENTA_AGENT_RUNNER_HOST is opened to all interfaces, but AGENTA_AGENT_RUNNER_TOKEN is missing from the allowed env list. Since sandbox-agent has no env_file, this blocks enabling the server’s token check via normal env configuration.

Suggested patch

environment: PORT: "8765" # Bind on all interfaces: the co-located Python `services` container reaches the # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model # default of `127.0.0.1` (same-host only) is unreachable cross-container. AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0} + AGENTA_AGENT_RUNNER_TOKEN: ${AGENTA_AGENT_RUNNER_TOKEN:-} PI_CODING_AGENT_DIR: /pi-agent

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

environment:

PORT: "8765"

# Bind on all interfaces: the co-located Python `services` container reaches the

# sidecar over the compose network at `sandbox-agent:8765`, so the trust-model

# default of `127.0.0.1` (same-host only) is unreachable cross-container.

AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}

environment:

PORT: "8765"

# Bind on all interfaces: the co-located Python `services` container reaches the

# sidecar over the compose network at `sandbox-agent:8765`, so the trust-model

# default of `127.0.0.1` (same-host only) is unreachable cross-container.

AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}

AGENTA_AGENT_RUNNER_TOKEN: ${AGENTA_AGENT_RUNNER_TOKEN:-}

coderabbitai · 2026-06-25T10:57:31Z

        environment:
            PORT: "8765"
+            # Bind on all interfaces: the co-located Python `services` container reaches the
+            # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model
+            # default of `127.0.0.1` (same-host only) is unreachable cross-container.
+            AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Keep auth configurable in GH stack by wiring runner token env.

This compose file exposes the runner on 0.0.0.0 but does not pass AGENTA_AGENT_RUNNER_TOKEN. That prevents enabling the /run auth gate without modifying compose again.

Suggested patch

environment: PORT: "8765" # Bind on all interfaces: the co-located Python `services` container reaches the # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model # default of `127.0.0.1` (same-host only) is unreachable cross-container. AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0} + AGENTA_AGENT_RUNNER_TOKEN: ${AGENTA_AGENT_RUNNER_TOKEN:-} AGENTA_HOST: ${AGENTA_HOST:-}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

environment:

PORT: "8765"

# Bind on all interfaces: the co-located Python `services` container reaches the

# sidecar over the compose network at `sandbox-agent:8765`, so the trust-model

# default of `127.0.0.1` (same-host only) is unreachable cross-container.

AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}

environment:

PORT: "8765"

# Bind on all interfaces: the co-located Python `services` container reaches the

# sidecar over the compose network at `sandbox-agent:8765`, so the trust-model

# default of `127.0.0.1` (same-host only) is unreachable cross-container.

AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}

AGENTA_AGENT_RUNNER_TOKEN: ${AGENTA_AGENT_RUNNER_TOKEN:-}

AGENTA_HOST: ${AGENTA_HOST:-}

coderabbitai · 2026-06-25T10:57:31Z

+    agent = params.get("agent")
+    source = agent if isinstance(agent, dict) else params
+    harness = str(source.get("harness") or defaults.harness).lower()
+    raw_uri = source.get("uri")
+    uri = _clean_uri(raw_uri) if raw_uri is not None else defaults.uri


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Let explicit uri: null clear inherited runner overrides.

Line 965 treats explicit None the same as an absent key, so a caller cannot clear a default/stored runner URI except by sending blanks. Since this field controls where secret-bearing runner requests go, honor key presence.

Proposed fix

agent = params.get("agent") source = agent if isinstance(agent, dict) else params harness = str(source.get("harness") or defaults.harness).lower() - raw_uri = source.get("uri") - uri = _clean_uri(raw_uri) if raw_uri is not None else defaults.uri + uri = _clean_uri(source.get("uri")) if "uri" in source else defaults.uri permission_policy = str(

coderabbitai · 2026-06-25T10:57:32Z

+    const message =
+      err instanceof Error ? (err.stack ?? err.message) : String(err);
    result = { ok: false, error: message };


🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Returning err.stack in the response body leaks internal details.

Switching the error string from err.message to err.stack embeds full stack traces (file paths, internal structure) into the /run result. Per the wire contract, the Python side maps ok:false into RuntimeError(f"Agent run failed: {error}"), so this trace can propagate upward and potentially surface to end users. Prefer logging the stack server-side and returning only err.message on the wire. The same pattern repeats at Lines 193-195.

🛡️ Proposed change

- const message = - err instanceof Error ? (err.stack ?? err.message) : String(err); + if (err instanceof Error && err.stack) { + process.stderr.write(`[sandbox-agent] run error: ${err.stack}\n`); + } + const message = err instanceof Error ? err.message : String(err); result = { ok: false, error: message };

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const message =

err instanceof Error ? (err.stack ?? err.message) : String(err);

result = { ok: false, error: message };

if (err instanceof Error && err.stack) {

process.stderr.write(`[sandbox-agent] run error: ${err.stack}\n`);

}

const message = err instanceof Error ? err.message : String(err);

result = { ok: false, error: message };

coderabbitai · 2026-06-25T10:57:32Z

+// Legacy pre-migration run-selection keys that now live inside `agent`. Stripped from the
+// top level when `agent` is present so we never emit both wire shapes for one config.
+const LEGACY_RUN_SELECTION_KEYS = ["harness", "sandbox", "permission_policy"] as const
+
 const configFor = (appId?: string | null) => {
    const resolved = resolveAppAgConfig(appId)
    if (!resolved) return stubConfig()
-    return {
-        parameters: {harness: "pi_core", sandbox: "local", ...resolved.ag_config},
-        references: resolved.references,
+    const agConfig = resolved.ag_config as Record<string, unknown>
+    const agent = agConfig.agent
+    let parameters: Record<string, unknown>
+    if (agent && typeof agent === "object") {
+        const rest = {...agConfig}
+        for (const key of LEGACY_RUN_SELECTION_KEYS) delete rest[key]
+        parameters = {
+            ...rest,
+            agent: {
+                harness: "pi_core",
+                sandbox: "local",
+                ...(agent as Record<string, unknown>),
+            },
+        }
+    } else {
+        parameters = {harness: "pi_core", sandbox: "local", ...agConfig}
    }
+    return {parameters, references: resolved.references}


🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check whether permission_policy is ever set at the top level of resolved ag_config rg -nP --type=ts -C3 'permission_policy' web/oss/src/components/AgentChatSlice

Repository: Agenta-AI/agenta

Length of output: 945

🏁 Script executed:

#!/bin/bash # Find usage of configFor and search for permission_policy in tests or other files rg -NP --type=ts "configFor" web/oss/src/components/AgentChatSlice rg -NP --type=ts "permission_policy" --ignore-case --glob "!(web/oss/src/components/AgentChatSlice/assets/transport.ts)"

Repository: Agenta-AI/agenta

Length of output: 379

🏁 Script executed:

#!/bin/bash # Search for permission_policy across the repository to see how it's defined and used rg -N "permission_policy" --type=ts # Also check transport.ts around the configFor definition to see full scope sed -n '55,90p' 'web/oss/src/components/AgentChatSlice/assets/transport.ts'

Repository: Agenta-AI/agenta

Length of output: 154

🏁 Script executed:

#!/bin/bash cat -n web/oss/src/components/AgentChatSlice/assets/transport.ts | sed -n '55,100p'

Repository: Agenta-AI/agenta

Length of output: 2541

🏁 Script executed:

#!/bin/bash # Search for stubConfig to understand fallback behavior rg -Nb "stubConfig" web/oss/src/components/AgentChatSlice/assets/transport.ts # Search for the definition of LEGACY_RUN_SELECTION_KEYS context to see if permission_policy is expected elsewhere rg -Nb "resolved.ag_config" web/oss/src

Repository: Agenta-AI/agenta

Length of output: 361

🏁 Script executed:

#!/bin/bash # Check the stubConfig definition to see if permission_policy is included there sed -n '2000,2010p' web/oss/src/components/AgentChatSlice/assets/transport.ts # Search for 'permission_policy' in the `resolved` usage or configuration mapping functions rg -Nb "resolveAppAgConfig" web/oss/src/components/AgentChatSlice/assets/transport.ts | head -20

Repository: Agenta-AI/agenta

Length of output: 352

🏁 Script executed:

#!/bin/bash # Check the agConfig file to understand how the config is resolved and if permission_policy is included find web/oss/src/components/AgentChatSlice/assets -name "agConfig*" -type f | xargs cat -n | head -50

Repository: Agenta-AI/agenta

Length of output: 2419

🏁 Script executed:

#!/bin/bash # Read the rest of agConfig.ts to see the resolution logic and checks for permissions sed -n '50,150p' web/oss/src/components/AgentChatSlice/assets/agConfig.ts

Repository: Agenta-AI/agenta

Length of output: 2435

🏁 Script executed:

#!/bin/bash # Search for test files related to transport or agConfig in this directory find web/oss/src/components/AgentChatSlice/assets -name "*.test.ts" -o -name "*.spec.ts" -o -name "__tests__" # Also check the stubConfig function content directly which we saw starts at 2000 sed -n '2000,2020p' web/oss/src/components/AgentChatSlice/assets/transport.ts

Repository: Agenta-AI/agenta

Length of output: 154

🏁 Script executed:

#!/bin/bash # 1. Check if stubConfig sets permission_policy (start reading around the line number found previously) sed -n '2000,2030p' web/oss/src/components/AgentChatSlice/assets/transport.ts # 2. Search for test files for this component find web/oss/src/components/AgentChatSlice -iname "*.test.ts" -o -iname "*.spec.ts" -o -iname "*.test.tsx" | head -10 # 3. Search for explicit permission_policy usage in the whole oss folder to see how it's typically structured rg "permission_policy" web/oss/src --type=ts -n

Repository: Agenta-AI/agenta

Length of output: 302

🏁 Script executed:

#!/bin/bash # The previous sed command started at 2000, but looking at the file size (line count), it might be smaller or larger. # Let's find the exact line number of 'stubConfig' function definition rg -n "^const stubConfig" web/oss/src/components/AgentChatSlice/assets/transport.ts # Also count lines wc -l web/oss/src/components/AgentChatSlice/assets/transport.ts

Repository: Agenta-AI/agenta

Length of output: 247

🏁 Script executed:

#!/bin/bash # Check the stubConfig implementation fully to see if it includes permission_policy sed -n '36,60p' web/oss/src/components/AgentChatSlice/assets/transport.ts # Also, search for the full file to ensure we've seen the whole context, and perhaps look at the usage site to see if it passes the result to a specific API endpoint sed -n '1,50p' web/oss/src/components/AgentChatSlice/assets/transport.ts

Repository: Agenta-AI/agenta

Length of output: 3520

permission_policy is stripped from the top level but never re-seeded under agent.

LEGACY_RUN_SELECTION_KEYS deletes permission_policy from the root of agConfig when an agent object is present. However, unlike harness and sandbox, it is not re-inserted into the agent block. This causes any top-level permission_policy to be silently lost, reverting to the server-side default.

Ensure the value is migrated from the root to agent, or defaulted if missing:

} else { parameters = {harness: "pi_core", sandbox: "local", ...agConfig} }

Proposed fix:

Before deleting keys from rest, check if permission_policy exists in agConfig.

Add it to the agent object construction if it exists, or rely on a default similar to harness/sandbox if appropriate for your server schema.

Ensure permission_policy is present in the final agent object if it was intended.

coderabbitai · 2026-06-25T10:57:32Z

+    const modeOptions = useMemo(
+        () => allowedConnectionModes(capabilities, harness),
+        [capabilities, harness],
+    )



🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Reconcile auth mode when the harness filters it out.

modeOptions is filtered, but connection.mode can remain unsupported after a harness change; then the segmented control has no valid value while the saved ModelRef still carries the old mode.

Proposed fix

[setField, modelId, connection, config.model, capabilities, harness], ) + + useEffect(() => { + if (modeOptions.length === 0 || modeOptions.includes(connection.mode)) return + writeModel({mode: modeOptions[0], slug: null}) + }, [modeOptions, connection.mode, writeModel]) // On harness switch, clear a model the new harness can't reach (rather than sending an

Also applies to: 440-452

coderabbitai · 2026-06-25T10:57:32Z

+            const nextModelId = patch.modelId !== undefined ? patch.modelId : modelId
+            // When the model changes, derive the provider from the picked model; otherwise keep it.
+            let nextProvider: string | null
+            if (patch.provider !== undefined) {
+                nextProvider = patch.provider
+            } else if (patch.modelId !== undefined) {
+                nextProvider =
+                    providerForModel(capabilities, harness, nextModelId) ?? connection.provider
+            } else {
+                nextProvider = connection.provider
+            }
            setField(
                "model",
                composeModelValue({
-                    modelId: patch.modelId !== undefined ? patch.modelId : modelId,
-                    provider: patch.provider !== undefined ? patch.provider : connection.provider,
+                    modelId: nextModelId,
+                    provider: nextProvider,
                    mode: patch.mode !== undefined ? patch.mode : connection.mode,
                    slug: patch.slug !== undefined ? patch.slug : connection.slug,
                    // Carry through extra ModelRef keys (params, ...) the form does not edit.
                    existing: config.model,
                }),


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Clear the custom connection slug when the provider changes.

writeModel can switch provider from the selected model while keeping the previous provider’s connection.slug, producing mismatched ModelRefs such as Anthropic + an OpenAI custom slug.

Proposed fix

} else { nextProvider = connection.provider } + const providerChanged = nextProvider !== connection.provider + const nextSlug = + patch.slug !== undefined ? patch.slug : providerChanged ? null : connection.slug setField( "model", composeModelValue({ modelId: nextModelId, provider: nextProvider, mode: patch.mode !== undefined ? patch.mode : connection.mode, - slug: patch.slug !== undefined ? patch.slug : connection.slug, + slug: nextSlug, // Carry through extra ModelRef keys (params, ...) the form does not edit. existing: config.model, }),

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const nextModelId = patch.modelId !== undefined ? patch.modelId : modelId

// When the model changes, derive the provider from the picked model; otherwise keep it.

let nextProvider: string | null

if (patch.provider !== undefined) {

nextProvider = patch.provider

} else if (patch.modelId !== undefined) {

nextProvider =

providerForModel(capabilities, harness, nextModelId) ?? connection.provider

} else {

nextProvider = connection.provider

}

setField(

"model",

composeModelValue({

modelId: patch.modelId !== undefined ? patch.modelId : modelId,

provider: patch.provider !== undefined ? patch.provider : connection.provider,

modelId: nextModelId,

provider: nextProvider,

mode: patch.mode !== undefined ? patch.mode : connection.mode,

slug: patch.slug !== undefined ? patch.slug : connection.slug,

// Carry through extra ModelRef keys (params, ...) the form does not edit.

existing: config.model,

}),

const nextModelId = patch.modelId !== undefined ? patch.modelId : modelId

// When the model changes, derive the provider from the picked model; otherwise keep it.

let nextProvider: string | null

if (patch.provider !== undefined) {

nextProvider = patch.provider

} else if (patch.modelId !== undefined) {

nextProvider =

providerForModel(capabilities, harness, nextModelId) ?? connection.provider

} else {

nextProvider = connection.provider

}

const providerChanged = nextProvider !== connection.provider

const nextSlug =

patch.slug !== undefined ? patch.slug : providerChanged ? null : connection.slug

setField(

"model",

composeModelValue({

modelId: nextModelId,

provider: nextProvider,

mode: patch.mode !== undefined ? patch.mode : connection.mode,

slug: nextSlug,

// Carry through extra ModelRef keys (params, ...) the form does not edit.

existing: config.model,

}),

coderabbitai · 2026-06-25T10:57:32Z

+    // On harness switch, clear a model the new harness can't reach (rather than sending an
+    // unsupported model). Permissive when the new harness publishes no models.
+    useEffect(() => {
+        if (!harness || !modelId) return
+        if (!harnessAllowsModel(capabilities, harness, modelId)) {
+            writeModel({modelId: null, provider: null})
+        }
+        // Only react to harness/capabilities changes, not every model edit.
+    }, [harness, capabilities])


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Backfill provider once inspect capabilities arrive.

If the fallback picker is used before /inspect resolves, it writes {model: id} with no provider. When capabilities later load and the model is valid, this effect leaves provider null even though the UI contract says selecting a model sets both.

Proposed fix

useEffect(() => { if (!harness || !modelId) return + const inferredProvider = providerForModel(capabilities, harness, modelId) if (!harnessAllowsModel(capabilities, harness, modelId)) { - writeModel({modelId: null, provider: null}) + writeModel({modelId: null, provider: null, slug: null}) + return + } + if (inferredProvider && inferredProvider !== connection.provider) { + writeModel({provider: inferredProvider, slug: null}) } - // Only react to harness/capabilities changes, not every model edit. - }, [harness, capabilities]) + }, [harness, capabilities, modelId, connection.provider, writeModel])

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// On harness switch, clear a model the new harness can't reach (rather than sending an

// unsupported model). Permissive when the new harness publishes no models.

useEffect(() => {

if (!harness || !modelId) return

if (!harnessAllowsModel(capabilities, harness, modelId)) {

writeModel({modelId: null, provider: null})

}

// Only react to harness/capabilities changes, not every model edit.

}, [harness, capabilities])

// On harness switch, clear a model the new harness can't reach (rather than sending an

// unsupported model). Permissive when the new harness publishes no models.

useEffect(() => {

if (!harness || !modelId) return

const inferredProvider = providerForModel(capabilities, harness, modelId)

if (!harnessAllowsModel(capabilities, harness, modelId)) {

writeModel({modelId: null, provider: null, slug: null})

return

}

if (inferredProvider && inferredProvider !== connection.provider) {

writeModel({provider: inferredProvider, slug: null})

}

// Only react to harness/capabilities changes, not every model edit.

}, [harness, capabilities, modelId, connection.provider, writeModel])

…view Address the 6 author inline comments on the proposal: - Scope near-term work to steps 1-2 only (network isolation + optional /run token); defer mTLS, short-lived scoped tokens, and payload encryption to an explicit Later/deferred subsection. - Document the decided not-implemented error behavior: the local sandbox errors when a network policy is set, and any specified filesystem policy errors, both mirroring the existing code-tool unsupported gate (code.ts CODE_TOOL_UNSUPPORTED_MESSAGE) and the typed connection *Unsupported*Errors. - Document the decision to disable stdio MCP-server execution in the sidecar the same way code execution was removed (not-implemented gate + remove mcp-bridge.ts/mcp-server.ts/stdio plumbing), and clarify that gateway tools are Layer-3 tool-permission, not Layer-2 sandbox_permission, so they need no change. - Update the legacy in-process pi engine to removed/historical (engines/pi.ts is gone). Verified code state: code execution already removed, stdio MCP still present, pi.ts removed, protocol.ts network comment already corrected by A3. Docs-only; no runner code changed. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…aders Flip the deferred remote-MCP path on in the runner while keeping stdio MCP disabled (PR #4831 disabled the runner-host stdio bridge for security). - toAcpMcpServers now delivers transport:"http" (+url): builds the ACP McpServer type:"http" variant ({name,url,headers}) and routes each resolved env entry into an HTTP request header. The resolved secret already rides the /run wire under env (the SDK resolver merges named secrets into env for both transports), so there is no SDK, protocol, or golden-fixture change. - transport:"stdio" still throws MCP_UNSUPPORTED_MESSAGE; run-plan still refuses any run carrying a stdio MCP server. A command-less stdio or url-less http server is skipped (logged), unchanged. - Tests: http delivered with the secret in a header, empty header list with no secret, stdio still rejected, mixed input skips non-deliverable entries. - Docs synced: runner-to-mcp-server + mcp-models-and-resolution inventory, public-edge agent-config-schema, the sidecar enforcement matrix (http enabled row), http-mcp-transport status. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…ions The runner's capability handling degraded silently: a non-Pi run carrying tools whose harness probe reports mcpTools:false / toolCalls:false had its tools dropped without a trace, and a failed probe fell back to a static guess with no signal that it had. A wrong or missing flag changed behavior quietly instead of erroring. Make capability handling fail loud, mirroring the *_UNSUPPORTED_MESSAGE gates from #4831 and tools/code.ts CODE_TOOL_UNSUPPORTED_MESSAGE: - assertRequiredCapabilities (capabilities.ts) refuses a non-Pi run that carries tool specs when the probe reports it cannot receive tools, with a specific error naming the missing flags and the dropped tool count. Pi is exempt (its tools ride the native extension, not MCP); a no-tools run is unaffected. - probeCapabilities now returns ProbedCapabilities { capabilities, source } so callers know whether the flags were probed or a static guess; the gate logs the source so a failed probe is debuggable. - Debug assertions across the runner hot path (run-plan build, capability probe shape, capability gate, sandbox start) catch impossible states early with actionable [sandbox-agent invariant] messages. Docs: runner-to-harness inventory page updated to record the fail-loud gate and drop the stale 'a wrong flag silently changes behavior' claim. Tests: 182 vitest (was 168) + typecheck green in services/agent. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

Add a research project documenting the service->runner /run trust model and the verified sandbox enforcement state. Part 1 records that /run carries plaintext provider secrets plus the toolCallback and trace bearer tokens over unauthenticated plain HTTP (deliver_http in ts_runner.py, no auth check in server.ts), safe only on a trusted network. Proposes localhost/in-cluster-only + an optional shared /run token as the pragmatic default, with TLS/mTLS and short-lived scoped tokens as the hardening path. Part 2 corrects the record: SandboxPermission network egress IS enforced on Daytona (daytonaNetworkFields/buildSandboxProvider), not enforceable on the local sandbox (run-plan.ts fails loud under strict), filesystem is declared-only, and code/gateway tools plus stdio MCP run on the runner host outside the sandbox boundary. Includes an enforcement matrix and the exact protocol.ts:149-150 comment correction for the protocol.ts owner to apply. Docs only; no code changed. Composio, the tool gateway, connections, and MCP are unchanged. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…view Address the 6 author inline comments on the proposal: - Scope near-term work to steps 1-2 only (network isolation + optional /run token); defer mTLS, short-lived scoped tokens, and payload encryption to an explicit Later/deferred subsection. - Document the decided not-implemented error behavior: the local sandbox errors when a network policy is set, and any specified filesystem policy errors, both mirroring the existing code-tool unsupported gate (code.ts CODE_TOOL_UNSUPPORTED_MESSAGE) and the typed connection *Unsupported*Errors. - Document the decision to disable stdio MCP-server execution in the sidecar the same way code execution was removed (not-implemented gate + remove mcp-bridge.ts/mcp-server.ts/stdio plumbing), and clarify that gateway tools are Layer-3 tool-permission, not Layer-2 sandbox_permission, so they need no change. - Update the legacy in-process pi engine to removed/historical (engines/pi.ts is gone). Verified code state: code execution already removed, stdio MCP still present, pi.ts removed, protocol.ts network comment already corrected by A3. Docs-only; no runner code changed. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…undaries Implements the decided near-term scope of the sidecar-trust/sandbox-enforcement proposal (runner-only, services/agent/src): 1. Network isolation: server.ts binds to AGENTA_AGENT_RUNNER_HOST (default 127.0.0.1, never 0.0.0.0) and gates POST /run with an OPTIONAL shared token (AGENTA_AGENT_RUNNER_TOKEN, default OFF; constant-time compare; /health open). 2. Error-on-unimplemented sandbox_permission in run-plan.ts: a restricted network policy on the LOCAL sandbox and any filesystem value now fail loudly the not-implemented way (LOCAL_NETWORK_UNSUPPORTED_MESSAGE / FILESYSTEM_UNSUPPORTED_MESSAGE), regardless of enforcement. 3. stdio MCP disabled the way code execution was: MCP_UNSUPPORTED_MESSAGE in tools/mcp-bridge.ts; buildToolMcpServers + toAcpMcpServers throw, mcp-server.ts reduced to a refusing stub, and run-plan.ts rejects any run carrying a stdio MCP server. Removes custom-tool delivery to non-Pi harnesses + user stdio MCP. protocol.ts is untouched (the gates are runtime behavior, not new wire fields). Gateway/callback tools are unchanged (Layer-3 tool-permission). Docs synced. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

The sidecar trust model defaults the runner bind to 127.0.0.1 (same-host only). In docker-compose the Python services container and the sandbox-agent sidecar are SEPARATE containers, so 127.0.0.1 makes the sidecar unreachable: every agent run failed with 'All connection attempts failed' and the playground chat showed a stream error. The sidecar's own healthcheck hits 127.0.0.1 inside its container, so it passed and masked the break. Set AGENTA_AGENT_RUNNER_HOST=0.0.0.0 (overridable) on the sandbox-agent service in the EE dev, OSS dev, and OSS gh compose files so the co-located services container can reach it at sandbox-agent:8765. Verified live: the agent chat streams end to end again. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…aders Flip the deferred remote-MCP path on in the runner while keeping stdio MCP disabled (PR #4831 disabled the runner-host stdio bridge for security). - toAcpMcpServers now delivers transport:"http" (+url): builds the ACP McpServer type:"http" variant ({name,url,headers}) and routes each resolved env entry into an HTTP request header. The resolved secret already rides the /run wire under env (the SDK resolver merges named secrets into env for both transports), so there is no SDK, protocol, or golden-fixture change. - transport:"stdio" still throws MCP_UNSUPPORTED_MESSAGE; run-plan still refuses any run carrying a stdio MCP server. A command-less stdio or url-less http server is skipped (logged), unchanged. - Tests: http delivered with the secret in a header, empty header list with no secret, stdio still rejected, mixed input skips non-deliverable entries. - Docs synced: runner-to-mcp-server + mcp-models-and-resolution inventory, public-edge agent-config-schema, the sidecar enforcement matrix (http enabled row), http-mcp-transport status. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…ions The runner's capability handling degraded silently: a non-Pi run carrying tools whose harness probe reports mcpTools:false / toolCalls:false had its tools dropped without a trace, and a failed probe fell back to a static guess with no signal that it had. A wrong or missing flag changed behavior quietly instead of erroring. Make capability handling fail loud, mirroring the *_UNSUPPORTED_MESSAGE gates from #4831 and tools/code.ts CODE_TOOL_UNSUPPORTED_MESSAGE: - assertRequiredCapabilities (capabilities.ts) refuses a non-Pi run that carries tool specs when the probe reports it cannot receive tools, with a specific error naming the missing flags and the dropped tool count. Pi is exempt (its tools ride the native extension, not MCP); a no-tools run is unaffected. - probeCapabilities now returns ProbedCapabilities { capabilities, source } so callers know whether the flags were probed or a static guess; the gate logs the source so a failed probe is debuggable. - Debug assertions across the runner hot path (run-plan build, capability probe shape, capability gate, sandbox start) catch impossible states early with actionable [sandbox-agent invariant] messages. Docs: runner-to-harness inventory page updated to record the fail-loud gate and drop the stale 'a wrong flag silently changes behavior' claim. Tests: 182 vitest (was 168) + typecheck green in services/agent. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

… stdio MCP relay shim QA finding F-042: an Agenta gateway/callback tool is never surfaced to the Claude harness when it runs in a Daytona sandbox. The same tool works on Claude+LOCAL and Pi+Daytona; only Claude x Daytona drops it. Root cause is an advertisement gap, not an execution gap. On LOCAL, the runner advertises gateway tools to Claude over a loopback HTTP MCP server. On Daytona that loopback is correctly skipped (the in-sandbox harness's 127.0.0.1 is the sandbox's, not the runner's), and Pi delivers via its bundled in-sandbox extension + the file relay. Claude on Daytona had NO in-sandbox advertiser, so sessionInit.mcpServers was empty and the model never learned the tool existed. The file relay loop (execution) was already running, idle. Fix: a tiny self-contained stdio MCP server, esbuild-bundled like the Pi extension, uploaded into the sandbox and advertised to Claude as an ACP McpServerStdio entry (typeless {name,command,args,env}). The sandbox-agent SDK forwards it verbatim into the in-sandbox newSession; the Claude ACP adapter maps a typeless entry to a Claude SDK {type:"stdio",...} server, launched IN the sandbox. On tools/call the shim writes the same relay request the existing runner-side relay loop already executes. The entire execution path, security model, and local path are unchanged. - src/tools/relay-mcp-stdio.ts: the shim (newline-delimited MCP JSON-RPC over stdio). - src/tools/relay-client.ts: the file-relay client factored out of dispatch.ts so the bundle carries only file-relay code (no callback/code executor) — 5.3 kB, 0 network. - src/engines/sandbox_agent/relay-shim.ts: bundle path + fail-loud upload helper. - src/engines/sandbox_agent/mcp.ts: Daytona non-Pi branch builds the stdio entry. - src/engines/sandbox_agent.ts: upload the shim + pass the path on Daytona non-Pi+tools. - scripts/build-extension.mjs: also bundle dist/tools/relay-mcp-stdio.js. Security: the stdio child runs INSIDE the Daytona sandbox (not the runner host, so it does not reopen the #4831 runner-host-stdio hole), carries only public specs + the relay dir (no secret), and every credentialed action stays server-side via the relay. Layer-3 permission enforcement in startToolRelay is unchanged. Codex-reviewed (xhigh). DRAFT: not live-verified on Daytona yet (the one remaining gate). 284 runner unit tests + typecheck green; bundle builds and answers initialize over stdio. Design: docs/design/agent-workflows/projects/claude-daytona-gateway/ Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

vercel Bot deployed to Preview June 24, 2026 19:25 View deployment

mmabrouk commented Jun 24, 2026

View reviewed changes

mmabrouk force-pushed the docs/sidecar-trust-and-sandbox-enforcement branch from 10acf57 to adcd126 Compare June 24, 2026 19:52

vercel Bot deployed to Preview June 24, 2026 19:53 View deployment

mmabrouk mentioned this pull request Jun 24, 2026

feat(agent): replace sandbox with an optional sidecar uri (allowlist-gated routing) #4836

Closed

mmabrouk force-pushed the docs/sidecar-trust-and-sandbox-enforcement branch from adcd126 to 227de27 Compare June 24, 2026 21:21

mmabrouk changed the title ~~docs(agent-workflows): proposal — sidecar trust + sandbox enforcement~~ feat(agent): enforce sidecar trust + disable unenforceable sandbox boundaries Jun 24, 2026

mmabrouk changed the base branch from big-agents to feat/agent-contract-versioning-docs June 24, 2026 21:22

vercel Bot deployed to Preview June 24, 2026 21:22 View deployment

mmabrouk marked this pull request as ready for review June 24, 2026 21:22

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Backend Feature Request New feature or request labels Jun 24, 2026

mmabrouk force-pushed the docs/sidecar-trust-and-sandbox-enforcement branch from 227de27 to 8bbcd82 Compare June 24, 2026 21:36

vercel Bot deployed to Preview June 24, 2026 21:38 View deployment

mmabrouk force-pushed the docs/sidecar-trust-and-sandbox-enforcement branch from cdeb18a to f0225ca Compare June 25, 2026 10:42

vercel Bot deployed to Preview June 25, 2026 10:43 View deployment

mmabrouk mentioned this pull request Jun 25, 2026

feat(frontend): harness-aware agent provider + model picker (inspect models) #4839

Merged

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

mmabrouk force-pushed the docs/sidecar-trust-and-sandbox-enforcement branch from f0225ca to f7baee1 Compare June 25, 2026 11:36

vercel Bot deployed to Preview June 25, 2026 11:38 View deployment

mmabrouk mentioned this pull request Jun 25, 2026

docs(agent): plan — restore gateway-tool MCP for Claude Code (decision needed) #4844

Merged

mmabrouk added 4 commits June 25, 2026 20:20

mmabrouk changed the base branch from feat/agent-contract-versioning-docs to big-agents June 25, 2026 18:27

mmabrouk force-pushed the docs/sidecar-trust-and-sandbox-enforcement branch from f7baee1 to 8be0209 Compare June 25, 2026 18:28

vercel Bot deployed to Preview June 25, 2026 18:29 View deployment

mmabrouk mentioned this pull request Jun 25, 2026

feat(agent): restore the internal gateway-tool MCP channel for Claude #4847

Merged

mmabrouk merged commit 419e4fb into big-agents Jun 25, 2026
30 checks passed

mmabrouk mentioned this pull request Jun 26, 2026

fix(agent): deliver Claude gateway tools on Daytona via an in-sandbox stdio MCP relay shim (F-042) #4873

Open


		Ordered roughly cheapest-to-strongest. They compose; this is not either/or.

		1. Network isolation / localhost binding (make the implicit assumption real).

Uh oh!

Conversation

mmabrouk commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Changes

Scope / risk

Tests / notes

How to QA

Uh oh!

vercel Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mmabrouk commented Jun 24, 2026

Uh oh!

mmabrouk commented Jun 24, 2026

Uh oh!

mmabrouk commented Jun 24, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Railway Preview Environment

Uh oh!

mmabrouk commented Jun 25, 2026

Added: bind the agent sidecar on 0.0.0.0 in compose (stream-error fix)

Uh oh!

mmabrouk commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

mmabrouk commented Jun 24, 2026 •

edited

Loading

vercel Bot commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

github-actions Bot commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading