diff --git a/docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/README.md b/docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/README.md
new file mode 100644
index 0000000000..1727a36f67
--- /dev/null
+++ b/docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/README.md
@@ -0,0 +1,433 @@
+# Sidecar Trust and Sandbox Enforcement
+
+Research and proposal workspace. It answers two questions, corrects the record on enforcement,
+and records the decisions taken on author review. The decided near-term code changes have since
+**landed in a separate runner-only PR** (lane `feat/agent-sidecar-trust-enforcement`,
+`services/agent/src/**`): loopback binding + optional `/run` token, error-on-unimplemented for
+local `network` / any `filesystem`, and the stdio MCP disable. The inline "NOW IMPLEMENTED"
+notes mark what shipped; `protocol.ts` was not edited (A3-owned; the gates are runtime behavior,
+not new wire fields). See [`status.md`](./status.md) for the landed summary.
+
+- **Part 1 — sidecar trust and transport.** The Python agent service calls the Node runner
+  sidecar at `/run` over plain HTTP with no authentication, carrying plaintext provider
+  secrets and two bearer tokens. Today that is safe only because the sidecar is assumed to
+  sit on a trusted local or in-cluster network. What if it is not? This part proposes how to
+  protect the boundary. **Decided near-term scope: implement only step 1 (network
+  isolation / loopback-or-in-cluster binding) and step 2 (optional shared `/run` token).**
+  mTLS, short-lived scoped tokens, and payload encryption are explicitly deferred.
+- **Part 2 — sandbox enforcement reality.** The `SandboxPermission` boundary is *partly*
+  enforced, not "declared, not enforced" as several docs once said. This part records the real
+  state in an enforcement matrix and the review decisions on the not-implemented axes, now
+  implemented: the local sandbox **errors** when a `network` policy is set, `filesystem`
+  **errors** when specified, `code` execution is already removed, stdio MCP is now **disabled**
+  the same way, gateway tools need no change (Layer-3 tool-permission, not the sandbox boundary),
+  and the legacy in-process `pi` engine is removed. The original stale `protocol.ts` comment was
+  already corrected by the sibling code agent that owns that file.
+
+Composio, the tool gateway (gateway/callback tools), and named connections are referenced only
+as things that already exist; this work changes none of them. **The one exception is MCP:** the
+stdio MCP-server implementation in the sidecar is now disabled (parity with the removed code
+execution) until its security issues are fixed.
+
+## Files
+
+- `README.md` — this file (Part 1 proposal + decided scope, Part 2 enforcement matrix +
+  review decisions, the protocol.ts comment status).
+- `status.md` — state, decisions, open questions, and the coordination flag.
+
+---
+
+## Part 1 — Sidecar trust and transport
+
+### How the service reaches the runner today
+
+The Python agent service (`services/oss/src/agent/`) is the control plane; the Node runner
+sidecar (`services/agent/`) is the execution plane. The boundary is the `/run` contract.
+
+- **Transport selection** is in `services/oss/src/agent/config.py` and
+  `services/oss/src/agent/app.py`. `runner_url()` reads `AGENTA_AGENT_RUNNER_URL`:
+  - set → HTTP transport to the deployed sidecar (`SandboxAgentBackend(url=...)`).
+  - unset → local development spawns the runner as a CLI **subprocess** from `runner_dir()`.
+- **HTTP delivery** is `deliver_http` / `deliver_http_stream` in
+  `sdks/python/agenta/sdk/agents/utils/ts_runner.py`. It does
+  `client.post(url, json=payload)` with **no `Authorization` header, no client cert, no
+  shared secret**. The only header it ever sets is `Accept: application/x-ndjson` for the
+  streaming path.
+- **The sidecar accepts anything.** `services/agent/src/server.ts`
+  (`createRequestListener`) serves `POST /run` and `GET /health`. It reads the body and the
+  `accept` header. **It performs no authentication or authorization check.** Any client that
+  can open a TCP connection to the port can submit a run.
+
+### What crosses the boundary in plaintext
+
+Every `/run` payload (built by `request_to_wire` in
+`sdks/python/agenta/sdk/agents/utils/wire.py`) can carry:
+
+- **`secrets`** — a flat dict of provider env vars (e.g. `OPENAI_API_KEY`,
+  `ANTHROPIC_API_KEY`, and the full AWS/GCP/Azure credential groups). This is the only
+  vault-key channel on the wire. These are real, resolved provider credentials.
+- **`toolCallback.authorization`** — a bearer token the runner uses to call the gateway tool
+  endpoint back on the platform (`callAgentaTool` in `services/agent/src/tools/relay.ts`).
+- **`trace.authorization`** — the full `Authorization` header value (the caller's user
+  bearer token), copied from the inbound request in `services/oss/src/agent/tracing.py` and
+  used to export OTel spans back to the platform.
+
+So a single `/run` body can contain provider API keys plus two reusable bearer tokens. On an
+untrusted network, an interceptor or a rogue peer reading the body gets all of it; a peer
+that can merely *reach* the port can also submit arbitrary runs (prompt injection, tool
+abuse, resource exhaustion) against whatever credentials are configured.
+
+There is no transport-security statement anywhere in the codebase or the interface inventory
+— no mTLS, no localhost-only note, no shared-token check. The trust model is implicit:
+**"the sidecar is only reachable on a trusted network."** This part makes that assumption
+explicit and proposes what to do if it cannot hold.
+
+### Threat model (only relevant if the sidecar is reachable from an untrusted network)
+
+| Threat | What it gets | Today's defense |
+| --- | --- | --- |
+| Eavesdrop the `/run` body | Provider keys + both bearer tokens | None (plain HTTP) |
+| Unauthenticated `/run` submission | Run arbitrary turns/tools on configured creds | None (`/run` is open) |
+| Replay a captured `trace.authorization` / `toolCallback.authorization` | Impersonate the user against the platform until the token expires | Token lifetime only |
+| Tamper with the in-flight payload | Swap model, tools, secrets, sandbox policy | None (no integrity check) |
+| SSRF / port scan from a co-tenant | Reach `/run` and `/health` | None |
+
+The boundary is a **one-way trust** today: the service trusts the sidecar to run faithfully,
+and the sidecar trusts every caller to be the service.
+
+### Options
+
+Ordered roughly cheapest-to-strongest. They compose; this is not either/or.
+
+1. **Network isolation / localhost binding (make the implicit assumption real).**
+   Keep `/run` unauthenticated but guarantee the network it lives on.
+   - Bind the sidecar to `127.0.0.1` (or a private pod/sidecar interface), never `0.0.0.0`,
+     when service and runner are co-located (same host, same pod, Compose internal network).
+   - In Kubernetes: a sidecar container in the same pod (loopback only) or a `ClusterIP`
+     service plus a `NetworkPolicy` that allows ingress *only* from the agent-service pods.
+   - In Compose: an internal network with no published port; the runner port is never mapped
+     to the host.
+   - **Cost:** near zero, config only. **Limit:** no defense in depth — anything that lands
+     on the trusted network (a compromised co-tenant, a misconfigured ingress, the
+     subprocess transport on a shared dev box) sees everything. This is the *floor*, the
+     thing we should assert and document today; it is not sufficient on its own once the
+     boundary can be untrusted.
+
+2. **A shared auth token on `/run`.**
+   The service sends a secret bearer (e.g. `X-Agenta-Runner-Token` or `Authorization`); the
+   sidecar rejects requests that do not match. There is already an in-repo precedent for
+   exactly this shape: the provider-model-auth work gated the platform's vault-resolve route
+   with `X-Agenta-Internal-Token` / `AGENTA_VAULT_RESOLVE_INTERNAL_TOKEN`. Mirror it on
+   `/run`.
+   - **Cost:** small — one env var on each side, one header set in `ts_runner.py`, one
+     constant-time compare in `server.ts`. **Wins:** stops unauthenticated submission.
+     **Limit:** a static shared secret is still plaintext over plain HTTP (eavesdroppable);
+     it authenticates the *caller* but does not protect the *payload* in transit. Pair it
+     with TLS.
+
+3. **Transport encryption (TLS), and mTLS for mutual auth.**
+   - **TLS** (server cert on the sidecar, HTTPS from the service) closes eavesdropping and
+     tampering of the payload — the load-bearing fix for secrets and tokens in transit.
+   - **mTLS** (client cert on the service too) additionally authenticates the caller without
+     a shared secret, and is the cleanest fit for a service mesh (Istio/Linkerd can provide
+     it transparently with zero app code).
+   - **Cost:** TLS needs cert provisioning/rotation; mTLS needs a CA or a mesh. Higher
+     operational weight, especially for self-hosters. **Wins:** confidentiality + integrity +
+     (mTLS) mutual identity. This is the strong end of the hardening path.
+
+4. **Short-lived, scoped tokens (reduce blast radius of what crosses).**
+   The two bearer tokens are the most reusable loot. Independently of transport:
+   - Make `trace.authorization` and `toolCallback.authorization` **short-lived** and
+     **audience/scope-restricted** (trace-export-only; gateway-call-only) so a leaked token
+     buys a small, bounded capability for a short window instead of full user impersonation.
+   - This is the most valuable *payload-level* hardening and is orthogonal to TLS. It also
+     improves the picture even on a trusted network. (It touches token issuance, which is
+     platform-side and out of scope for this doc to design; recorded as the highest-value
+     follow-up.)
+
+5. **Payload encryption (envelope-encrypt `secrets`).**
+   Encrypt the `secrets` dict (and optionally the tokens) at the service and decrypt in the
+   sidecar with a shared key, so the keys are not plaintext even if TLS is terminated early
+   or logged. **Cost:** key management + a wire-shape change (the golden contract). **Verdict:**
+   generally redundant once TLS is in place and usually not worth the wire churn; keep it in
+   reserve for a "secrets must never be plaintext at the HTTP layer even under TLS
+   termination" requirement, not as a default.
+
+### Recommendation
+
+**Decided near-term scope (implement now): steps 1 and 2 only.** The author confirmed on
+review: "I agree let's only implement 1 and 2 now." Both are config-only, ship today, and
+make the implicit trust assumption real and auditable. Everything heavier (mTLS, short-lived
+scoped tokens, payload encryption) is explicitly **deferred** to the [Later / deferred
+hardening](#later--deferred-hardening) subsection below — it is NOT part of the near-term work.
+
+1. **State the trust model explicitly and enforce localhost/in-cluster-only by default.**
+   Bind the sidecar to a loopback or private interface; never publish its port. Document, in
+   the interface inventory and the deployment proposal, that `/run` carries plaintext
+   secrets and bearer tokens and therefore **must** sit on a trusted, non-public network.
+   This is the immediate, zero-cost correctness fix for the "no transport-security
+   statement anywhere" gap. (The author endorsed the network-isolation/loopback-binding
+   option directly: "agree.") **IMPLEMENTED:** `server.ts` binds to `AGENTA_AGENT_RUNNER_HOST`
+   (default `127.0.0.1`, never `0.0.0.0`); set it to the private pod/internal interface in
+   Kubernetes/Compose and never map the port to the host.
+2. **Add a shared auth token on `/run`**, reusing the existing `X-Agenta-Internal-Token`
+   precedent (a new `AGENTA_AGENT_RUNNER_TOKEN`), default-off so co-located/loopback
+   deployments are unaffected, on when set. Cheap defense-in-depth against accidental
+   exposure. **IMPLEMENTED:** when `AGENTA_AGENT_RUNNER_TOKEN` is set, `server.ts` requires it
+   on `/run` (`Authorization: Bearer <token>` or `X-Agenta-Runner-Token: <token>`,
+   constant-time compare) and returns 401 otherwise; `/health` stays open for liveness probes.
+
+Rationale: steps 1–2 are config-only, make the implicit assumption real and auditable, and
+need no wire-contract change or cert/key machinery. They are the whole near-term ask.
+
+#### Later / deferred hardening
+
+**Not part of the near-term work.** These are the real security upgrades for when the boundary
+can be untrusted (cross-host, multi-tenant, mesh). They each carry design/ops weight, so they
+belong on a deliberate hardening path on a separate timeline, not in this cycle.
+
+3. **TLS on the `/run` transport** (and the step-2 token then rides an encrypted channel) —
+   the load-bearing fix for secrets/tokens in transit. Prefer **mTLS via a service mesh**
+   where one is available, since it gives mutual auth with no app code and handles rotation.
+   *Deferred.*
+4. **Short-lived, audience-scoped `trace.authorization` and `toolCallback.authorization`** —
+   the highest-value payload-level reduction in blast radius; pursue independently of
+   transport. Touches platform-side token issuance. *Deferred.*
+5. **Payload encryption of `secrets`** — held in reserve; only if a requirement forbids
+   plaintext secrets even under TLS. Rarely worth its wire-contract cost. *Deferred.*
+
+Note on the CLI/subprocess transport: when `AGENTA_AGENT_RUNNER_URL` is unset, the runner is
+a child process and the payload never touches a network socket (stdin pipe). That path needs
+none of this; the trust question is purely about the HTTP transport.
+
+---
+
+## Part 2 — Sandbox enforcement reality (correcting the record)
+
+The headline correction: **`SandboxPermission` network egress IS enforced — on Daytona.** The
+widely repeated "declared, not yet enforced" line is now wrong for the network axis. The
+filesystem axis is still declared-only, and several execution paths bypass the boundary even
+where it is enforced. The matrix below is the verified state, followed by the decisions taken
+on review.
+
+Review decisions folded into this part (each captured inline below): on the **local** sandbox a
+set `network` policy now **errors** (not enforceable there), `filesystem` **errors when
+specified** (not implemented anywhere), `code` execution is **already removed**, stdio MCP is
+**slated to be disabled the same way** (and its sidecar implementation removed), gateway tools
+need **no change** (they are Layer-3 tool-permission, not Layer-2 `sandbox_permission`), and the
+legacy in-process `pi` engine is **removed**.
+
+### What the code actually does
+
+- **Daytona network egress — ENFORCED.**
+  `services/agent/src/engines/sandbox_agent/provider.ts`, `daytonaNetworkFields()`, maps the
+  Layer-2 network policy onto Daytona sandbox-create fields:
+  - `mode: "off"` → `{ networkBlockAll: true }`
+  - `mode: "allowlist"` with a **non-empty** list → `{ networkAllowList: "<cidr,cidr>" }`
+    (a comma-separated CIDR string)
+  - `mode: "allowlist"` with an **empty** list → `{ networkBlockAll: true }` (empty allowlist
+    means "allow nothing", deliberately block-all rather than default-open)
+  - `mode: "on"` or no policy → `{}` (sandbox stays default-open)
+  These fields are spread into the Daytona `create` call in `buildSandboxProvider()`, so the
+  egress boundary is applied by the Daytona platform at sandbox creation. This is real
+  enforcement at the sandbox boundary.
+
+- **Local sandbox network egress — NOT enforceable.**
+  The local sidecar runs on the runner host, which has no per-run egress control. So a
+  restricted-network policy cannot be honored locally.
+  `services/agent/src/engines/sandbox_agent/run-plan.ts` (`buildRunPlan`) **rejects** a
+  restricted-network run on the local sandbox when `enforcement: "strict"` (fail loud), and
+  proceeds with a warning under `best_effort`.
+
+  **Decided behavior (review): error when `network` is set on local, the way we error on a
+  not-implemented capability — regardless of `enforcement`.** The author's call: "it might
+  make sense to return an error the way we do when a capability is not implemented if that
+  variable is set for local." So a restricted `sandbox_permission.network` (`mode` other than
+  `on`) on the local sandbox should **fail with a not-implemented-style error**, not be
+  silently accepted under `best_effort`. The local sandbox genuinely cannot enforce egress, so
+  declaring the boundary and then ignoring it is the trap we close.
+
+  *Pattern to mirror.* The codebase already errors this way for an unsupported feature: the
+  code-tool gate in `services/agent/src/tools/code.ts` keeps the interface but throws a single
+  named-constant message (`CODE_TOOL_UNSUPPORTED_MESSAGE = "Code tools are not supported by the
+  sidecar."`) from `runCodeTool`, and the Python connection layer raises typed
+  `*Unsupported*Error`s (e.g. `UnsupportedDeploymentError` in
+  `sdks/python/agenta/sdk/agents/connections/errors.py`: `"deployment '<x>' is not supported …"`)
+  for capabilities that are declared but not implemented. The local-network rejection should
+  surface the same way: a clear "network policy is not enforceable on the local sandbox"
+  error at plan time in `buildRunPlan`, so the run fails loudly rather than running unconfined.
+  **NOW IMPLEMENTED:** `buildRunPlan` returns `LOCAL_NETWORK_UNSUPPORTED_MESSAGE` ("Network
+  sandbox policy is not enforceable on the local sandbox …") whenever a restricted `network`
+  policy is set on the local sandbox, regardless of `enforcement` (runner-only PR, lane
+  `feat/agent-sidecar-trust-enforcement`).
+
+- **Filesystem boundary — DECLARED, enforced NOWHERE.**
+  `SandboxPermission.filesystem` (`on` / `readonly` / `off`) travels on the wire and is
+  versioned, but no backend applies a filesystem jail. See `services/agent/src/protocol.ts`
+  (the `filesystem` field is explicitly annotated "Declared, NOT enforced today" around line
+  159). The capability-config project confirms it: "It declares filesystem confinement but
+  enforces none today (no fs jail on any backend)." (The Claude harness can be told to deny
+  `Write`/`Edit` via its settings, but that is a harness behavior rule, not a sandbox
+  filesystem boundary.)
+
+  **Decided behavior (review): error when `filesystem` is specified, since it is not
+  implemented.** The author's call: "again, should be an error if specified currently since it
+  is not implemented." So a present `sandbox_permission.filesystem` value (on any backend,
+  since none enforce it) should **fail with a not-implemented-style error** at plan time
+  rather than be silently accepted — the same not-implemented-capability pattern as the local
+  network case above (`code.ts`'s `CODE_TOOL_UNSUPPORTED_MESSAGE` gate / the typed
+  `*Unsupported*Error`s in `connections/errors.py`). Declaring a filesystem jail the runner
+  does not apply is the same silent-acceptance trap. **NOW IMPLEMENTED:** `buildRunPlan` returns
+  `FILESYSTEM_UNSUPPORTED_MESSAGE` ("Filesystem sandbox policy is not implemented …") whenever
+  `sandbox_permission.filesystem` is present, on any backend, regardless of `enforcement`
+  (runner-only PR, lane `feat/agent-sidecar-trust-enforcement`).
+
+- **Runner-host execution bypass — gateway tools and stdio MCP run on the runner host;
+  code execution is already removed.**
+  Resolved `code` tools, gateway/callback tools, and stdio MCP servers all funnel through the
+  runner host (the relay, `services/agent/src/tools/relay.ts`), not the sandbox. A
+  network-blocked Daytona sandbox does not confine the ones that still execute — they have the
+  runner host's network. Current verified state:
+  - **`code` execution — ALREADY REMOVED from the sidecar.** `runCodeTool`
+    (`services/agent/src/tools/code.ts`) no longer executes author snippets; it throws a
+    single named constant, `CODE_TOOL_UNSUPPORTED_MESSAGE = "Code tools are not supported by
+    the sidecar."`. The `code` interface still exists and code tools are still advertised to
+    harnesses, but every delivery path (direct Pi, sandbox Pi, the ACP/MCP bridge) funnels a
+    `kind: "code"` call through that throwing gate, so it fails consistently without changing
+    the public wire shape (`tools/dispatch.ts`, `tools/relay.ts`).
+  - **stdio MCP servers — STILL EXECUTE on the runner host (the remaining bypass).** The stdio
+    MCP bridge is fully wired: `services/agent/src/tools/mcp-bridge.ts` (`buildToolMcpServers`)
+    launches `services/agent/src/tools/mcp-server.ts` (a JSON-RPC stdio server) as a child
+    process of the daemon, and `run-plan.ts` `hasStdioMcpServer` flags such servers. These run
+    on the runner host, outside the sandbox boundary.
+
+  `buildRunPlan` currently closes the remaining hole at plan time: under `strict` + restricted
+  network, a run carrying any runner-host-executed tool (`executableToolSpecs`) or any stdio
+  MCP server (`hasStdioMcpServer`) is **rejected**; `best_effort` is the opt-out that accepts
+  the boundary is not a hard guarantee.
+
+  **Decided plan (review) — NOW IMPLEMENTED: stdio MCP-server execution is disabled the same
+  way code execution was — marked not-implemented, sidecar implementation removed, until the
+  security issues are fixed.** The author's call: "we have removed code execution from the code.
+  we should also disable the same way mcp servers implementation until we fix security issues. We
+  simply put them as non implemented and remove the implementation in the sandbox sidecar like
+  we did for tool calls." The runner-host stdio MCP path is now at parity with the code-tool
+  gate: the wire/interface shapes (`McpServerConfig`, the `mcpServers` request field) are kept,
+  but delivery routes through a single not-implemented gate — `MCP_UNSUPPORTED_MESSAGE` ("MCP
+  servers are not supported by the sidecar.") in `services/agent/src/tools/mcp-bridge.ts`,
+  thrown by `buildToolMcpServers` and by `toAcpMcpServers` (`engines/sandbox_agent/mcp.ts`);
+  `mcp-server.ts` is reduced to a refusing stub; and `run-plan.ts` rejects any run carrying a
+  stdio MCP server up front (`hasStdioMcpServer` → `MCP_UNSUPPORTED_MESSAGE`). This eliminates
+  the last runner-host execution bypass rather than relying on the plan-time `strict` gate.
+  **What it removes:** non-Pi harnesses (e.g. Claude) take tools only over MCP, so they can no
+  longer receive custom tools, and user-declared stdio MCP servers are refused; Pi tools
+  (delivered natively through the extension, not MCP) are unaffected. Implemented in the
+  runner-only PR on lane `feat/agent-sidecar-trust-enforcement`; `protocol.ts` was not edited
+  (A3-owned; the gate is runtime behavior, not a new wire field).
+
+  **Layering clarification (review): gateway tools are fine and need NO action — they are NOT
+  part of `sandbox_permission`.** The author's call: "for gateway it's alright. no need to do
+  anything. that is not part of sandbox permission, that is part of the permission (the other
+  parameter that is about the sidecar and obviously there we should deal with these
+  correctly)." Gateway/callback tools belong to a **different layer** than the sandbox network
+  boundary:
+  - **Layer 2 — `sandbox_permission`** is the network/filesystem boundary the agent runs
+    inside. That is what this Part 2 enforcement matrix is about (Daytona `networkBlockAll` /
+    `networkAllowList`, the local fail-loud, the filesystem-declared-only state).
+  - **Layer 3 — tool-permission** is the separate sidecar parameter that governs whether a
+    tool may run at all (`allow` / `ask` / `deny`, including HITL). That is where gateway tools
+    are governed (`relay.ts` `resolvePermission`), and it is handled correctly there.
+
+  So gateway tools are not a `sandbox_permission` concern and require no change here. They are
+  not lumped into the sandbox boundary; do not treat the gateway as a sandbox-enforcement gap.
+
+- **Legacy in-process engine — REMOVED (historical).**
+  The legacy in-process `backend: "pi"` engine (`engines/pi.ts`) has been removed by a sibling
+  agent (A3); `services/agent/src/engines/` no longer contains `pi.ts` (verified: only
+  `sandbox_agent.ts`, `sandbox_agent/`, and `skills.ts` remain, and `pi.ts` shows as deleted in
+  the working tree). It is no longer a runtime path, so it is not part of the enforcement
+  picture; it is mentioned here only as removed history. The single enforced path is
+  `backend: "sandbox-agent"` (the default and now the only sandbox path).
+  *(A few stale doc-comments in `engines/skills.ts`, `tools/mcp-server.ts`, and
+  `tracing/otel.ts` still reference the in-process Pi engine; those are leftover comments, not
+  live code, and are out of scope for this docs revision.)*
+
+### Enforcement matrix
+
+Axis × backend. "Enforced" means a hard boundary is applied; "fail loud" means the run is
+rejected at plan time. The **Decided** column records the review decisions captured above (the
+code change is a separate task; this is the agreed target behavior).
+
+| Capability | Daytona sandbox | Local sandbox | Decided behavior (review) |
+| --- | --- | --- | --- |
+| **Network: `off`** | **Enforced** — `networkBlockAll: true` at sandbox create (`provider.ts`) | **Not enforceable** — host has no egress control | Local: **error if `network` set** (not-implemented-style, any `enforcement`), mirroring `code.ts`'s unsupported gate — not a `best_effort` warn |
+| **Network: `allowlist` (non-empty)** | **Enforced** — `networkAllowList: "<cidr,...>"` (`provider.ts`) | **Not enforceable** — same as above | Same: local → error |
+| **Network: `allowlist` (empty)** | **Enforced as block-all** — empty list = allow nothing (`provider.ts`) | **Not enforceable** — same as above | Same: local → error |
+| **Network: `on` / unset** | No restriction (default-open) | No restriction (default-open) | Baseline, no change |
+| **Filesystem (`on`/`readonly`/`off`)** | **Declared only — enforced NOWHERE** (`protocol.ts:159`) | **Declared only — enforced NOWHERE** | **Error if `filesystem` specified** (not implemented on any backend), not silent-accept |
+| **`enforcement: strict`** | Rejects what Daytona cannot guarantee (runner-host tools, stdio MCP) | Rejects all restricted-network runs | Stays; the local-network + filesystem errors fire regardless of `enforcement` |
+| **`enforcement: best_effort`** | Allows with no hard guarantee | Allows with no hard guarantee | No longer the escape hatch for local network / filesystem (those error unconditionally) |
+| **`code` tools** | **Execution REMOVED** — `runCodeTool` throws `CODE_TOOL_UNSUPPORTED_MESSAGE` (`code.ts`); advertised but not run | Same — removed | Already done; this is the not-implemented pattern the others mirror |
+| **stdio MCP servers** | Run on **runner host**, bypass the sandbox net boundary; `strict`+restricted → fail loud | Run on runner host (always unconfined) | **Disable the same way as code:** not-implemented gate + **remove the sidecar implementation** (`mcp-bridge.ts` / `mcp-server.ts` / stdio plumbing) until security is fixed |
+| **gateway / callback tools** | Run on runner host; governed by **Layer 3 tool-permission** (`relay.ts` `resolvePermission`), **not** `sandbox_permission` | Same | **No action** — NOT a `sandbox_permission` (Layer 2) concern; handled in the Layer-3 tool-permission/HITL parameter |
+| **Legacy `backend: "pi"` (in-process)** | **REMOVED** — `engines/pi.ts` no longer exists | **REMOVED** | Gone (A3); not a path, historical only |
+
+One-line summary: **network egress is a real boundary on Daytona and only on Daytona, and on
+the local sandbox a set `network` policy now errors (not-implemented) rather than running
+unconfined; `filesystem` is enforced nowhere, so specifying it now errors too; `code`
+execution is already removed (throwing gate) and stdio MCP is now disabled the same way and its
+sidecar implementation removed; gateway tools are out of scope here because they live in
+Layer-3 tool-permission, not Layer-2 `sandbox_permission`; and the legacy in-process `pi`
+engine is gone.**
+
+### protocol.ts comment correction (network part APPLIED by A3; filesystem follow-up pending)
+
+The original stale comment on `SandboxPermission` (`services/agent/src/protocol.ts`, the
+"Plumbing only today … does NOT yet apply it on the sandbox provider" line) **has been
+corrected by A3**. The current comment (around `protocol.ts:144-152`) now reads, correctly:
+
+```ts
+ * (fail when the boundary cannot be applied) or `best_effort`. The network policy IS enforced
+ * on Daytona (`provider.ts` `daytonaNetworkFields`); on the local sidecar it cannot be a hard
+ * guarantee, so a restricted-network run there is rejected under `strict` (`run-plan.ts`).
+ * `filesystem` is declared-only on every provider.
+```
+
+So the network half of this section is **done**. The local-network and filesystem error
+behavior has now **landed** in `run-plan.ts` (runner-only PR, lane
+`feat/agent-sidecar-trust-enforcement`), so two comment follow-ups remain for the protocol.ts
+owner (A3) to keep the comment in step with the live code — the implementation PR did NOT edit
+`protocol.ts` (it is A3's shared surface, and the error behavior is runtime, not a wire field):
+
+- The `filesystem` field annotation (`/** Declared, NOT enforced today. */`) should become a
+  not-implemented error contract: specifying `filesystem` is now rejected as not implemented
+  (`run-plan.ts` `FILESYSTEM_UNSUPPORTED_MESSAGE`, any backend, any `enforcement`).
+- The local-network wording ("rejected under `strict`") should become "rejected whenever a
+  `network` policy is set (not enforceable on the local sandbox), independent of `enforcement`"
+  (`run-plan.ts` `LOCAL_NETWORK_UNSUPPORTED_MESSAGE`).
+
+**Coordination flag:** `services/agent/src/protocol.ts` is a shared-risk surface owned this
+cycle by a sibling code agent (**A3**). This research/proposal project does **not** edit
+`protocol.ts`; A3 already applied the network correction, and the two follow-ups above are
+recorded here for whoever lands the filesystem/local-network error behavior.
+
+---
+
+## Relationship to existing work
+
+- The **capability-config** project (`../capability-config/`) built the Layer-2 enforcement
+  this matrix records (Daytona `networkBlockAll`/`networkAllowList`, the local fail-loud, the
+  runner-host guard). This project does not change that; it documents the verified result, fixes
+  the stale comments around it, and records the review decisions to tighten the not-implemented
+  axes (local network and filesystem now error; stdio MCP to be disabled).
+- The **sidecar-deployment-proposal** project (`../sidecar-deployment-proposal/`) defines how
+  the runner is deployed (`AGENTA_AGENT_RUNNER_URL`, Compose/Helm/Railway). The decided
+  near-term work (localhost/in-cluster-only binding + the optional `/run` token) belongs in that
+  proposal's hardening section when implemented; the deferred items (TLS/mTLS, scoped tokens,
+  payload encryption) belong on its longer-term hardening path.
+- Composio, the tool gateway (gateway/callback tools), and named connections are unchanged by
+  this work; they are referenced only as existing surfaces, and gateway tools are governed by
+  Layer-3 tool-permission, not the sandbox boundary. **MCP is the exception:** this revision
+  records a decision to disable the stdio MCP-server implementation in the sidecar (parity with
+  the already-removed code execution) until the security issues are fixed — the code change is a
+  separate task, not made in this PR.
diff --git a/docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/status.md b/docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/status.md
new file mode 100644
index 0000000000..787052f053
--- /dev/null
+++ b/docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/status.md
@@ -0,0 +1,152 @@
+# Status
+
+Source of truth for where this project stands. Keep it current.
+
+## State
+
+**Runner implementation LANDED (separate runner-only PR).** The research + proposal +
+author-review revision (Part 1 trust/transport, Part 2 enforcement matrix) is complete and was
+docs-only. The decided near-term code changes are now implemented in a follow-up runner-only PR
+(lane `feat/agent-sidecar-trust-enforcement`, `services/agent/src/**` only):
+
+1. **Network isolation + optional `/run` token** (Part 1 steps 1–2): the sidecar binds to
+   loopback by default (`AGENTA_AGENT_RUNNER_HOST`, default `127.0.0.1`, never `0.0.0.0`), and
+   an OPTIONAL shared token (`AGENTA_AGENT_RUNNER_TOKEN`, default OFF) gates `/run` with a
+   constant-time compare when set (`server.ts`). `/health` stays open. mTLS / scoped tokens /
+   payload encryption remain deferred.
+2. **Error-on-unimplemented `sandbox_permission`** (Part 2): `run-plan.ts` now errors the
+   not-implemented way (mirroring `code.ts`) when a restricted `network` policy is set on the
+   LOCAL sandbox (any `enforcement`), and whenever `filesystem` is specified (any backend, since
+   none enforce it). The Daytona strict-mode runner-host-tool guard stays.
+3. **stdio MCP disabled** (Part 2): the stdio MCP implementation is disabled the same way code
+   execution was — `MCP_UNSUPPORTED_MESSAGE` in `tools/mcp-bridge.ts`; `buildToolMcpServers`,
+   `toAcpMcpServers`, and `mcp-server.ts` throw/refuse; `run-plan.ts` rejects any run carrying a
+   stdio MCP server. The wire shapes (`McpServerConfig`, `mcpServers`) are unchanged; only the
+   runtime delivery is gated. This removes custom-tool delivery to non-Pi harnesses (Claude) and
+   user-declared stdio MCP servers until the security issues are fixed.
+
+`protocol.ts` was NOT edited (A3-owned; the error-on-unimplemented behavior is runtime, not a new
+wire field). Gateway/callback tools were NOT touched (Layer-3 tool-permission, not Layer-2
+`sandbox_permission`). Tests green: `pnpm test` (168) + `pnpm run typecheck` in `services/agent`.
+
+## Scope
+
+- **In scope:** documenting the trust model of the service→runner `/run` boundary; proposing
+  transport/auth options; correcting the "declared, not enforced" record for `SandboxPermission`.
+- **Out of scope (unchanged by this work):** Composio, the tool gateway, named connections,
+  MCP. Token issuance design (Part 1 option 4) is platform-side and only recommended here.
+  Editing `protocol.ts` (owned by sibling agent A3 this cycle).
+
+## Key findings (verified in code)
+
+Part 1 (transport):
+- `services/oss/src/agent/config.py` `runner_url()` reads `AGENTA_AGENT_RUNNER_URL`: set →
+  HTTP; unset → local CLI subprocess.
+- `sdks/python/agenta/sdk/agents/utils/ts_runner.py` `deliver_http` posts `json=payload` with
+  no auth header, no TLS, no shared token. Only `Accept` is set (streaming).
+- `services/agent/src/server.ts` `createRequestListener` serves `/run` with **no auth check**.
+- The payload carries plaintext `secrets` (provider keys), `toolCallback.authorization`, and
+  `trace.authorization` (the caller's full `Authorization` header, copied in
+  `services/oss/src/agent/tracing.py`).
+- Precedent for a shared-token gate already exists: `X-Agenta-Internal-Token` /
+  `AGENTA_VAULT_RESOLVE_INTERNAL_TOKEN` on the platform vault-resolve route.
+
+Part 2 (enforcement):
+- Daytona network egress IS enforced: `provider.ts` `daytonaNetworkFields()` →
+  `buildSandboxProvider()` (off/empty-allowlist → `networkBlockAll`, non-empty → `networkAllowList`).
+- Local sandbox cannot enforce egress: `run-plan.ts` rejects under `strict`, warns under `best_effort`
+  today (decision: should ERROR whenever `network` is set on local — see review decisions).
+- Filesystem declared-only, enforced nowhere (`protocol.ts:159`; decision: should ERROR when specified).
+- **`code` execution is ALREADY REMOVED** from the sidecar: `runCodeTool` (`tools/code.ts`) throws
+  `CODE_TOOL_UNSUPPORTED_MESSAGE` ("Code tools are not supported by the sidecar."); interface kept,
+  every path funnels through the throwing gate (`dispatch.ts` / `relay.ts`). This is the
+  not-implemented pattern the local-network + filesystem + MCP decisions mirror.
+- **stdio MCP is STILL implemented** on the runner host: `tools/mcp-bridge.ts`
+  (`buildToolMcpServers`) launches `tools/mcp-server.ts` (JSON-RPC stdio). Decision: disable it the
+  same way as code execution and remove the sidecar implementation.
+- gateway/callback tools run on the runner host (`relay.ts`), governed by **Layer-3
+  tool-permission** (`resolvePermission`), NOT `sandbox_permission` (Layer 2) — no change needed.
+- **Legacy in-process `pi` engine REMOVED:** `engines/pi.ts` no longer exists (A3 landed it;
+  shows as `D` in the working tree). Only `sandbox_agent.ts` / `sandbox_agent/` / `skills.ts` remain.
+- The stale `protocol.ts:149-150` comment was ALREADY corrected by A3 (now reads "network policy
+  IS enforced on Daytona … `filesystem` is declared-only on every provider"). Two follow-up comment
+  edits remain (filesystem→error, local-network→error) once that behavior lands.
+
+## Review decisions (PR #4831)
+
+The author left 6 inline comments; each is folded into the README:
+
+1. README network-isolation/loopback-binding option — **endorsed** ("agree"). Kept as decided step 1.
+2. Recommendation — **scope near-term to steps 1 & 2 only** ("only implement 1 and 2 now"). mTLS /
+   short-lived scoped tokens / payload encryption moved to an explicit "Later / deferred hardening"
+   subsection.
+3. Local network — **error when `network` set on local** (not-implemented-style, mirroring
+   `code.ts`'s `CODE_TOOL_UNSUPPORTED_MESSAGE` gate / the typed `*Unsupported*Error`s in
+   `connections/errors.py`), regardless of `enforcement`. Decision documented; code change separate.
+4. Filesystem — **error when `filesystem` specified** (not implemented anywhere), same pattern.
+5. MCP / gateway / layering — **disable stdio MCP-server execution in the sidecar** the same way
+   code execution was disabled (not-implemented gate + remove `mcp-bridge.ts` / `mcp-server.ts` /
+   stdio plumbing), until security is fixed. **gateway tools = no action**: they are Layer-3
+   tool-permission, NOT Layer-2 `sandbox_permission`. Code-state verified: code exec removed, MCP
+   still present. Decisions documented; code change separate (another agent owns the sidecar).
+6. Legacy `pi` engine — verified **removed** (A3); doc now presents it as removed/historical only.
+
+## Recommendation (Part 1)
+
+**Decided near-term scope (implement now): steps 1 & 2 only** (author: "only implement 1 and 2
+now"). (1) assert + enforce localhost/in-cluster-only binding and document the plaintext-secrets
+trust model; (2) add an optional shared `/run` token (reuse the `X-Agenta-Internal-Token`
+pattern). **Deferred (NOT near-term):** (3) TLS, mTLS via a service mesh where available; (4)
+short-lived audience-scoped trace/callback tokens; (5) payload encryption held in reserve. Full
+rationale in README Part 1.
+
+## Coordination
+
+- Lease claimed on `docs/design/agent-workflows/scratch/agent-coordination.md` as
+  `sidecar-trust-research` (docs-only, this project's two files only).
+- **A3 already applied the network half** of the `protocol.ts` comment correction (the comment
+  now reads "network policy IS enforced on Daytona … `filesystem` is declared-only"). Two
+  follow-up comment edits remain (filesystem→error, local-network→error) once that behavior
+  lands. This project does not touch `protocol.ts`.
+- The decided code changes (local-network + filesystem → error; disable + remove stdio MCP in
+  the sidecar) are SEPARATE tasks. Another agent is editing the sidecar (`services/agent/`) this
+  cycle, so this revision touches ONLY this project's two doc files and stages only them.
+- Related (not blocking): the `contract-versioning (A1)` lease notes `/health` advertises
+  `protocol: 1` that the Python client never reads. Orthogonal to this work.
+
+## Open questions / follow-ups
+
+1. **Whose decision is the default transport posture?** Decided near-term = localhost/in-cluster-only
+   binding + optional token; the actual default (token on/off) is a deployment policy call for the
+   sidecar-deployment owners. (TLS/mTLS now deferred, not near-term.)
+2. **Deferred hardening** (TLS/mTLS, short-lived audience-scoped trace/callback tokens, payload
+   encryption) — explicitly out of near-term scope per review; recorded, not designed here.
+3. **Where the Part 1 recommendation lands when implemented:** the hardening section of
+   `../sidecar-deployment-proposal/proposal.md`.
+4. **Decided code changes (separate tasks, not this PR):** (a) `run-plan.ts` error when local
+   `network` set / when `filesystem` specified (not-implemented gate, mirroring `code.ts`); (b)
+   disable + remove the stdio MCP sidecar implementation (`mcp-bridge.ts` / `mcp-server.ts` /
+   stdio plumbing) the way code execution was removed; (c) the two follow-up `protocol.ts` comment
+   edits. Owned by whoever lands the runner change this cycle.
+
+## Changelog
+
+- 2026-06-24: Project created. Part 1 (transport trust proposal) + Part 2 (enforcement matrix
+  + protocol.ts comment correction) written and verified against code. Lease + A3 flag posted
+  on the coordination board. No code changed.
+- 2026-06-24: Revised per author review on PR #4831 (6 inline comments). Scoped near-term work to
+  steps 1 & 2 (deferred mTLS / scoped tokens / payload encryption to a "Later / deferred"
+  subsection); documented error-on-specified behavior for local network + filesystem (mirroring
+  the `code.ts` not-implemented gate); documented the decision to disable + remove the stdio MCP
+  sidecar implementation; clarified gateway = Layer-3 tool-permission, not Layer-2
+  `sandbox_permission`; updated the legacy `pi` engine to removed/historical. Verified code state:
+  `code` execution already removed (`runCodeTool` throws), stdio MCP still present, `engines/pi.ts`
+  gone, `protocol.ts` network comment already corrected by A3. Docs-only, two files; no code changed.
+- 2026-06-24: **Runner implementation LANDED** (separate runner-only PR, lane
+  `feat/agent-sidecar-trust-enforcement`, `services/agent/src/**` only). (1) loopback binding +
+  optional `/run` token in `server.ts`; (2) `run-plan.ts` errors on local `network` / on
+  `filesystem` specified (not-implemented gate, any enforcement); (3) stdio MCP disabled
+  (`MCP_UNSUPPORTED_MESSAGE`, `mcp-bridge.ts`/`mcp-server.ts`/`toAcpMcpServers`/run-plan gate),
+  removing custom-tool delivery to non-Pi harnesses + user stdio MCP. `protocol.ts` untouched
+  (A3-owned; runtime behavior, no new wire field). Runner README MCP/tools lines synced. Tests
+  green: 168 vitest + typecheck in `services/agent`.
diff --git a/hosting/docker-compose/ee/docker-compose.dev.yml b/hosting/docker-compose/ee/docker-compose.dev.yml
index 65b9060130..203b55cdf9 100644
--- a/hosting/docker-compose/ee/docker-compose.dev.yml
+++ b/hosting/docker-compose/ee/docker-compose.dev.yml
@@ -464,6 +464,10 @@ services:
         # the runner reads for the `daytona` sandbox provider.
         environment:
             PORT: "8765"
+            # Bind on all interfaces: the co-located Python `services` container reaches the
+            # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model
+            # default of `127.0.0.1` (same-host only) is unreachable cross-container.
+            AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}
             PI_CODING_AGENT_DIR: /pi-agent
             # Tracing export fallback (used when a request carries no usable OTLP
             # credential). Must be reachable from this container.
diff --git a/hosting/docker-compose/oss/docker-compose.dev.yml b/hosting/docker-compose/oss/docker-compose.dev.yml
index ae8f36aad0..283307d28c 100644
--- a/hosting/docker-compose/oss/docker-compose.dev.yml
+++ b/hosting/docker-compose/oss/docker-compose.dev.yml
@@ -444,6 +444,10 @@ services:
         # its own port plus optional runner-scoped provider credentials.
         environment:
             PORT: "8765"
+            # Bind on all interfaces: the co-located Python `services` container reaches the
+            # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model
+            # default of `127.0.0.1` (same-host only) is unreachable cross-container.
+            AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}
             PI_CODING_AGENT_DIR: /pi-agent
             AGENTA_HOST: ${AGENTA_HOST:-}
             AGENTA_API_KEY: ${AGENTA_API_KEY:-}
diff --git a/hosting/docker-compose/oss/docker-compose.gh.yml b/hosting/docker-compose/oss/docker-compose.gh.yml
index 4c93318d12..c07da831e3 100644
--- a/hosting/docker-compose/oss/docker-compose.gh.yml
+++ b/hosting/docker-compose/oss/docker-compose.gh.yml
@@ -347,6 +347,10 @@ services:
         # === CONFIGURATION ======================================== #
         environment:
             PORT: "8765"
+            # Bind on all interfaces: the co-located Python `services` container reaches the
+            # sidecar over the compose network at `sandbox-agent:8765`, so the trust-model
+            # default of `127.0.0.1` (same-host only) is unreachable cross-container.
+            AGENTA_AGENT_RUNNER_HOST: ${AGENTA_AGENT_RUNNER_HOST:-0.0.0.0}
             AGENTA_HOST: ${AGENTA_HOST:-}
             AGENTA_API_KEY: ${AGENTA_API_KEY:-}
             SANDBOX_AGENT_PROVIDER: ${SANDBOX_AGENT_PROVIDER:-local}
diff --git a/services/agent/README.md b/services/agent/README.md
index 0fdeac4e7e..2834713e36 100644
--- a/services/agent/README.md
+++ b/services/agent/README.md
@@ -2,8 +2,8 @@
 
 The Node side of the agent workflow service. It runs the actual agent loop and serves one
 contract: a JSON request in, a structured result out. The Python service
-(`services/oss/src/agent/`) decides *what* to run (config, tools, secrets, trace) and calls
-in here; this package *runs* it. It lives in Node because the harnesses (Pi, Claude Code,
+(`services/oss/src/agent/`) decides _what_ to run (config, tools, secrets, trace) and calls
+in here; this package _runs_ it. It lives in Node because the harnesses (Pi, Claude Code,
 and the `sandbox-agent` package) are Node libraries with no Python SDK.
 
 ## How it is invoked
@@ -36,8 +36,8 @@ src/
     callback.ts       the one /tools/call HTTP client
     code.ts           execute resolved code tools in a scoped subprocess
     dispatch.ts       dispatch resolved tools by executor kind
-    mcp-bridge.ts     build the MCP server config that exposes tools to a harness
-    mcp-server.ts     the stdio MCP server itself (launched per session by the daemon)
+    mcp-bridge.ts     stdio MCP bridge — DISABLED (throws MCP_UNSUPPORTED_MESSAGE)
+    mcp-server.ts     the stdio MCP server — REMOVED (refuses to serve; no longer launched)
   extensions/
     agenta.ts         the Pi extension (tracing + tools), bundled into dist/ for Pi to load
 ```
@@ -85,10 +85,15 @@ live workflow span.
 ## Tools
 
 Tools are resolved in the Python backend and arrive on the request as `customTools` plus a
-`toolCallback`. Delivery is capability-routed: the Pi extension registers them natively;
-other harnesses get them over MCP through `tools/mcp-bridge.ts` + `tools/mcp-server.ts`.
-Either way each call POSTs back to Agenta's `/tools/call` (`tools/callback.ts`), so the
-provider key and connection auth stay server-side.
+`toolCallback`. The Pi extension registers them natively, and each call POSTs back to Agenta's
+`/tools/call` (`tools/callback.ts`) so the provider key and connection auth stay server-side.
+
+The stdio MCP delivery path that exposed tools to non-Pi harnesses (`tools/mcp-bridge.ts` +
+`tools/mcp-server.ts`) is **disabled** in the sidecar — it launched an unconfined child process
+on the runner host, the same execution bypass that had code tools removed. Until the security is
+fixed, delivering a tool over MCP throws `MCP_UNSUPPORTED_MESSAGE`, so a non-Pi harness (e.g.
+Claude) cannot take custom tools, and the run plan refuses any request carrying a stdio MCP
+server. See `docs/design/agent-workflows/projects/sidecar-trust-and-sandbox-enforcement/`.
 
 ## The extension bundle
 
diff --git a/services/agent/src/engines/sandbox_agent/mcp.ts b/services/agent/src/engines/sandbox_agent/mcp.ts
index 667c31b378..896ad6f476 100644
--- a/services/agent/src/engines/sandbox_agent/mcp.ts
+++ b/services/agent/src/engines/sandbox_agent/mcp.ts
@@ -4,35 +4,38 @@ import type {
   ResolvedToolSpec,
   ToolCallbackContext,
 } from "../../protocol.ts";
-import { buildToolMcpServers, type McpServerStdio } from "../../tools/mcp-bridge.ts";
+import {
+  buildToolMcpServers,
+  MCP_UNSUPPORTED_MESSAGE,
+  type McpServerStdio,
+} from "../../tools/mcp-bridge.ts";
 
 type Log = (message: string) => void;
 
 /**
- * Convert user-declared MCP servers (already resolved server-side, secrets injected into
- * `env`) into ACP stdio entries. Only `stdio` is delivered over ACP today.
+ * Convert user-declared MCP servers into ACP stdio entries — DISABLED in the sidecar.
+ *
+ * A stdio MCP server launches an arbitrary process on the runner host, outside the sandbox
+ * boundary, so the implementation is disabled until its security is fixed (parity with the
+ * removed code execution; see `tools/mcp-bridge.ts`). The wire shape (`McpServerConfig`) is
+ * kept, but any stdio server throws `MCP_UNSUPPORTED_MESSAGE` rather than being delivered.
+ * Remote (`http`) servers were never delivered over ACP and are still skipped (logged), so a
+ * request carrying only remote servers stays a no-op.
  */
 export function toAcpMcpServers(
   servers: McpServerConfig[] | undefined,
   log: Log = () => {},
 ): McpServerStdio[] {
-  const out: McpServerStdio[] = [];
   for (const s of servers ?? []) {
     if ((s.transport ?? "stdio") !== "stdio" || !s.command) {
-      log(`skipping non-stdio MCP server '${s?.name ?? "?"}' (remote transport deferred)`);
+      log(
+        `skipping non-stdio MCP server '${s?.name ?? "?"}' (remote transport deferred)`,
+      );
       continue;
     }
-    if (s.tools && s.tools.length > 0) {
-      log(`MCP server '${s.name}': per-server tool allowlist not enforced over ACP (v1)`);
-    }
-    out.push({
-      name: s.name,
-      command: s.command,
-      args: s.args ?? [],
-      env: Object.entries(s.env ?? {}).map(([name, value]) => ({ name, value: String(value) })),
-    });
+    throw new Error(MCP_UNSUPPORTED_MESSAGE);
   }
-  return out;
+  return [];
 }
 
 export interface BuildSessionMcpServersInput {
diff --git a/services/agent/src/engines/sandbox_agent/run-plan.ts b/services/agent/src/engines/sandbox_agent/run-plan.ts
index a79b493b77..91fdec1608 100644
--- a/services/agent/src/engines/sandbox_agent/run-plan.ts
+++ b/services/agent/src/engines/sandbox_agent/run-plan.ts
@@ -11,6 +11,7 @@ import {
   resolvePromptText,
 } from "../../protocol.ts";
 import { executableToolSpecs } from "../../tools/public-spec.ts";
+import { MCP_UNSUPPORTED_MESSAGE } from "../../tools/mcp-bridge.ts";
 import {
   type MaterializedSkill,
   resolveSkillDirs as defaultResolveSkillDirs,
@@ -19,6 +20,23 @@ import { buildTurnText } from "./transcript.ts";
 
 type Log = (message: string) => void;
 
+/**
+ * Not-implemented sandbox-boundary gates. These mirror the code-tool gate
+ * (`tools/code.ts` `CODE_TOOL_UNSUPPORTED_MESSAGE`): a declared capability the runner cannot
+ * actually enforce fails loudly with a single named-constant message rather than being silently
+ * accepted, so a run never proceeds believing a boundary holds when it does not.
+ */
+
+/** A restricted `network` policy on the LOCAL sandbox is not enforceable (no host egress control). */
+export const LOCAL_NETWORK_UNSUPPORTED_MESSAGE =
+  "Network sandbox policy is not enforceable on the local sandbox (the sidecar runs on this " +
+  "host with no egress control); run on daytona, or remove sandbox_permission.network.";
+
+/** `filesystem` confinement is declared on the wire but applied by no backend. */
+export const FILESYSTEM_UNSUPPORTED_MESSAGE =
+  "Filesystem sandbox policy is not implemented (no backend applies a filesystem jail); " +
+  "remove sandbox_permission.filesystem.";
+
 export interface RunPlan {
   harness: string;
   acpAgent: string;
@@ -142,38 +160,49 @@ export function buildRunPlan(
   const toolSpecs = (request.customTools as ResolvedToolSpec[]) ?? [];
   const executableToolSpecsForRun = executableToolSpecs(toolSpecs);
 
-  // Layer 2 (S1b/S1g): enforce the declared network boundary, and fail loud where it cannot
-  // be a hard guarantee. Only `strict` blocks; `best_effort` is the per-axis opt-out that
-  // accepts the boundary may not hold. `mode: "on"` (or no policy) imposes no restriction.
-  // Checked before any cwd is created so a rejected run does not orphan a temp dir.
+  // Not-implemented boundary gates (sidecar-trust Part 2): a declared capability the runner
+  // cannot actually enforce fails loudly, the way code tools do (`tools/code.ts`), rather than
+  // being silently accepted. These fire BEFORE any cwd is created (so a rejected run never
+  // orphans a temp dir) and are unconditional — `enforcement` is no longer the escape hatch,
+  // because the boundary is not applied on any path regardless of strict/best_effort.
+
+  // `filesystem` confinement is declared on the wire but applied by no backend. Specifying it
+  // therefore errors everywhere; do not pretend a jail exists.
+  if (request.sandboxPermission?.filesystem !== undefined) {
+    return { ok: false, error: FILESYSTEM_UNSUPPORTED_MESSAGE };
+  }
+
+  // A restricted `network` policy on the LOCAL sandbox cannot be enforced (the sidecar runs on
+  // this host with no per-run egress control), so it errors regardless of `enforcement`. On
+  // Daytona the policy IS applied (`provider.ts` `daytonaNetworkFields`).
   const network = request.sandboxPermission?.network;
   const networkRestricted = !!network && (network.mode ?? "on") !== "on";
+  if (networkRestricted && !isDaytona) {
+    return { ok: false, error: LOCAL_NETWORK_UNSUPPORTED_MESSAGE };
+  }
+
+  // stdio MCP servers run as arbitrary processes on the RUNNER HOST, outside the sandbox
+  // boundary, and the sidecar's stdio MCP implementation is disabled (parity with the removed
+  // code execution) until its security is fixed. Refuse any run carrying one, the way code
+  // tools are gated — keep the wire shape, but the delivery is not supported.
+  if (hasStdioMcpServer(request.mcpServers)) {
+    return { ok: false, error: MCP_UNSUPPORTED_MESSAGE };
+  }
+
+  // Layer 2: even on Daytona, code/gateway tools run on the RUNNER HOST via the relay, not
+  // inside the sandbox, so they bypass the sandbox network boundary. Under `strict` + a
+  // restricted network, refuse them; `best_effort` is the opt-out that accepts the boundary is
+  // not a hard guarantee.
   const strict = request.sandboxPermission?.enforcement === "strict";
-  if (networkRestricted && strict) {
+  if (networkRestricted && isDaytona && strict) {
     const mode = network?.mode ?? "on";
-    // Most specific first: the local sidecar has no egress control at all, so any restricted
-    // network is unenforceable; Daytona applies it via networkBlockAll/networkAllowList.
-    if (!isDaytona) {
-      return {
-        ok: false,
-        error:
-          `local sandbox cannot enforce network:${mode} (the local sidecar runs on this ` +
-          `host with no egress control); set enforcement=best_effort to run locally without ` +
-          `the guarantee, or run on daytona.`,
-      };
-    }
-    // Even on Daytona, code/gateway tools and stdio MCP run on the RUNNER HOST via the relay,
-    // not inside the sandbox, so they bypass the sandbox network boundary.
-    if (
-      executableToolSpecsForRun.length > 0 ||
-      hasStdioMcpServer(request.mcpServers)
-    ) {
+    if (executableToolSpecsForRun.length > 0) {
       return {
         ok: false,
         error:
-          `code/gateway tools and stdio MCP servers run on the runner host and would bypass ` +
-          `the sandbox network boundary; remove them, or set enforcement=best_effort to accept ` +
-          `that network:${mode} is not a hard guarantee.`,
+          `code/gateway tools run on the runner host and would bypass the sandbox network ` +
+          `boundary; remove them, or set enforcement=best_effort to accept that ` +
+          `network:${mode} is not a hard guarantee.`,
       };
     }
   }
diff --git a/services/agent/src/server.ts b/services/agent/src/server.ts
index b7b5f38aac..8868a497fa 100644
--- a/services/agent/src/server.ts
+++ b/services/agent/src/server.ts
@@ -12,6 +12,7 @@
  * `createAgentServer(run)` is the testable seam: it builds the server around an injectable
  * engine runner so the HTTP behavior can be tested with a fake engine (no live harness).
  */
+import { timingSafeEqual } from "node:crypto";
 import {
   createServer,
   type IncomingMessage,
@@ -31,6 +32,51 @@ import { isEntrypoint } from "./entry.ts";
 
 const PORT = Number(process.env.PORT ?? 8765);
 
+// Bind to loopback by default (sidecar-trust step 1): the `/run` body carries plaintext
+// provider secrets and reusable bearer tokens, so the sidecar MUST sit on a trusted,
+// non-public network. `127.0.0.1` keeps it reachable only from the same host (the co-located
+// Python service) and never `0.0.0.0`. In Kubernetes/Compose, set `AGENTA_AGENT_RUNNER_HOST`
+// to the private pod/internal-network interface; never publish the port to the host.
+const HOST = process.env.AGENTA_AGENT_RUNNER_HOST ?? "127.0.0.1";
+
+// Optional shared `/run` token (sidecar-trust step 2): default OFF. When
+// `AGENTA_AGENT_RUNNER_TOKEN` is set, every `/run` request must present the same secret (in
+// `Authorization: Bearer <token>` or `X-Agenta-Runner-Token: <token>`); otherwise it is
+// rejected with 401. Cheap defense-in-depth against accidental exposure on top of network
+// isolation; a static shared secret is not a substitute for TLS (deferred). Unset = no check,
+// so co-located/loopback deployments are unaffected.
+const RUNNER_TOKEN_ENV = "AGENTA_AGENT_RUNNER_TOKEN";
+
+/** Constant-time string compare so the token check does not leak length/prefix via timing. */
+function tokensMatch(provided: string, expected: string): boolean {
+  const a = Buffer.from(provided, "utf8");
+  const b = Buffer.from(expected, "utf8");
+  if (a.length !== b.length) return false;
+  return timingSafeEqual(a, b);
+}
+
+/** The bearer/token a caller presented, from either accepted header. Empty string if none. */
+function presentedToken(req: IncomingMessage): string {
+  const header = req.headers["x-agenta-runner-token"];
+  if (typeof header === "string" && header) return header;
+  const auth = req.headers["authorization"];
+  if (typeof auth === "string" && auth) {
+    const match = /^Bearer\s+(.+)$/i.exec(auth);
+    if (match) return match[1];
+  }
+  return "";
+}
+
+/**
+ * Whether this `/run` request is authorized. The check is OFF unless the operator opts in by
+ * setting `AGENTA_AGENT_RUNNER_TOKEN`; when set, the presented token must match exactly.
+ */
+function isAuthorized(req: IncomingMessage): boolean {
+  const expected = process.env[RUNNER_TOKEN_ENV];
+  if (!expected) return true; // default-off: no token configured, accept (network isolation only)
+  return tokensMatch(presentedToken(req), expected);
+}
+
 /** Run one request through an engine. Tests inject a fake to avoid a live harness. */
 export type RunAgent = (
   request: AgentRunRequest,
@@ -79,7 +125,8 @@ async function runAndStream(
   try {
     result = await run(request, emit, controller.signal);
   } catch (err) {
-    const message = err instanceof Error ? err.stack ?? err.message : String(err);
+    const message =
+      err instanceof Error ? (err.stack ?? err.message) : String(err);
     result = { ok: false, error: message };
   }
   // Streaming delivered the events live, so don't echo them in the terminal record.
@@ -115,12 +162,18 @@ export function createRequestListener(
       }
 
       if (req.method === "POST" && req.url === "/run") {
+        if (!isAuthorized(req)) {
+          return send(res, 401, { ok: false, error: "Unauthorized" });
+        }
         const raw = await readBody(req);
         let request: AgentRunRequest;
         try {
           request = raw.trim() ? (JSON.parse(raw) as AgentRunRequest) : {};
         } catch (err) {
-          return send(res, 400, { ok: false, error: `Invalid JSON: ${String(err)}` });
+          return send(res, 400, {
+            ok: false,
+            error: `Invalid JSON: ${String(err)}`,
+          });
         }
 
         const wantsStream = (req.headers["accept"] ?? "").includes(
@@ -137,7 +190,8 @@ export function createRequestListener(
 
       return send(res, 404, { ok: false, error: "Not found" });
     } catch (err) {
-      const message = err instanceof Error ? err.stack ?? err.message : String(err);
+      const message =
+        err instanceof Error ? (err.stack ?? err.message) : String(err);
       return send(res, 500, { ok: false, error: message });
     }
   };
@@ -162,10 +216,14 @@ if (isEntrypoint(import.meta.url)) {
     );
   });
   process.on("uncaughtException", (err) => {
-    process.stderr.write(`[sandbox-agent] uncaughtException: ${err.stack ?? err.message}\n`);
+    process.stderr.write(
+      `[sandbox-agent] uncaughtException: ${err.stack ?? err.message}\n`,
+    );
   });
 
-  createAgentServer().listen(PORT, () => {
-    process.stderr.write(`[sandbox-agent] http server listening on :${PORT}\n`);
+  createAgentServer().listen(PORT, HOST, () => {
+    process.stderr.write(
+      `[sandbox-agent] http server listening on ${HOST}:${PORT}\n`,
+    );
   });
 }
diff --git a/services/agent/src/tools/mcp-bridge.ts b/services/agent/src/tools/mcp-bridge.ts
index 8273ec1728..3dfa76604e 100644
--- a/services/agent/src/tools/mcp-bridge.ts
+++ b/services/agent/src/tools/mcp-bridge.ts
@@ -1,38 +1,23 @@
 /**
- * WP-8 tool delivery over sandbox-agent/ACP.
+ * Stdio MCP bridge — DISABLED in the sidecar.
  *
- * The Pi extension (extensions/agenta.ts) registers resolved runnable tools (WP-7) as native
- * Pi tools. For a harness that takes tools only through MCP (e.g. Claude), the same resolved
- * specs are exposed as an MCP server whose tool bodies relay back to the runner.
- * The runner keeps private specs/auth in memory and performs the actual execution.
- * `buildToolMcpServers` returns the ACP `mcpServers` entry to attach to the session.
+ * This module used to expose resolved runnable tools (WP-7) to non-Pi harnesses (e.g. Claude)
+ * by launching a stdio MCP child process (mcp-server.ts) on the runner host. That process runs
+ * OUTSIDE the sandbox boundary, so a network-blocked sandbox does not confine it — the same
+ * runner-host execution bypass that had code execution removed. Until the security issues are
+ * fixed, the stdio MCP implementation is disabled the same way: the interface/types remain, but
+ * any attempt to deliver a stdio MCP server throws a single named-constant message
+ * (`MCP_UNSUPPORTED_MESSAGE`), mirroring `tools/code.ts` (`CODE_TOOL_UNSUPPORTED_MESSAGE`).
  *
- * Delivery: a stdio MCP bridge (mcp-server.ts) launched by the daemon. Its env carries
- * only public tool metadata and the relay directory. It never receives scoped env, code,
- * callback auth, or callback endpoints.
+ * The run plan (`engines/sandbox_agent/run-plan.ts`) refuses any run carrying a stdio MCP
+ * server up front, so this throw is a defense-in-depth backstop, not the primary gate.
  */
-import { existsSync } from "node:fs";
-import { dirname, join } from "node:path";
-import { fileURLToPath } from "node:url";
-
 import type { ResolvedToolSpec, ToolCallbackContext } from "../protocol.ts";
-import { executableToolSpecs, publicToolSpecs } from "./public-spec.ts";
 
 export type { ResolvedToolSpec, ToolCallbackContext } from "../protocol.ts";
 
-const HERE = dirname(fileURLToPath(import.meta.url));
-// services/agent/src/tools/mcp-bridge.ts -> services/agent/node_modules/.bin/tsx
-const TSX_BIN = join(HERE, "..", "..", "node_modules", ".bin", "tsx");
-const SERVER = join(HERE, "mcp-server.ts");
-
-/** Resolve how to launch the bridge: an explicit override, else the local tsx bin. */
-function bridgeLauncher(): { command: string; args: string[] } {
-  const override = process.env.AGENTA_TOOL_BRIDGE_COMMAND;
-  if (override) return { command: override, args: [SERVER] };
-  if (existsSync(TSX_BIN)) return { command: TSX_BIN, args: [SERVER] };
-  // Fall back to npx tsx (resolves from PATH wherever the daemon runs).
-  return { command: "npx", args: ["-y", "tsx", SERVER] };
-}
+export const MCP_UNSUPPORTED_MESSAGE =
+  "MCP servers are not supported by the sidecar.";
 
 /** ACP McpServerStdio entry: env is a list of {name, value}. */
 interface EnvVariable {
@@ -48,44 +33,19 @@ export interface McpServerStdio {
 }
 
 /**
- * Build the ACP `mcpServers` list that exposes the resolved tools to the harness.
- *
- * Attachment is decided per tool kind, not on the callback endpoint alone (see protocol.ts
- * `ResolvedToolSpec.kind`; absent kind means `callback` for back-compat):
- *  - `client` tools are browser-fulfilled and not advertised by this server (mcp-server.ts
- *    filters them from tools/list), so they never justify attaching the bridge on their own.
- *  - "Executable here" = non-client (`code` and `callback`). With zero executable specs we
- *    return [] (the no-tools path stays untouched).
- *  - The bridge does not execute tools itself. It sends a request file to `relayDir`, and
- *    the runner executes the private resolved spec in memory. That keeps scoped env, code,
- *    callback auth, and callback endpoints out of child-process env.
+ * Disabled: building a tool MCP bridge would launch an unconfined stdio child process on the
+ * runner host. Throws `MCP_UNSUPPORTED_MESSAGE` whenever there is something to deliver; an
+ * empty/all-`client` spec list is a no-op (returns []) so the no-tools path is untouched.
  */
 export function buildToolMcpServers(
   specs: ResolvedToolSpec[],
   _callbackOrRelayDir?: ToolCallbackContext | string,
-  relayDir?: string,
+  _relayDir?: string,
 ): McpServerStdio[] {
   if (!specs || specs.length === 0) return [];
-
-  // Absent kind defaults to `callback` (back-compat); `client` is the only non-executable kind.
-  const executable = executableToolSpecs(specs);
+  // `client` tools are browser-fulfilled and never go through the bridge; only an executable
+  // (`code`/`callback`) spec would have launched the stdio child, which is what we now refuse.
+  const executable = specs.filter((s) => s.kind !== "client");
   if (executable.length === 0) return [];
-
-  const resolvedRelayDir =
-    typeof _callbackOrRelayDir === "string" ? _callbackOrRelayDir : relayDir;
-  if (!resolvedRelayDir) {
-    const names = executable.map((s) => s.name).join(", ");
-    process.stderr.write(
-      `[tool-bridge] missing tool relay directory: ${executable.length} tool(s) ` +
-        `will fail (${names})\n`,
-    );
-  }
-
-  const env: EnvVariable[] = [
-    { name: "AGENTA_TOOL_PUBLIC_SPECS", value: JSON.stringify(publicToolSpecs(executable)) },
-  ];
-  if (resolvedRelayDir) env.push({ name: "AGENTA_TOOL_RELAY_DIR", value: resolvedRelayDir });
-
-  const { command, args } = bridgeLauncher();
-  return [{ name: "agenta-tools", command, args, env }];
+  throw new Error(MCP_UNSUPPORTED_MESSAGE);
 }
diff --git a/services/agent/src/tools/mcp-server.ts b/services/agent/src/tools/mcp-server.ts
index 5f1c4f8b10..73ab17a8f5 100644
--- a/services/agent/src/tools/mcp-server.ts
+++ b/services/agent/src/tools/mcp-server.ts
@@ -1,131 +1,18 @@
 /**
- * WP-8 tool MCP bridge (stdio server).
+ * Stdio MCP bridge server — REMOVED.
  *
- * The harness only accepts tools over MCP when driven via ACP. This is a minimal,
- * dependency-free MCP stdio server that exposes the backend-resolved runnable tools
- * (WP-7) and relays each tool call back to the runner — so private specs/auth stay in
- * runner memory, exactly as in the in-process Pi path.
+ * This was a JSON-RPC-over-stdio MCP server that exposed backend-resolved tools to non-Pi
+ * harnesses and relayed each call back to the runner. It ran as a child process ON THE RUNNER
+ * HOST, outside the sandbox boundary, so a network-blocked sandbox did not confine it — the
+ * same runner-host execution bypass that had code execution removed (`tools/code.ts`).
  *
- * Launched by the sandbox-agent daemon as a session MCP server (see mcp-bridge.ts). Its env
- * contains only public tool metadata and the relay dir:
- *   AGENTA_TOOL_PUBLIC_SPECS     JSON array of { name, description, inputSchema }
- *   AGENTA_TOOL_RELAY_DIR        directory watched by the runner for tool requests
- *
- * Protocol: JSON-RPC 2.0 over stdio, newline-delimited (the MCP stdio framing). Handles
- * initialize, tools/list, tools/call; ignores notifications. stdout carries protocol
- * messages only; logs go to stderr.
+ * The implementation is removed and stdio MCP delivery is disabled in the sidecar until the
+ * security issues are fixed. Nothing launches this server anymore (`tools/mcp-bridge.ts`
+ * `buildToolMcpServers` throws `MCP_UNSUPPORTED_MESSAGE` instead of spawning it, and
+ * `run-plan.ts` refuses any run carrying a stdio MCP server). If invoked directly it refuses
+ * loudly rather than serving.
  */
-import { randomUUID } from "node:crypto";
-
-import type { ResolvedToolSpec } from "../protocol.ts";
-import { EMPTY_OBJECT_SCHEMA } from "./callback.ts";
-import { runResolvedTool } from "./dispatch.ts";
-
-const SPECS: ResolvedToolSpec[] = JSON.parse(process.env.AGENTA_TOOL_PUBLIC_SPECS ?? "[]");
-const RELAY_DIR = process.env.AGENTA_TOOL_RELAY_DIR;
-const SPEC_BY_NAME = new Map(SPECS.map((s) => [s.name, s]));
-const DEFAULT_PROTOCOL = "2025-06-18";
-
-function log(message: string): void {
-  process.stderr.write(`[tool-bridge] ${message}\n`);
-}
-
-function send(message: unknown): void {
-  process.stdout.write(`${JSON.stringify(message)}\n`);
-}
-
-async function handle(message: any): Promise<unknown | undefined> {
-  const { id, method, params } = message ?? {};
-
-  // Notifications (no id) need no response.
-  if (id === undefined || id === null) {
-    return undefined;
-  }
-
-  if (method === "initialize") {
-    return {
-      jsonrpc: "2.0",
-      id,
-      result: {
-        protocolVersion: params?.protocolVersion ?? DEFAULT_PROTOCOL,
-        capabilities: { tools: {} },
-        serverInfo: { name: "agenta-tools", version: "0.1.0" },
-      },
-    };
-  }
-
-  if (method === "tools/list") {
-    return {
-      jsonrpc: "2.0",
-      id,
-      result: {
-        // `client` tools are browser-fulfilled, so this server does not advertise them.
-        tools: SPECS.filter((s) => s.kind !== "client").map((s) => ({
-          name: s.name,
-          description: s.description ?? s.name,
-          inputSchema: (s.inputSchema as Record<string, unknown>) ?? EMPTY_OBJECT_SCHEMA,
-        })),
-      },
-    };
-  }
-
-  if (method === "tools/call") {
-    const name = params?.name;
-    const spec = SPEC_BY_NAME.get(name);
-    if (!spec) {
-      return { jsonrpc: "2.0", id, error: { code: -32602, message: `unknown tool: ${name}` } };
-    }
-    try {
-      if (!RELAY_DIR) throw new Error("missing AGENTA_TOOL_RELAY_DIR");
-      // The bridge only has public metadata. A unique id per call keeps parallel calls from
-      // colliding while the runner maps the tool name back to its private resolved spec.
-      const text = await runResolvedTool(spec, params?.arguments, {
-        toolCallId: randomUUID(),
-        relayDir: RELAY_DIR,
-      });
-      return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }] } };
-    } catch (err) {
-      // Surface as an MCP tool error (isError) so the model can recover, not a crash.
-      return {
-        jsonrpc: "2.0",
-        id,
-        result: {
-          content: [{ type: "text", text: err instanceof Error ? err.message : String(err) }],
-          isError: true,
-        },
-      };
-    }
-  }
-
-  return { jsonrpc: "2.0", id, error: { code: -32601, message: `method not found: ${method}` } };
-}
-
-function main(): void {
-  log(`serving ${SPECS.length} tool(s) -> relay ${RELAY_DIR || "(missing)"}`);
-  let buffer = "";
-  process.stdin.setEncoding("utf8");
-  process.stdin.on("data", (chunk: string) => {
-    buffer += chunk;
-    let newline: number;
-    while ((newline = buffer.indexOf("\n")) !== -1) {
-      const line = buffer.slice(0, newline).trim();
-      buffer = buffer.slice(newline + 1);
-      if (!line) continue;
-      let parsed: any;
-      try {
-        parsed = JSON.parse(line);
-      } catch {
-        log(`skipping non-JSON line: ${line.slice(0, 120)}`);
-        continue;
-      }
-      Promise.resolve(handle(parsed))
-        .then((response) => {
-          if (response) send(response);
-        })
-        .catch((err) => log(`handler error: ${err?.message ?? err}`));
-    }
-  });
-  process.stdin.on("end", () => process.exit(0));
-}
+import { MCP_UNSUPPORTED_MESSAGE } from "./mcp-bridge.ts";
 
-main();
+process.stderr.write(`[tool-bridge] ${MCP_UNSUPPORTED_MESSAGE}\n`);
+process.exit(1);
diff --git a/services/agent/tests/unit/mcp-servers.test.ts b/services/agent/tests/unit/mcp-servers.test.ts
index 8b1529daae..7bb11f0785 100644
--- a/services/agent/tests/unit/mcp-servers.test.ts
+++ b/services/agent/tests/unit/mcp-servers.test.ts
@@ -1,11 +1,13 @@
 /**
- * Unit tests for the user-declared MCP server conversion (Agent B's Slice 4, wired in sandbox-agent).
+ * Unit tests for the user-declared MCP server conversion — stdio delivery now DISABLED.
  *
- * Agent B's `resolve_mcp_servers` emits the McpServerConfig wire shape
- * ({name,transport,command,args,env,url?,tools?}, env as a Record), pinned in the Python
- * test_wire_contract. This covers the TS half: converting that to the ACP stdio entry the
- * session consumes (env as a {name,value} list), skipping remote/http, and not enforcing the
- * per-server tools allowlist over ACP in v1.
+ * `resolve_mcp_servers` (Python) still emits the McpServerConfig wire shape and the wire shape
+ * is unchanged. The sidecar, however, no longer delivers stdio MCP servers: a stdio server runs
+ * an arbitrary process on the RUNNER HOST, outside the sandbox boundary, so the implementation
+ * is disabled (parity with the removed code execution) until its security is fixed. Converting a
+ * stdio server now throws `MCP_UNSUPPORTED_MESSAGE`; remote (`http`) and command-less stdio
+ * servers were never delivered over ACP and are still skipped, so an all-remote request stays a
+ * no-op.
  *
  * Run: pnpm test (or: pnpm exec vitest run tests/unit/mcp-servers.test.ts)
  */
@@ -13,15 +15,16 @@ import { describe, it } from "vitest";
 import assert from "node:assert/strict";
 
 import { toAcpMcpServers } from "../../src/engines/sandbox_agent.ts";
+import { MCP_UNSUPPORTED_MESSAGE } from "../../src/tools/mcp-bridge.ts";
 import type { McpServerConfig } from "../../src/protocol.ts";
 
-describe("toAcpMcpServers", () => {
+describe("toAcpMcpServers (stdio disabled)", () => {
   it("maps empty input to []", () => {
     assert.deepEqual(toAcpMcpServers(undefined), [], "undefined -> []");
     assert.deepEqual(toAcpMcpServers([]), [], "[] -> []");
   });
 
-  it("converts a stdio server's env Record to an ACP {name,value} list", () => {
+  it("throws the unsupported error for a stdio server", () => {
     const servers: McpServerConfig[] = [
       {
         name: "github",
@@ -29,30 +32,23 @@ describe("toAcpMcpServers", () => {
         command: "npx",
         args: ["-y", "@modelcontextprotocol/server-github"],
         env: { GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_x", LOG_LEVEL: "info" },
-        tools: ["create_issue"], // allowlist not enforced over ACP v1 (logged), server still delivered
       },
     ];
-    const out = toAcpMcpServers(servers);
-    assert.equal(out.length, 1);
-    assert.equal(out[0].name, "github");
-    assert.equal(out[0].command, "npx");
-    assert.deepEqual(out[0].args, ["-y", "@modelcontextprotocol/server-github"]);
-    assert.deepEqual(out[0].env, [
-      { name: "GITHUB_PERSONAL_ACCESS_TOKEN", value: "ghp_x" },
-      { name: "LOG_LEVEL", value: "info" },
-    ]);
+    assert.throws(
+      () => toAcpMcpServers(servers),
+      new RegExp(MCP_UNSUPPORTED_MESSAGE),
+    );
   });
 
-  it("skips remote/http and command-less stdio servers", () => {
+  it("skips remote/http and command-less stdio servers without throwing", () => {
     const out = toAcpMcpServers([
       { name: "remote", transport: "http", url: "https://example.com/mcp" },
-      { name: "broken", transport: "stdio" }, // no command
+      { name: "broken", transport: "stdio" }, // no command -> never launched, skipped
     ]);
-    assert.deepEqual(out, [], "http + command-less stdio both skipped");
-  });
-
-  it("defaults missing env / args to empty", () => {
-    const out = toAcpMcpServers([{ name: "fs", transport: "stdio", command: "mcp-fs" }]);
-    assert.deepEqual(out, [{ name: "fs", command: "mcp-fs", args: [], env: [] }]);
+    assert.deepEqual(
+      out,
+      [],
+      "http + command-less stdio both skipped, no throw",
+    );
   });
 });
diff --git a/services/agent/tests/unit/sandbox-agent-orchestration.test.ts b/services/agent/tests/unit/sandbox-agent-orchestration.test.ts
index 106e109768..440b65c545 100644
--- a/services/agent/tests/unit/sandbox-agent-orchestration.test.ts
+++ b/services/agent/tests/unit/sandbox-agent-orchestration.test.ts
@@ -292,9 +292,12 @@ describe("runSandboxAgent orchestration", () => {
   it("starts and stops the tool relay only when executable tools are present", async () => {
     const { calls, deps } = fakeHarness();
 
+    // Pi delivers tools through its native extension (not the MCP bridge), so the relay path
+    // is exercised on a Pi run. The MCP bridge is disabled in the sidecar (see the dedicated
+    // test below), so a non-Pi harness can no longer take custom tools at all.
     const result = await runSandboxAgent(
       {
-        harness: "claude",
+        harness: "pi_core",
         prompt: "use the tool",
         customTools: [{ name: "server_tool", kind: "callback" }],
       } as AgentRunRequest,
@@ -320,6 +323,30 @@ describe("runSandboxAgent orchestration", () => {
     );
   });
 
+  it("fails a non-Pi run carrying custom tools because the MCP bridge is disabled", async () => {
+    // Claude takes tools only over MCP, and the sidecar's stdio MCP bridge is disabled until
+    // its security is fixed (parity with the removed code execution). So a Claude run with a
+    // custom tool now surfaces the not-supported error instead of silently dropping or
+    // unconfined-executing the tool.
+    const { deps } = fakeHarness();
+
+    const result = await runSandboxAgent(
+      {
+        harness: "claude",
+        prompt: "use the tool",
+        customTools: [{ name: "server_tool", kind: "callback" }],
+      } as AgentRunRequest,
+      undefined,
+      undefined,
+      deps,
+    );
+
+    assert.deepEqual(result, {
+      ok: false,
+      error: "MCP servers are not supported by the sidecar.",
+    });
+  });
+
   it("flushes a partial trace and cleans up on prompt errors", async () => {
     const { calls, deps } = fakeHarness({ promptError: new Error("boom") });
 
@@ -444,7 +471,10 @@ describe("runSandboxAgent orchestration", () => {
         model: "claude-sonnet-4",
         deployment: "vertex_ai",
         credentialMode: "env",
-        secrets: { GOOGLE_CLOUD_PROJECT: "proj", GOOGLE_CLOUD_LOCATION: "us-central1" },
+        secrets: {
+          GOOGLE_CLOUD_PROJECT: "proj",
+          GOOGLE_CLOUD_LOCATION: "us-central1",
+        },
       } as AgentRunRequest,
       undefined,
       undefined,
diff --git a/services/agent/tests/unit/sandbox-agent-run-plan.test.ts b/services/agent/tests/unit/sandbox-agent-run-plan.test.ts
index 5f8fd28f6c..d9c7d28fc8 100644
--- a/services/agent/tests/unit/sandbox-agent-run-plan.test.ts
+++ b/services/agent/tests/unit/sandbox-agent-run-plan.test.ts
@@ -131,7 +131,9 @@ describe("buildRunPlan", () => {
     assert.equal(result.plan.sandboxPermission, undefined);
   });
 
-  it("rejects a strict restricted-network run on the local sandbox", () => {
+  it("errors on a restricted-network run on the local sandbox under strict", () => {
+    // Not enforceable on the local sidecar, so it errors the not-implemented way regardless of
+    // enforcement (mirrors the code-tool gate).
     const result = buildRunPlan(
       {
         harness: "claude",
@@ -147,10 +149,12 @@ describe("buildRunPlan", () => {
 
     assert.equal(result.ok, false);
     if (result.ok) return;
-    assert.match(result.error, /local sandbox cannot enforce network:off/);
+    assert.match(result.error, /not enforceable on the local sandbox/);
   });
 
-  it("allows a best_effort restricted-network run on the local sandbox", () => {
+  it("errors on a restricted-network run on the local sandbox even under best_effort", () => {
+    // best_effort is no longer the escape hatch: the local sandbox genuinely cannot enforce
+    // egress, so a set network policy always errors there.
     const result = buildRunPlan(
       {
         harness: "claude",
@@ -164,9 +168,51 @@ describe("buildRunPlan", () => {
       { createLocalCwd: () => "/tmp/local-cwd" },
     );
 
+    assert.equal(result.ok, false);
+    if (result.ok) return;
+    assert.match(result.error, /not enforceable on the local sandbox/);
+  });
+
+  it("allows an unrestricted (network: on) run on the local sandbox", () => {
+    const result = buildRunPlan(
+      {
+        harness: "claude",
+        sandbox: "local",
+        prompt: "hello",
+        sandboxPermission: {
+          network: { mode: "on" },
+          enforcement: "strict",
+        },
+      } as AgentRunRequest,
+      { createLocalCwd: () => "/tmp/local-cwd" },
+    );
+
     assert.equal(result.ok, true);
   });
 
+  it("errors when filesystem is specified (not implemented on any backend)", () => {
+    // Declared on the wire but applied by no backend, so specifying it errors everywhere — even
+    // on Daytona, even under best_effort.
+    for (const enforcement of ["strict", "best_effort"] as const) {
+      const result = buildRunPlan(
+        {
+          harness: "claude",
+          sandbox: "daytona",
+          prompt: "hello",
+          sandboxPermission: { filesystem: "readonly", enforcement },
+        } as AgentRunRequest,
+        { createDaytonaCwd: () => "/home/sandbox/agenta-fixed" },
+      );
+
+      assert.equal(result.ok, false);
+      if (result.ok) return;
+      assert.match(
+        result.error,
+        /Filesystem sandbox policy is not implemented/,
+      );
+    }
+  });
+
   it("rejects a strict restricted-network Daytona run with a runner-host tool", () => {
     const result = buildRunPlan(
       {
@@ -188,24 +234,38 @@ describe("buildRunPlan", () => {
     assert.match(result.error, /network:allowlist/);
   });
 
-  it("rejects a strict restricted-network Daytona run with a stdio MCP server", () => {
+  it("errors on any run carrying a stdio MCP server (MCP disabled)", () => {
+    // The stdio MCP implementation is disabled in the sidecar; a stdio server errors the
+    // not-implemented way regardless of sandbox/enforcement, even with no network policy.
     const result = buildRunPlan(
       {
         harness: "claude",
         sandbox: "daytona",
         prompt: "hello",
         mcpServers: [{ name: "fs", transport: "stdio", command: "mcp-fs" }],
-        sandboxPermission: {
-          network: { mode: "off" },
-          enforcement: "strict",
-        },
       } as AgentRunRequest,
       { createDaytonaCwd: () => "/home/sandbox/agenta-fixed" },
     );
 
     assert.equal(result.ok, false);
     if (result.ok) return;
-    assert.match(result.error, /stdio MCP servers run on the runner host/);
+    assert.match(result.error, /MCP servers are not supported by the sidecar/);
+  });
+
+  it("errors on a stdio MCP server on the local sandbox too", () => {
+    const result = buildRunPlan(
+      {
+        harness: "pi_core",
+        sandbox: "local",
+        prompt: "hello",
+        mcpServers: [{ name: "fs", transport: "stdio", command: "mcp-fs" }],
+      } as AgentRunRequest,
+      { createLocalCwd: () => "/tmp/local-cwd" },
+    );
+
+    assert.equal(result.ok, false);
+    if (result.ok) return;
+    assert.match(result.error, /MCP servers are not supported by the sidecar/);
   });
 
   it("allows a strict restricted-network Daytona run with only a remote MCP server", () => {
diff --git a/services/agent/tests/unit/server.test.ts b/services/agent/tests/unit/server.test.ts
index 6aa21fb539..13dafc1f34 100644
--- a/services/agent/tests/unit/server.test.ts
+++ b/services/agent/tests/unit/server.test.ts
@@ -7,13 +7,23 @@
  *
  * Run: pnpm test (or: pnpm exec vitest run tests/unit/server.test.ts)
  */
-import { describe, it } from "vitest";
+import { afterEach, describe, it } from "vitest";
 import assert from "node:assert/strict";
 import type { AddressInfo } from "node:net";
 
 import { createAgentServer, type RunAgent } from "../../src/server.ts";
 
-async function listen(run: RunAgent): Promise<{ url: string; close: () => Promise<void> }> {
+const TOKEN_ENV = "AGENTA_AGENT_RUNNER_TOKEN";
+const previousToken = process.env[TOKEN_ENV];
+
+afterEach(() => {
+  if (previousToken === undefined) delete process.env[TOKEN_ENV];
+  else process.env[TOKEN_ENV] = previousToken;
+});
+
+async function listen(
+  run: RunAgent,
+): Promise<{ url: string; close: () => Promise<void> }> {
   const server = createAgentServer(run);
   await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", resolve));
   const { port } = server.address() as AddressInfo;
@@ -36,7 +46,8 @@ describe("createAgentServer", () => {
       assert.equal(typeof body.runner, "string");
       assert.equal(typeof body.protocol, "number");
       assert.ok(
-        Array.isArray(body.engines) && (body.engines as unknown[]).includes("sandbox-agent"),
+        Array.isArray(body.engines) &&
+          (body.engines as unknown[]).includes("sandbox-agent"),
       );
       assert.ok(Array.isArray(body.harnesses));
     } finally {
@@ -47,7 +58,10 @@ describe("createAgentServer", () => {
   it("POST /run returns the engine result (200)", async () => {
     const s = await listen(okRun);
     try {
-      const res = await fetch(`${s.url}/run`, { method: "POST", body: JSON.stringify({ harness: "pi_core" }) });
+      const res = await fetch(`${s.url}/run`, {
+        method: "POST",
+        body: JSON.stringify({ harness: "pi_core" }),
+      });
       assert.equal(res.status, 200);
       const body = (await res.json()) as { ok: boolean; output: string };
       assert.equal(body.ok, true);
@@ -60,7 +74,10 @@ describe("createAgentServer", () => {
   it("POST /run with invalid JSON returns 400", async () => {
     const s = await listen(okRun);
     try {
-      const res = await fetch(`${s.url}/run`, { method: "POST", body: "{not json" });
+      const res = await fetch(`${s.url}/run`, {
+        method: "POST",
+        body: "{not json",
+      });
       assert.equal(res.status, 400);
       const body = (await res.json()) as { ok: boolean; error: string };
       assert.equal(body.ok, false);
@@ -84,11 +101,97 @@ describe("createAgentServer", () => {
     }
   });
 
+  it("POST /run is accepted with no token configured (default-off, network isolation only)", async () => {
+    delete process.env[TOKEN_ENV];
+    const s = await listen(okRun);
+    try {
+      const res = await fetch(`${s.url}/run`, { method: "POST", body: "{}" });
+      assert.equal(res.status, 200);
+    } finally {
+      await s.close();
+    }
+  });
+
+  it("POST /run without the token returns 401 when a token is configured", async () => {
+    process.env[TOKEN_ENV] = "s3cret";
+    const s = await listen(okRun);
+    try {
+      const res = await fetch(`${s.url}/run`, { method: "POST", body: "{}" });
+      assert.equal(res.status, 401);
+      const body = (await res.json()) as { ok: boolean; error: string };
+      assert.equal(body.ok, false);
+      assert.match(body.error, /Unauthorized/);
+    } finally {
+      await s.close();
+    }
+  });
+
+  it("POST /run with a wrong token returns 401", async () => {
+    process.env[TOKEN_ENV] = "s3cret";
+    const s = await listen(okRun);
+    try {
+      const res = await fetch(`${s.url}/run`, {
+        method: "POST",
+        headers: { authorization: "Bearer nope" },
+        body: "{}",
+      });
+      assert.equal(res.status, 401);
+    } finally {
+      await s.close();
+    }
+  });
+
+  it("POST /run accepts the matching token via Authorization: Bearer", async () => {
+    process.env[TOKEN_ENV] = "s3cret";
+    const s = await listen(okRun);
+    try {
+      const res = await fetch(`${s.url}/run`, {
+        method: "POST",
+        headers: { authorization: "Bearer s3cret" },
+        body: "{}",
+      });
+      assert.equal(res.status, 200);
+    } finally {
+      await s.close();
+    }
+  });
+
+  it("POST /run accepts the matching token via X-Agenta-Runner-Token", async () => {
+    process.env[TOKEN_ENV] = "s3cret";
+    const s = await listen(okRun);
+    try {
+      const res = await fetch(`${s.url}/run`, {
+        method: "POST",
+        headers: { "x-agenta-runner-token": "s3cret" },
+        body: "{}",
+      });
+      assert.equal(res.status, 200);
+    } finally {
+      await s.close();
+    }
+  });
+
+  it("GET /health is reachable without the token even when one is configured", async () => {
+    // Health is for liveness probes and carries no secrets, so the token gate is on /run only.
+    process.env[TOKEN_ENV] = "s3cret";
+    const s = await listen(okRun);
+    try {
+      const res = await fetch(`${s.url}/health`);
+      assert.equal(res.status, 200);
+    } finally {
+      await s.close();
+    }
+  });
+
   it("NDJSON stream: events first, then exactly one terminal result with no echoed events", async () => {
     const streamRun: RunAgent = async (_req, emit) => {
       emit?.({ type: "message", text: "a" });
       emit?.({ type: "message", text: "b" });
-      return { ok: true, output: "ab", events: [{ type: "message", text: "a" }] };
+      return {
+        ok: true,
+        output: "ab",
+        events: [{ type: "message", text: "a" }],
+      };
     };
     const s = await listen(streamRun);
     try {
@@ -101,9 +204,22 @@ describe("createAgentServer", () => {
       const records = (await res.text())
         .trim()
         .split("\n")
-        .map((line) => JSON.parse(line) as { kind: string; result?: { events: unknown[] } });
-      assert.deepEqual(records.map((r) => r.kind), ["event", "event", "result"]);
-      assert.deepEqual(records[2].result!.events, [], "terminal result does not echo events");
+        .map(
+          (line) =>
+            JSON.parse(line) as {
+              kind: string;
+              result?: { events: unknown[] };
+            },
+        );
+      assert.deepEqual(
+        records.map((r) => r.kind),
+        ["event", "event", "result"],
+      );
+      assert.deepEqual(
+        records[2].result!.events,
+        [],
+        "terminal result does not echo events",
+      );
     } finally {
       await s.close();
     }
diff --git a/services/agent/tests/unit/tool-bridge.test.ts b/services/agent/tests/unit/tool-bridge.test.ts
index 0371633b20..6b57d58f5a 100644
--- a/services/agent/tests/unit/tool-bridge.test.ts
+++ b/services/agent/tests/unit/tool-bridge.test.ts
@@ -1,32 +1,28 @@
 /**
- * Unit tests for buildToolMcpServers (the tool MCP bridge attachment decision).
+ * Unit tests for buildToolMcpServers — the stdio tool MCP bridge, now DISABLED in the sidecar.
  *
- * Regression cover for F4: attachment must be decided per tool kind, not on the callback
- * endpoint alone. A `code` tool is still advertised through mcp-server.ts and needs no endpoint,
- * so a run whose tools are all `code` must still attach the `agenta-tools` server. Only `callback`-kind
- * tools require AGENTA_TOOL_CALLBACK_ENDPOINT; missing it must degrade those tools, not drop the
- * whole server. `client` tools are browser-fulfilled and never justify attaching the bridge.
+ * The bridge used to launch a stdio MCP child process on the runner host to expose resolved
+ * tools to non-Pi harnesses. That process ran outside the sandbox boundary (the same
+ * runner-host execution bypass that had code execution removed), so the implementation is
+ * disabled until its security is fixed. The interface/types remain, but delivering any
+ * executable spec now throws `MCP_UNSUPPORTED_MESSAGE`. The no-tools path (empty or
+ * client-only specs) stays a no-op so non-tool runs are untouched.
  *
  * Run: pnpm test (or: pnpm exec vitest run tests/unit/tool-bridge.test.ts)
  */
 import { describe, it } from "vitest";
 import assert from "node:assert/strict";
 
-import { buildToolMcpServers } from "../../src/tools/mcp-bridge.ts";
-import type { ResolvedToolSpec, ToolCallbackContext } from "../../src/protocol.ts";
-
-/** Look up an env var value by name in the ACP {name,value} list (undefined if absent). */
-function envValue(
-  env: { name: string; value: string }[],
-  name: string,
-): string | undefined {
-  return env.find((e) => e.name === name)?.value;
-}
+import {
+  buildToolMcpServers,
+  MCP_UNSUPPORTED_MESSAGE,
+} from "../../src/tools/mcp-bridge.ts";
+import type { ResolvedToolSpec } from "../../src/protocol.ts";
 
 const relayDir = "/tmp/agenta-tools";
 
-describe("buildToolMcpServers", () => {
-  it("attaches the server for a code-only run, with public specs and relay dir", () => {
+describe("buildToolMcpServers (disabled)", () => {
+  it("throws the unsupported error for a code-only run", () => {
     const specs: ResolvedToolSpec[] = [
       {
         name: "adder",
@@ -37,96 +33,70 @@ describe("buildToolMcpServers", () => {
         env: { PRIVATE: "secret" },
       },
     ];
-    const out = buildToolMcpServers(specs, relayDir);
-    assert.equal(out.length, 1, "code-only run still attaches the server");
-    assert.equal(out[0].name, "agenta-tools");
-    assert.ok(
-      envValue(out[0].env, "AGENTA_TOOL_PUBLIC_SPECS") !== undefined,
-      "AGENTA_TOOL_PUBLIC_SPECS is set",
-    );
-    assert.equal(
-      envValue(out[0].env, "AGENTA_TOOL_CALLBACK_ENDPOINT"),
-      undefined,
-      "no endpoint env for code-only run",
+    assert.throws(
+      () => buildToolMcpServers(specs, relayDir),
+      new RegExp(MCP_UNSUPPORTED_MESSAGE),
     );
-    assert.equal(envValue(out[0].env, "AGENTA_TOOL_RELAY_DIR"), relayDir);
-    assert.equal(envValue(out[0].env, "AGENTA_TOOL_CALLBACK_AUTH"), undefined);
-    assert.equal(envValue(out[0].env, "AGENTA_TOOL_SPECS"), undefined);
-    // Only public metadata round-trips; private executor fields stay runner-side.
-    assert.deepEqual(JSON.parse(envValue(out[0].env, "AGENTA_TOOL_PUBLIC_SPECS")!), [
-      { name: "adder", description: "Add numbers" },
-    ]);
   });
 
-  it("never exposes endpoint/auth env to the bridge child (callback + full callback)", () => {
+  it("throws the unsupported error for a callback run", () => {
     const specs: ResolvedToolSpec[] = [
       { name: "search", kind: "callback", callRef: "composio.search" },
     ];
-    const callback: ToolCallbackContext = {
-      endpoint: "https://agenta.example/tools/call",
-      authorization: "Bearer tok",
-    };
-    const out = buildToolMcpServers(specs, callback, relayDir);
-    assert.equal(out.length, 1);
-    assert.equal(
-      envValue(out[0].env, "AGENTA_TOOL_CALLBACK_ENDPOINT"),
-      undefined,
-      "endpoint env is never exposed to the bridge",
-    );
-    assert.equal(
-      envValue(out[0].env, "AGENTA_TOOL_CALLBACK_AUTH"),
-      undefined,
-      "auth env is never exposed to the bridge",
+    assert.throws(
+      () =>
+        buildToolMcpServers(
+          specs,
+          {
+            endpoint: "https://agenta.example/tools/call",
+            authorization: "Bearer tok",
+          },
+          relayDir,
+        ),
+      new RegExp(MCP_UNSUPPORTED_MESSAGE),
     );
-    assert.equal(envValue(out[0].env, "AGENTA_TOOL_RELAY_DIR"), relayDir);
   });
 
-  it("omits AUTH env when authorization is absent (endpoint but no auth)", () => {
+  it("throws for an absent-kind (back-compat callback) run", () => {
     const specs: ResolvedToolSpec[] = [
-      { name: "search", kind: "callback", callRef: "composio.search" },
+      { name: "legacy", callRef: "composio.legacy" },
     ];
-    const out = buildToolMcpServers(specs, { endpoint: "https://agenta.example/tools/call" }, relayDir);
-    assert.equal(out.length, 1);
-    assert.equal(envValue(out[0].env, "AGENTA_TOOL_CALLBACK_ENDPOINT"), undefined);
-    assert.equal(
-      envValue(out[0].env, "AGENTA_TOOL_CALLBACK_AUTH"),
-      undefined,
-      "no AUTH env when authorization absent",
+    assert.throws(
+      () =>
+        buildToolMcpServers(
+          specs,
+          { endpoint: "https://agenta.example/tools/call" },
+          relayDir,
+        ),
+      new RegExp(MCP_UNSUPPORTED_MESSAGE),
     );
   });
 
-  it("treats an absent kind as callback (back-compat)", () => {
-    const specs: ResolvedToolSpec[] = [{ name: "legacy", callRef: "composio.legacy" }];
-    const out = buildToolMcpServers(specs, { endpoint: "https://agenta.example/tools/call" }, relayDir);
-    assert.equal(out.length, 1, "back-compat (no kind) attaches as a callback tool");
-    assert.equal(envValue(out[0].env, "AGENTA_TOOL_CALLBACK_ENDPOINT"), undefined);
-  });
-
-  it("attaches one server for a mixed code+callback run with no endpoint", () => {
+  it("throws for a mixed code+callback run", () => {
     const specs: ResolvedToolSpec[] = [
-      { name: "adder", kind: "code", runtime: "python", code: "def main(**k): return 1" },
+      {
+        name: "adder",
+        kind: "code",
+        runtime: "python",
+        code: "def main(**k): return 1",
+      },
       { name: "search", kind: "callback", callRef: "composio.search" },
     ];
-    const out = buildToolMcpServers(specs, relayDir);
-    assert.notDeepEqual(out, [], "mixed run with no endpoint must not return []");
-    assert.equal(out.length, 1, "still attaches the server so the code tool is advertised");
-    assert.equal(
-      envValue(out[0].env, "AGENTA_TOOL_CALLBACK_ENDPOINT"),
-      undefined,
-      "endpoint env omitted when missing",
+    assert.throws(
+      () => buildToolMcpServers(specs, relayDir),
+      new RegExp(MCP_UNSUPPORTED_MESSAGE),
     );
-    // Both executable specs are advertised, but only as public metadata.
-    assert.deepEqual(JSON.parse(envValue(out[0].env, "AGENTA_TOOL_PUBLIC_SPECS")!), [
-      { name: "adder" },
-      { name: "search" },
-    ]);
   });
 
-  it("returns [] for empty specs", () => {
-    assert.deepEqual(buildToolMcpServers([], undefined), [], "empty specs -> []");
+  it("returns [] for empty specs (no-tools path untouched)", () => {
+    assert.deepEqual(
+      buildToolMcpServers([], undefined),
+      [],
+      "empty specs -> []",
+    );
   });
 
-  it("returns [] for client-only specs (nothing executable, even with an endpoint)", () => {
+  it("returns [] for client-only specs (nothing executable, never goes through the bridge)", () => {
     const specs: ResolvedToolSpec[] = [{ name: "confirm", kind: "client" }];
     assert.deepEqual(
       buildToolMcpServers(specs, undefined),
@@ -134,24 +104,29 @@ describe("buildToolMcpServers", () => {
       "client-only -> [] (nothing executable here)",
     );
     assert.deepEqual(
-      buildToolMcpServers(specs, { endpoint: "https://agenta.example/tools/call" }, relayDir),
+      buildToolMcpServers(
+        specs,
+        { endpoint: "https://agenta.example/tools/call" },
+        relayDir,
+      ),
       [],
       "client-only -> [] even with an endpoint",
     );
   });
 
-  it("drops client tools from the advertised list but still attaches for an executable sibling", () => {
+  it("throws when an executable spec sits beside a client spec", () => {
     const specs: ResolvedToolSpec[] = [
       { name: "confirm", kind: "client" },
-      { name: "adder", kind: "code", runtime: "python", code: "def main(**k): return 1" },
+      {
+        name: "adder",
+        kind: "code",
+        runtime: "python",
+        code: "def main(**k): return 1",
+      },
     ];
-    const out = buildToolMcpServers(specs, relayDir);
-    assert.equal(out.length, 1, "executable spec attaches the server");
-    const passed: ResolvedToolSpec[] = JSON.parse(envValue(out[0].env, "AGENTA_TOOL_PUBLIC_SPECS")!);
-    assert.deepEqual(
-      passed.map((s) => s.name),
-      ["adder"],
-      "client spec excluded from the executable list passed to the bridge",
+    assert.throws(
+      () => buildToolMcpServers(specs, relayDir),
+      new RegExp(MCP_UNSUPPORTED_MESSAGE),
     );
   });
 });