diff --git a/docs/design/agent-workflows/projects/wire-contract-schema/README.md b/docs/design/agent-workflows/projects/wire-contract-schema/README.md
new file mode 100644
index 0000000000..262ed168eb
--- /dev/null
+++ b/docs/design/agent-workflows/projects/wire-contract-schema/README.md
@@ -0,0 +1,589 @@
+# Project: A schema-driven `/run` contract
+
+| | |
+| --- | --- |
+| **Status** | Plan. Revised per author PR review on #4830 (2026-06-24). Pre-production POC — any wire shape may change freely; no back-compat burden. |
+| **Type** | Engineering project (a sequenced, test-driven change), not a one-shot change. |
+| **Scope** | Replace the hand-mirrored `/run` wire contract with a single schema source (Pydantic for now); **ship the exported JSON interface in the SDK** and investigate whether Fern can see it; fold in a structured error model and a carried contract version. **No sidecar/runner validation yet** — the contract is still brittle. |
+| **Owner files (today)** | `services/agent/src/protocol.ts` (TS types), `sdks/python/agenta/sdk/agents/utils/wire.py` (Python mirror), `sdks/python/oss/tests/pytest/unit/agents/golden/` (fixtures), `sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py` + `services/agent/tests/unit/wire-contract.test.ts` (the two contract tests). |
+| **Reference** | The deep spec of the contract as built: [`../runner-interface/README.md`](../runner-interface/README.md). Its Section 12 ("Known gaps") names the exact gaps this project closes. The inventory page: [`../../interfaces/cross-service/service-to-agent-runner.md`](../../interfaces/cross-service/service-to-agent-runner.md). |
+| **Mirroring rule today** | `services/agent/CLAUDE.md` ("The wire contract is mirrored — change both sides"). |
+
+## 1. The problem, precisely
+
+The `/run` contract is the spine of the agent stack: the Python agent service builds a request,
+the Node runner executes a turn, and returns a result or a stream of events. The contract is
+**defined twice** and kept in sync **by hand**:
+
+- TypeScript: `services/agent/src/protocol.ts` declares `AgentRunRequest`, `AgentRunResult`,
+  the `AgentEvent` union, `HarnessCapabilities`, and the sub-objects (`ResolvedToolSpec`,
+  `ToolCallbackContext`, `McpServerConfig`, `SandboxPermission`, `TraceContext`, `WireSkill`,
+  `ContentBlock`, `ChatMessage`, `AgentUsage`, `RenderHint`, `StreamRecord`).
+- Python: `sdks/python/agenta/sdk/agents/utils/wire.py` (`request_to_wire` / `result_from_wire`)
+  plus the BaseModels in `sdks/python/agenta/sdk/agents/dtos.py` (`Message`, `AgentEvent`,
+  `AgentResult`, `HarnessCapabilities`, `TraceContext`, `SandboxPermission`, ...) re-create the
+  same field names by hand.
+
+The **only** guard against the two drifting is four golden fixtures
+(`golden/run_request.{pi,claude}.json`, `golden/run_result.{ok,error}.json`) asserted by two
+tests. The TS test adds a compile-time key guard (`KNOWN_REQUEST_KEYS` assigned to
+`(keyof AgentRunRequest)[]`), and the Python test holds a parallel `KNOWN_REQUEST_KEYS` set.
+
+This is brittle for concrete, observed reasons:
+
+1. **Two hand-kept key lists.** `KNOWN_REQUEST_KEYS` is duplicated in
+   `test_wire_contract.py` and `wire-contract.test.ts`. A new field means editing five places
+   (golden, `protocol.ts`, `wire.py`, both key lists) "deliberately", per the CLAUDE.md rule.
+2. **No runtime validation at the boundary.** `POST /run` JSON-parses the body and runs with
+   whatever fields are present; an empty body becomes `{}` (`server.ts`). A malformed or
+   misspelled field is silently ignored, not rejected. The contract is *implicitly all-optional*
+   (every TS field is `?`, every Python field defaults). A typo like `sandboxPermision` is
+   dropped on the floor with no error. This is `runner-interface/README.md` §12 gap
+   "No schema validation on the runner". (Observed gap — but **not** fixed in this POC phase; a
+   boundary guard is a deferred follow-up, Section 8.)
+3. **The version skew guard is exposed but unconsumed.** `version.ts` exports
+   `PROTOCOL_VERSION = 1` and `/health` returns it, but **no Python caller probes `/health`**
+   (verified: no reference to `runnerInfo`/`PROTOCOL_VERSION`/`/health` in the runner-calling
+   path). A client and runner can silently disagree across a major bump. §12 gap "The version
+   skew guard is not consumed".
+4. **The error model is a free string.** `AgentRunResult.error?: string` with no taxonomy and
+   no machine-readable code; `result_from_wire` turns any `ok:false` into a generic
+   `RuntimeError(f"Agent run failed: {error}")`. There is **no distinct cancelled outcome** —
+   a user/client abort surfaces (if at all) as a transport teardown or a generic error, not as a
+   first-class result. §12 names neither, but the user has scoped this as the A10 cleanup.
+
+The fix for now is a **single source of truth** (Pydantic wire models) whose **JSON interface
+ships in the SDK**, plus the A10 error model and a `/capabilities` probe — sequenced so each step
+is a small change with a test that proves it. Boundary validation, generated TS types, and
+versioning are **deferred** (the contract is still brittle; this is a pre-production POC). The
+A3 rename (backend removal + `pi`->`pi_core` / `agenta`->`pi_agenta`) has already landed in the
+working tree, so the wire models describe that current shape from the start.
+
+## 2. What this project changes vs leaves alone
+
+**In scope:**
+
+- One schema as the source of truth for the `/run` request, result, event union, capabilities,
+  and the sub-objects listed above. **Source = Pydantic for now** (Section 4).
+- **The exported JSON Schema interface lives in the SDK** (alongside the existing `CATALOG_TYPES`
+  JSON interfaces), and an investigation of whether Fern can see/generate it across languages
+  (Section 4).
+- A structured error object `{ code, message, retryable }` and a distinct `cancelled` outcome.
+- A contract version carried in the payload (not only on `/health`), and a probe that consumes
+  it.
+- A decision on splitting `/run` (verdict: keep `/run` unified; promote a `/capabilities` probe).
+- Replacing the four golden fixtures + two key lists with schema-derived checks **on the Python
+  side** (the golden fixtures stay as *examples that must validate*, not as the only guard).
+
+**Deliberately NOT in scope for now (the contract is still brittle):**
+
+- **No request validation in the runner** (`server.ts` / `cli.ts`). We do not gate `/run` on the
+  schema yet. The runner keeps parsing the body as it does today.
+- **No use of the schema in the sidecar/runner at all.** No ajv, no new runner dependency, no
+  runtime validation step on the Node side. The schema is an SDK-side artifact for now.
+- These are deferred until the contract stabilizes; revisit when we want a hard boundary guard.
+
+**Explicitly unchanged by this work** (called out so reviewers do not expect movement):
+
+- **Composio, the tool gateway, connections, and MCP** all continue to work as today. They are
+  inputs the service already resolves: `customTools` (gateway callback / code / client),
+  `toolCallback`, `mcpServers`, `connection` / `provider` / `endpoint` / `credentialMode`. This
+  project re-expresses their **shapes** in a schema; it does not change how any of them resolve,
+  route, or authenticate. The Composio key stays server-side; the gateway callback still POSTs to
+  `/tools/call`; the MCP `stdio`/`http` shapes are unchanged. (We may still adjust any of these
+  shapes if the schema work surfaces a better one — this is a POC, not a frozen contract.)
+- The transports (HTTP + subprocess CLI) and the two modes (one-shot JSON + NDJSON streaming).
+- The harness shaping logic (`config.wire_tools()` etc.) — the schema describes the *output*
+  of that shaping, it does not move the shaping.
+- Tracing (`trace` / `TraceContext`) and the trace-export boundary.
+
+## 3. Current contract surface (assessment)
+
+The contract has four families. This is what a single schema has to cover.
+
+### 3.1 Request (`AgentRunRequest`)
+
+~30 top-level fields, **all optional on the wire**, grouped by job:
+
+| Group | Fields |
+| --- | --- |
+| engine + placement | `backend`, `harness`, `sandbox`, `sessionId` |
+| instructions | `agentsMd`, `systemPrompt`, `appendSystemPrompt` |
+| model + connection | `model`, `provider`, `connection {mode, slug?}`, `deployment`, `endpoint {baseUrl?, apiVersion?, region?, headers?}`, `credentialMode`, `secrets` |
+| turn | `prompt`, `messages` (`ChatMessage[]`) |
+| tools + skills | `tools` (string[]), `customTools` (`ResolvedToolSpec[]`), `toolCallback`, `mcpServers`, `skills` (`WireSkill[]`) |
+| policy + files | `permissionPolicy`, `sandboxPermission`, `harnessFiles` (`[{path, content}]`) |
+| tracing | `trace` (`TraceContext`) |
+
+Shape notes (the current serializer behavior, **not** a back-compat constraint — this is a
+pre-production POC and any of these may change freely): a plain-string `model` keeps `provider` /
+`connection` / `deployment` / `endpoint` / `credentialMode` off the wire; `mcpServers`, `skills`,
+`sandboxPermission`, `harnessFiles` are omitted (not null) when empty. The schema describes
+whatever shape we settle on; it does not exist to freeze today's bytes.
+
+### 3.2 Result (`AgentRunResult`)
+
+`ok` (bool), `output?`, `messages?`, `events?`, `usage?` (`AgentUsage`), `stopReason?`,
+`capabilities?` (`HarnessCapabilities`), `sessionId?`, `model?`, `traceId?`, `error?` (the free
+string this project replaces). `ok:false` raises in Python (`result_from_wire`).
+
+### 3.3 The event union (`AgentEvent`)
+
+A discriminated union on `type`: `message`, `thought`, the `message_*` / `reasoning_*` lifecycle
+trios, `tool_call`, `tool_result`, `interaction_request`, `data`, `file`, `usage`, `error`,
+`done`. Plus `StreamRecord = {kind:"event",event} | {kind:"result",result}` for NDJSON framing.
+Note: the Python side intentionally **drops unknown event types** on parse
+(`AgentEvent.from_wire` returns `None` for a typeless event), and a golden pins that. The schema
+must keep events **open/forward-compatible**, not closed.
+
+### 3.4 Sub-objects
+
+`ResolvedToolSpec` (the three-axis tool surface: `kind`/`runtime`/`code`/`env`/`callRef`,
+`needsApproval`, `render`, `readOnly`, `permission`), `ToolCallbackContext`, `McpServerConfig`,
+`SandboxPermission` (nested `network`, `filesystem`, `enforcement`), `HarnessCapabilities` (11
+boolean flags), `TraceContext`, `WireSkill` + `WireSkillFile`, `ContentBlock`, `ChatMessage`,
+`AgentUsage`, `RenderHint`.
+
+### 3.5 The existing golden/test machinery
+
+- `golden/run_request.pi.json` (full Pi shape: tools, skills, sandboxPermission, prompt overrides),
+  `golden/run_request.claude.json` (Claude shape: empty `tools`, `permissionPolicy:"deny"`,
+  `harnessFiles` with rendered `.claude/settings.json`).
+- `golden/run_result.ok.json` (includes a typeless event to pin the drop behavior),
+  `golden/run_result.error.json` (`{"ok": false, "error": "model exploded"}`).
+- Python `test_wire_contract.py`: builds payloads via the real configs and asserts `== golden`,
+  plus `set(payload) <= KNOWN_REQUEST_KEYS`.
+- TS `wire-contract.test.ts`: loads the goldens, asserts shapes through the runner helpers
+  (`resolvePromptText`, `messageText`, `resolveRunSessionId`), and the two compile-time guards
+  (`KNOWN_REQUEST_KEYS` / `CAPABILITY_KEYS` assigned to `keyof` types).
+
+The machinery is **good** and we keep its spirit: the goldens become "examples that must validate
+against the schema", and the duplicated key lists are replaced by schema-derived assertions.
+
+## 4. Design options for a single source of truth
+
+Three candidates, judged against this stack: **Python Pydantic 2 SDK** + a **standalone Node ESM
+runner package** (`services/agent`) that has its own `pnpm-lock.yaml`, runs through `tsx` with
+**no app compile step and no codegen toolchain today**, and is deliberately decoupled from the
+`web/` dependency graph. There is no JSON-Schema codegen, no `quicktype`,
+no `datamodel-code-generator`, and **no zod** anywhere in the runner or web (verified).
+
+### Option A — JSON Schema as source, codegen both sides
+
+Author the contract as hand-written JSON Schema files; generate TS types
+(`json-schema-to-typescript`) and Pydantic models (`datamodel-code-generator`) from them.
+
+- **Pros:** language-neutral source; one artifact; both sides are generated, so neither drifts.
+- **Cons:** introduces **two new codegen toolchains** into a repo that has none for this, and a
+  build step into a package that intentionally has none (`services/agent/CLAUDE.md`: "no app
+  compile step"). Hand-writing JSON Schema is verbose and error-prone for a union as rich as
+  `AgentEvent` + `RenderHint`. The Python SDK already has hand-written BaseModels with custom
+  `to_wire`/`from_wire` (camelCase aliasing, the `model`-string split, the drop-unknown-event
+  behavior); regenerating them from schema would either lose that behavior or require post-gen
+  patching. High blast radius, fights the existing grain. Also: it does not put the interface in
+  the SDK the way the existing `CATALOG_TYPES` Pydantic-derived schemas already are (Section 4.1).
+
+### Option B — Pydantic as source, export the JSON interface into the SDK (RECOMMENDED)
+
+Make **Python Pydantic models the source of truth** — but a **dedicated set of *wire* models**,
+NOT the existing semantic DTOs. This distinction is load-bearing (it was the sharpest review
+finding): the real contract today does not live in `dtos.py`'s classes — it lives in the **hand
+serializers** (`request_to_wire` builds a raw dict; `Message.to_wire`, `TraceContext.to_wire`,
+`AgentEvent.from_wire`, etc. do the camelCase + omit + drop-unknown work). The semantic DTOs use
+**snake_case** fields (`text_messages`, `mime_type`, `capture_content`) and an intentionally
+loose `AgentEvent` (`type: str` + free `data` dict, vs the real discriminated union in
+`protocol.ts`). Exporting `model_json_schema()` straight off those DTOs would produce the *wrong*
+schema (snake_case keys, a non-discriminated event). So:
+
+- Author new wire models in the SDK (e.g. `agents/wire_models.py`): `WireRunRequest`,
+  `WireRunResult`, and an **explicit discriminated `WireAgentEvent` union** (real variants on
+  `type`, plus an open fallback variant so unknown event types still validate, matching the
+  current drop-unknown tolerance) — with camelCase aliases (`populate_by_name=True`, as
+  `AgentConfig` already does), explicit nullability, and the exact field set the serializers emit.
+- These wire models become the single producer: `request_to_wire` / `result_from_wire` are
+  reimplemented in terms of them. The omit-when-empty behavior stays as serializer logic + golden
+  checks — `model_json_schema()` expresses "optional", not "omit when empty".
+- Pydantic 2's `model_json_schema()` exports the JSON Schema artifact **for free**, no new
+  toolchain. **This exported JSON interface ships in the SDK** — exactly the way the SDK already
+  exposes Pydantic-derived JSON Schemas through `CATALOG_TYPES` (Section 4.1). The immediate goal
+  is that the interface (the JSON) lives in the SDK; the runner does **not** consume it yet
+  (Section 5).
+
+- **Pros:** fits the stack — Pydantic 2 is already the SDK's modeling layer (`pydantic>=2,<3`);
+  the producer (Python) is the natural source since it builds the request. Schema export is a
+  built-in, not a new tool. It puts the interface in the SDK alongside the existing
+  `CATALOG_TYPES` JSON interfaces (one consistent mechanism). The omit-when-empty behavior stays
+  in Python where it already lives and is tested. The exported schema becomes a **CI-checked
+  artifact**: a test fails if the committed schema drifts from the wire models.
+- **Cons:** requires writing dedicated wire models (a real cost, but it is the honest cost of a
+  single source — the alternative is the current double-maintenance). The TS `protocol.ts` stays
+  **hand-written for now** — we do *not* generate it from the schema yet, because the runner does
+  not consume the schema yet (Section 5) and the contract is still brittle. Keeping the schema and
+  `protocol.ts` aligned stays a Python-side discipline for the moment (the Python goldens are the
+  guard). Generating `protocol.ts` from the schema is a later option once the contract settles.
+
+#### 4.1 The interface in the SDK, and whether Fern can see it
+
+The author's direction: get this interface (the JSON Schema) **into the SDK** now, and find out
+whether **Fern** can also see/generate it across languages. Findings, with concrete paths:
+
+- **The SDK already exposes Pydantic-derived JSON interfaces.** `CATALOG_TYPES` in
+  `sdks/python/agenta/sdk/utils/types.py` (line ~1265) is a dict of
+  `model_json_schema()` outputs for `Message`, `Messages`, `AgentConfigSchema`,
+  `SkillConfigSchema`, `PromptTemplate`, etc., each dereferenced. The agent workflow surfaces
+  them through `/inspect` via thin `x-ag-type-ref` markers (`services/oss/src/agent/schemas.py`),
+  and the playground resolves them against `GET /workflows/catalog/types/{type}`. **The wire
+  contract should ship the same way:** add the exported `WireRunRequest` / `WireRunResult` JSON
+  Schema next to `CATALOG_TYPES` (or as a sibling export) so the SDK is the single home of the
+  JSON interface. This is the immediate, low-risk goal.
+
+- **How Fern is used here.** Fern in this repo generates the multi-language API clients (Python +
+  TypeScript) under `clients/` and `web/packages/agenta-api-client/`. The pipeline
+  (`clients/scripts/generate.sh`) is: the FastAPI app (Pydantic models) emits **`/api/openapi.json`**
+  → the script writes an ephemeral `fern.config.json` + `generators.yml` and runs the
+  `fernapi/fern-python-sdk` and `fernapi/fern-typescript-sdk` generators against that OpenAPI
+  spec. There is **no `.fern/` API-definition directory checked in** and no Fern IDL; Fern's only
+  input is the generated OpenAPI document. So the chain is **Pydantic → OpenAPI → Fern → SDKs**.
+
+- **Can Fern see this interface? Yes, but only via OpenAPI — with one real caveat.** Fern reads
+  the OpenAPI spec, and that spec is built from the FastAPI/Pydantic models the *public API*
+  exposes. The `/run` contract is the **service ↔ runner spine**, not a public FastAPI endpoint,
+  so it does **not** appear in `openapi.json` today and Fern therefore cannot see it as-is. Two
+  ways to make Fern see it, neither needed for the immediate goal:
+  - **(a) Reference the wire models from a FastAPI surface.** If any endpoint (even an internal or
+    `/inspect`-style descriptor) types a field with the wire Pydantic models, FastAPI emits their
+    JSON Schema into `components/schemas` of `openapi.json`, and Fern then generates them in every
+    client language. This is the same path `AgentConfigSchema` already takes to reach the clients.
+  - **(b) Add a standalone OpenAPI fragment as a second Fern spec.** `generators.yml` takes a list
+    under `api.specs`; a hand-authored fragment that `$ref`s the exported `run-contract.schema.json`
+    could be added. Heavier and not worth it now.
+  - **Blocker / reason not to do it yet:** the contract is still brittle (it changes often as the
+    POC evolves), and putting it on the public OpenAPI surface would publish a moving target into
+    every generated client. So **for now**: export the JSON interface into the SDK (the
+    `CATALOG_TYPES`-style path), keep it out of the public OpenAPI spec, and let Fern pick it up
+    later once it stabilizes. The path is clear and there is no hard blocker — only a timing call.
+
+### Option C — A shared IDL (`.proto`, Smithy, etc.)
+
+Define the contract in a neutral IDL and generate both sides.
+
+- **Pros:** strongest neutrality; mature codegen.
+- **Cons:** the heaviest option for an internal JSON-over-HTTP/stdio boundary. The wire is JSON,
+  not protobuf; adopting proto means either proto-over-JSON (awkward) or changing the wire format
+  (out of scope and risky). Brings a build toolchain and a new language into a two-language repo
+  that wants fewer moving parts. The `AgentEvent` open-union + "drop unknown" semantics fit JSON
+  Schema's `additionalProperties`/`oneOf` better than proto's closed messages. Overkill.
+
+### Recommendation: Option B (Pydantic-as-source → exported JSON interface in the SDK)
+
+Use **Pydantic as the source for now**. It fits the Pydantic 2 stack, keeps the custom
+serialization semantics where they are tested, exports the JSON Schema for free, and **puts the
+interface in the SDK** the same way `CATALOG_TYPES` already does — which is exactly the immediate
+goal. Source of truth = dedicated Pydantic **wire** models (not the semantic DTOs); the exported
+schema ships in the SDK as a CI-checked artifact (a test fails if it drifts from the wire models).
+
+Two deliberate constraints from the author's review:
+
+- **No runner/sidecar validation yet.** The runner does not load or validate against the schema;
+  there is no ajv, no new runner dependency, no build step. The contract is still brittle, so we
+  hold off on a hard boundary guard (Section 5).
+- **`protocol.ts` stays hand-written for now.** We do not generate TS types from the schema yet
+  (that only pays off once the runner consumes the schema). The Python goldens remain the guard.
+
+Fern can reach this interface later through the existing **Pydantic → OpenAPI → Fern → SDKs**
+pipeline once the contract stabilizes (Section 4.1); for now the interface lives in the SDK only.
+
+## 5. Validation — deferred (no runtime guard yet)
+
+Author's direction (PR review): **do not validate for the moment.** The contract is still
+brittle, so this project does **not** add a runtime boundary guard on either side yet.
+
+- **No runner ingress validation.** `server.ts` / `cli.ts` keep parsing the `/run` body exactly
+  as today (empty body → defaults, unknown fields ignored). No ajv, no new runner dependency, no
+  schema loaded on the Node side. A present-but-malformed body is still tolerated for now.
+- **No runtime Python validation either.** `request_to_wire` / `result_from_wire` are not gated
+  on the schema at runtime.
+
+What the schema *is* used for in this phase is **Python-side tests only**: the exported schema
+validates the existing goldens (an example-must-validate check) and can validate `request_to_wire`
+output in a unit test, so the schema is proven faithful without changing any production code path.
+That is the full extent of validation for now.
+
+When the contract stabilizes, a real boundary guard (runner ingress validation + a symmetric
+Python result check) is a natural follow-up — see Section 8 / Open questions. Until then it is
+explicitly out of scope.
+
+## 6. The `/run` split decision
+
+The user agrees `/run` does too much. `/run` today conflates: (a) a one-shot turn, (b) a
+streaming turn (same route, switched by `Accept`), and (c) there is no separate way to ask "what
+can this runner do" except the unconsumed `/health`. Evaluated splits:
+
+### Keep as one endpoint: single-turn vs streaming
+
+**Do NOT split** one-shot and streaming into two endpoints. They share the identical
+`AgentRunRequest` and return the identical `AgentRunResult` (the streaming terminal `result`
+record is the same object with `events` emptied). The only difference is the `Accept` header
+selecting the framing. The `runner-interface` RFC §6 calls this the "symmetry guarantee", and
+both Python transports already parse both with the same `result_from_wire`. Splitting would
+duplicate the request schema and the dispatch for no contract benefit. Content negotiation
+(`Accept: application/x-ndjson`) is the right axis and is already in place. **Verdict: keep.**
+
+### Split out: a capability / contract probe
+
+**DO formalize the probe — the author endorsed this in review ("that's a good idea with
+capabilities").** `/health` already returns `{status, runner, protocol, engines, harnesses}` but
+nothing consumes it, and `HarnessCapabilities` (per-harness, 11 flags) is only discoverable by
+doing a full run. Recommendation:
+
+- Keep `GET /health` as the cheap liveness + identity + **contract version** probe (it already
+  carries `protocol`). This is what the A1 version check consumes (Section 7).
+- Add `GET /capabilities` (or `GET /capabilities?harness=pi_core`) that returns the static **base**
+  `HarnessCapabilities` per harness **without running a turn**. Today capabilities are probed
+  per-run and returned in the result; a static probe lets the service/playground render UI and
+  pre-validate a request (e.g. reject `images` for a harness that lacks `fileAttachments`) before
+  spending a run. The probe must state base-vs-effective explicitly: some flags are
+  mode-dependent (`streamingDeltas` is derived at run time in `engines/sandbox_agent.ts`), so the
+  static probe returns **base** capabilities and the run result stays authoritative for
+  mode-dependent flags. This is additive, not a split of `/run`'s job.
+
+**Verdict: keep `/run` unified for the turn; promote a `/capabilities` probe and actually consume
+`/health`.** This removes work from the run path (capability discovery) without fragmenting the
+turn contract.
+
+### Considered and rejected
+
+- A separate `/cancel` endpoint: rejected. Cancellation is correctly modeled as transport
+  teardown (close the NDJSON connection / kill the subprocess), already wired for
+  `runSandboxAgent` over HTTP. A `/cancel` would need session affinity the cold runtime does not
+  have. The A10 change adds a *cancelled outcome* (Section 7), not a cancel endpoint.
+- A separate tool-callback or MCP endpoint on the runner: out of scope and unchanged — those are
+  the runner *calling out* (`/tools/call`) and the gateway/MCP surfaces, which this work does not
+  touch.
+
+## 7. Folding in the sibling projects (A1, A3, A10)
+
+This project assumes and coordinates with three parallel efforts. The schema is where they meet.
+
+### A3 — backend removal + harness rename (already landed in the working tree)
+
+A3 removed the legacy in-process backend and the `backend` field, and renamed harness values
+`pi -> pi_core` and `agenta -> pi_agenta`. This is **no longer "assumed end state"** — it is
+already in the working tree (`version.ts` now declares `HARNESSES = ["pi_core","claude",
+"pi_agenta"]`; the pi golden is renamed `run_request.pi_core.json`; `engines/pi.ts` is deleted).
+So the schema simply describes that current shape:
+
+- No `backend` field.
+- `harness` is `pi_core` | `pi_agenta` | `claude`.
+
+Because this is a **pre-production POC, we do NOT version the pi/agenta rename.** There is no v1→v2
+cut for it, no downcaster, no `PROTOCOL_VERSION` bump tied to the rename — the wire just changes.
+The wire models are authored against today's renamed shape from the start.
+
+### A1 — versioning (coordinate: a simple string version, the LLM-as-judge style)
+
+A1 is the sibling project [`../contract-versioning/`](../contract-versioning/) (it owns the
+versioning strategy). Per the author's review, A1 is being simplified to **a plain string version
+plus an if/else branch — the same pattern the codebase already uses elsewhere** (the
+`x-ag-messages-version: "v1"` header and `VERCEL_MESSAGE_PROTOCOL_VERSION` string; the LLM-as-judge
+string-version + if/else dispatch). **No `{major, minor}` struct, no `contractVersion` field name,
+no upcaster/downcaster machinery.** This project defers to whatever simple string convention A1
+lands on and reuses it verbatim (do NOT invent a new scheme).
+
+It is still true that the runner advertises `protocol: 1` on `/health` (`version.ts`) but the
+Python client (`ts_runner.py`) never reads it. If A1 wants the version carried on the payload, it
+rides as the same simple string A1 chooses, stamped by the producer and branched on with a plain
+if/else on the consumer. Skew handling and any negotiation are A1's call; this project only agrees
+to carry the field A1 specifies in the wire models. Given the POC framing, even this is optional
+for now.
+
+### A10 — error model cleanup (in scope here)
+
+Replace `AgentRunResult.error?: string` with a structured error and add a distinct cancelled
+outcome:
+
+```jsonc
+// AgentRunResult, error branch
+{
+  "ok": false,
+  "error": {
+    "code": "model_error",            // taxonomy, see below
+    "message": "model exploded",      // human-readable, what today's string held
+    "retryable": false                // does a naive retry have a chance?
+  }
+}
+```
+
+- **Error taxonomy (`code`)**, a closed-ish enum the runner sets and the service can branch on:
+  `unsupported_harness`, `auth_error`, `quota_exceeded`, `rate_limited`, `configuration_error`,
+  `permission_denied`, `model_error`, `tool_error`, `mcp_error`, `sandbox_error`, `timeout`,
+  `cancelled`, `internal`. The `auth_error` / `quota_exceeded` / `rate_limited` codes are not
+  speculative: the runner already pattern-classifies these from provider error text in
+  `services/agent/src/engines/sandbox_agent/errors.ts` — the schema just gives that classification
+  a stable wire code. Keep the enum forward-compatible (an unknown code -> treat as `internal`),
+  mirroring the event "drop unknown" tolerance. (No `invalid_request` /
+  `unsupported_contract_version` codes for now — we are not validating requests or enforcing a
+  version at the boundary in this phase.)
+- **`retryable`** lets the caller distinguish a transient `timeout` / `rate_limited` / `mcp_error`
+  from a permanent `unsupported_harness` / `auth_error` / `configuration_error`.
+- **Distinct cancelled outcome — but only where it is actually deliverable.** A user/client abort
+  is **not** a failure. The subtlety (a real review catch): a *client disconnect* mid-stream
+  cannot reliably receive a terminal record, because the disconnect is exactly what tears the
+  transport down — `server.ts` aborts the run *on* response `close`, and the Python streaming
+  transports treat a stream with no terminal `result` as an error (`ts_runner.py`). So:
+  - **Cooperative cancellation while the transport is still open** (e.g. an in-band stop signal,
+    or a future `/cancel`-style affordance): emit the terminal `{ ok:false, error:{code:
+    "cancelled"} }` record — the §8b "exactly one terminal result" invariant holds and the result
+    stays authoritative. Set `retryable:false` (or omit it) — a cancel is intentional, not a
+    transient fault.
+  - **Transport teardown (the disconnect case we have today)**: the terminal record cannot be
+    delivered; the Python side must map "generator cancelled / connection closed by us" to a
+    distinct **`CancelledError`-style outcome**, NOT the generic "stream ended without a terminal
+    result" `RuntimeError`. This is a Python-side parsing/exception change, not a wire record.
+  - Optionally also emit a `done` event with `stopReason:"cancelled"` for streams (useful as a
+    live signal), but the terminal result remains authoritative when the connection is alive.
+- **Migration:** `result_from_wire` must accept **both** the old free-string `error` and the new
+  structured object (parse a string into `{code:"internal", message:str, retryable:false}`). This
+  read-compat is cheap and avoids a hard flag-day, but because this is a POC we do **not** treat
+  the new error shape as a versioned cut — the wire just changes to the structured form.
+
+This is a wire-shape change (the new structured error), made directly. No version bump is tied to
+it (POC).
+
+## 8. Incremental, test-at-each-step plan (POC-framed)
+
+No big-bang, but no versioning machinery either — this is a pre-production POC, so the wire just
+changes when it needs to. Each step is a small change plus the test that proves it. The
+heaviest items (runner-side validation, generating `protocol.ts`, version negotiation) are
+**deferred** until the contract stabilizes; they are listed at the end as follow-ups, not steps.
+
+The sequence respects the shared-surface rule (`agent-coordination.md`): any change to
+`protocol.ts` / `wire.py` / golden / the two contract tests is coordinated, single-PR, both
+sides + golden together.
+
+1. **Add the dedicated Pydantic wire models in the SDK (no wire change).**
+   Add `WireRunRequest` / `WireRunResult` (and the discriminated `WireAgentEvent`) wire models in
+   the SDK, with camelCase aliases, reproducing exactly what `request_to_wire` / `result_from_wire`
+   emit/parse today (against the *current* renamed shape — `pi_core` / `pi_agenta`, no `backend`).
+   *Test:* a unit test asserts `WireRunRequest(...).model_dump(by_alias=True, exclude-none-ish)
+   == request_to_wire(...)` for the pi_core, claude, and pi_agenta payloads (round-trip parity with
+   the goldens). Green before anything else.
+
+2. **Export the JSON interface into the SDK + a freshness test.**
+   Export `model_json_schema()` for the wire models and ship it in the SDK alongside the existing
+   `CATALOG_TYPES` JSON interfaces (Section 4.1). Commit the artifact.
+   *Test:* a test regenerates the schema in-memory and asserts it equals the committed export
+   (drift -> fail), the same discipline the goldens already use.
+
+3. **Assert the existing goldens validate against the exported schema (Python side, tests only).**
+   *Test:* load each golden, validate against the exported schema (`jsonschema`); all must pass.
+   This proves the schema faithfully describes today's wire. **No production code path changes, and
+   nothing on the runner side** — validation here is a test, not a runtime guard (Section 5).
+
+4. **Make the wire models the single producer.**
+   Reimplement `request_to_wire` / `result_from_wire` in terms of the wire models, keeping the
+   omit-when-empty serializer behavior. The goldens stay byte-identical (this is a refactor, the
+   models already match the wire from step 1).
+   *Test:* the existing golden wire-contract test stays green unchanged; add a parity test that the
+   reimplemented serializers equal the old output.
+
+5. **Replace the duplicated key lists with a schema-derived guard (Python side).**
+   Swap the hand-kept Python `KNOWN_REQUEST_KEYS` for a set derived from the exported schema's
+   `properties`, so the Python guard cannot silently fall behind. The TS `KNOWN_REQUEST_KEYS` guard
+   in `wire-contract.test.ts` stays hand-written for now (we are not generating `protocol.ts` or
+   touching the runner this phase).
+   *Test:* `set(schema.properties) == set(python KNOWN_REQUEST_KEYS)`.
+
+6. **Structured error model + cancelled outcome (A10).**
+   Result `error` becomes `{code, message, retryable}`; `result_from_wire` also reads the old free
+   string for read-compat (string -> `{code:"internal", message:str}`). Cancellation: cooperative
+   cancel emits the terminal `{ok:false, error:{code:"cancelled"}}`; transport-teardown cancel maps
+   to a distinct Python `CancelledError` (per §7 A10). This is a direct wire change — **no version
+   bump** (POC).
+   *Test:* `test_wire_contract.py` parses an old-string-error golden and a new-structured golden; a
+   transport test asserts a disconnect yields the Python `CancelledError`. New goldens:
+   `run_result.cancelled.json`, `run_result.error_structured.json`.
+
+7. **Promote the capability probe: `GET /capabilities` (additive, the author endorsed it).**
+   Add the static per-harness `HarnessCapabilities` route to the runner. It returns **base**
+   capabilities (what the harness supports at all); mode-dependent flags (`streamingDeltas`, derived
+   at run time in `engines/sandbox_agent.ts`) stay authoritative only in a run result. The service
+   can pre-render UI / pre-check a request against the base set.
+   *Test:* `server.test.ts` asserts `GET /capabilities` returns the base capability map per harness
+   without running a turn.
+
+### Deferred follow-ups (only once the contract stabilizes)
+
+These are explicitly **not** in this phase, per the author's review:
+
+- **Runner-side request validation.** Loading the schema in `server.ts` / `cli.ts` and rejecting a
+  malformed `/run` (with ajv or similar). The contract is too brittle to gate on yet.
+- **Generating `protocol.ts` from the schema.** Pays off only once the runner consumes the schema;
+  until then `protocol.ts` stays hand-written and the Python goldens are the guard.
+- **A version field + negotiation.** Owned by A1; if/when it lands it is a simple string version +
+  if/else (Section 7 A1), not a `{major, minor}` or upcaster/downcaster scheme.
+- **Fern generating the interface across languages.** Reachable later via Pydantic → OpenAPI → Fern
+  once the contract is stable enough to publish into the clients (Section 4.1).
+
+After this phase: one Pydantic wire-model source -> the JSON interface shipped in the SDK ->
+structured errors + a correctly-modeled cancelled outcome -> a real capability probe. No runner
+validation, no version machinery, no generated TS types — those are deferred until the contract
+settles.
+
+## 9. Risks and mitigations
+
+- **Drift between `protocol.ts` types and the schema.** While `protocol.ts` stays hand-written and
+  the runner does not consume the schema, this drift is tolerated as a POC trade-off. The Python
+  goldens + the schema-derived Python key guard (step 5) catch Python-side drift; aligning the TS
+  types is a manual discipline for now. Generating `protocol.ts` is the deferred fix.
+- **The committed schema export going stale.** Mitigated by step 2's freshness test (regenerate ==
+  committed), the same discipline the goldens already use.
+- **Sequencing against A1.** A1 owns the version convention (a simple string + if/else); this
+  project only carries whatever field A1 specifies. The error model (step 6) and capability probe
+  (step 7) do not depend on A1.
+- **No boundary guard means typos still pass silently.** Accepted for now — the contract is too
+  brittle to gate on. The runner keeps today's behavior. Revisit with the deferred runner-side
+  validation once the contract stabilizes.
+
+## 10. Open questions for review
+
+1. **Wire models placement.** A new `agents/wire_models.py` next to `dtos.py` (proposed) vs a
+   dedicated contract package. The exported JSON interface ships in the SDK alongside
+   `CATALOG_TYPES`.
+2. **Where exactly the exported interface is surfaced in the SDK.** As an entry in (or sibling of)
+   `CATALOG_TYPES` in `sdks/python/agenta/sdk/utils/types.py`, vs a standalone export. Either keeps
+   it SDK-resident; the `CATALOG_TYPES` path also makes it `/inspect`-discoverable.
+3. **Cancelled modeling.** Cooperative cancel -> terminal `error.code:"cancelled"`; transport
+   teardown -> distinct Python `CancelledError` (proposed). Optionally also a `done`
+   `stopReason:"cancelled"`. `retryable` for cancel: `false`/omit.
+4. **Capability probe shape + base-vs-effective.** Return all harnesses (proposed) vs `?harness=`;
+   and the probe returns **base** capabilities (proposed), with mode-dependent flags
+   (`streamingDeltas`) authoritative only in a run result.
+5. **The deferred follow-ups (Section 8).** Confirm runner-side validation, generated `protocol.ts`,
+   the version field, and Fern publication are all out of scope for this POC phase.
+
+## 11. Review
+
+This plan was reviewed by Codex (gpt-5.5, xhigh, read-only) on 2026-06-24, then revised on
+2026-06-24 per the author's PR review on #4830. The author's direction simplified it toward the POC
+reality:
+
+- **No back-compat burden** — this is still an internal POC, so any wire shape may change freely
+  (the "must preserve the model/connection split" framing was dropped).
+- **Pydantic as the source for now**, with the immediate goal that the exported JSON interface
+  lives **in the SDK** (the `CATALOG_TYPES` path), plus a Fern investigation (Section 4.1): Fern
+  here is driven by Pydantic → OpenAPI → Fern → SDKs, so it can see this interface later via the
+  OpenAPI surface once the contract stabilizes — no hard blocker, only a timing call.
+- **No sidecar/runner validation yet** (no ajv, no new runner dependency) — the contract is still
+  brittle (Section 5); `protocol.ts` stays hand-written for now.
+- **No versioning machinery** — the pi/agenta rename (already landed) is not versioned, and any
+  version field defers to A1's simple string + if/else convention.
+- **Keep `/capabilities`** — the author endorsed the probe.
+
+Codex's earlier structural catches that survive the simplification: source from dedicated **wire**
+models (not the snake_case semantic DTOs); cancellation via a terminal record only works for
+**cooperative** cancel (a disconnect maps to a Python `CancelledError`); the error taxonomy is
+grounded in what `engines/sandbox_agent/errors.ts` already classifies; capabilities are
+base-vs-effective. The corrections that were about versioning/validation (two-breaking-changes-one-
+cut, the step-5 error-shape ordering, the both-transport version probe) are **moot** now that
+versioning and runner validation are deferred.
diff --git a/docs/design/agent-workflows/projects/wire-contract-schema/status.md b/docs/design/agent-workflows/projects/wire-contract-schema/status.md
new file mode 100644
index 0000000000..12420c086a
--- /dev/null
+++ b/docs/design/agent-workflows/projects/wire-contract-schema/status.md
@@ -0,0 +1,142 @@
+# Status: wire-contract-schema
+
+| | |
+| --- | --- |
+| **Phase** | **Implemented** (2026-06-24). Pydantic wire models are the schema source of truth, exported into the SDK via `CATALOG_TYPES`; the `/inspect` canonical response + typed outputs landed. No runner/validation work (deferred). |
+| **Owner** | wire-contract-schema (A2 in the A1/A2/A3/A10 cohort) |
+| **Lane** | `feat/agent-wire-contract-schema-plan` (PR #4830), re-stacked on `feat/agent-contract-versioning-docs` (#4829). One PR = plan doc + impl. |
+| **Created** | 2026-06-24 |
+| **Revised** | 2026-06-24 (author PR review) |
+| **Implemented** | 2026-06-24 |
+
+## What shipped (the implementation)
+
+The plan's source-of-truth slice plus the folded `/inspect` follow-ups (architecture-followups
+issue 1 + typed outputs). Resolved every open question with the least-code option:
+
+- **Wire models as the single schema source of truth** —
+  `sdks/python/agenta/sdk/agents/wire_models.py`: dedicated camelCase Pydantic models
+  (`WireRunRequest`, `WireRunResult`, sub-objects, and an OPEN `WireAgentEvent` whose `type` is
+  optional so a typeless event is tolerated, mirroring the parser's drop behavior). NOT the
+  snake_case semantic DTOs. `run_contract_schemas()` exports their dereferenced, camelCase JSON
+  Schema.
+- **The JSON interface ships in the SDK** via `CATALOG_TYPES` (`run_request` / `run_result`), the
+  same path `agent_config` takes — so it is `/inspect`-discoverable through
+  `GET /workflows/catalog/types/{type}`. No new endpoint.
+- **Tests, no runtime validation** (`test_wire_models.py`): the committed catalog matches a fresh
+  export (freshness guard), all four goldens validate against the exported schema and parse into
+  the models, `request_to_wire` output validates, and the schema's property set equals
+  `KNOWN_REQUEST_KEYS` (the schema-derived key guard). Nothing gates a live `/run`.
+- **`wire.py` stays the dict producer** — least-code: the omit-when-empty behavior lives there and
+  is pinned by the goldens (a thing `model_json_schema()` cannot express). The models are the
+  *schema* authority and a docstring in `wire.py` points to them. No serializer rewrite.
+- **Issue 1 — canonical `/inspect` response**: `WorkflowInspectResponse` in
+  `sdks/python/agenta/sdk/models/workflows.py`; `handle_inspect_success` normalizes the
+  internally-built `WorkflowInvokeRequest` into it (`_to_inspect_response`), lifting the resolved
+  `WorkflowRevisionData` to a flat top-level `revision`, so schemas live at
+  `response.revision.schemas` (was the latent-broken `data.revision.data.schemas` nesting). The
+  three `/inspect` routes' `response_model` is now `WorkflowInspectResponse`. FE: the
+  `InspectWorkflowResponse` type and the `store.ts` read now resolve against the real body
+  (`revision.schemas`); the deprecated `interface?.schemas` fallback is kept on the type as a
+  migration bridge (two sibling readers still use it).
+- **Issue 4 — typed `/inspect` outputs**: `services/oss/src/agent/schemas.py` `AGENT_OUTPUTS_SCHEMA`
+  is keyed per output surface (`invoke` -> `message`, `messages` -> `messages`). Reuses existing
+  catalog markers, so the catalog-refs guard is unchanged. POC: no flat back-compat output field.
+
+### Deferred (noted in the PR body; NOT built)
+
+- The `/run` `version` field + dispatch (A1 already deferred it).
+- Runner-side request validation (no ajv, no runner dependency).
+- The `GET /capabilities` probe.
+- Generating `protocol.ts` from the schema; the structured-error / cancelled outcome; Fern
+  publication across languages.
+- `services/agent/CLAUDE.md`'s mirroring rule should mention the Pydantic wire models are now the
+  schema source — left for the runner owner (`services/agent/*` is their surface, not touched here).
+
+## What exists
+
+- `README.md` — the plan, revised to the author's POC framing: current-state assessment, the three
+  source-of-truth options with the Option B recommendation (Pydantic-as-source **for now**, JSON
+  interface **in the SDK**, a Fern investigation in §4.1), the `/run` split decision (keep unified,
+  promote `/capabilities`), the A10 structured-error + cancelled change, A1 coordination on a
+  **simple string version**, a 7-step POC-framed plan with the heavy items deferred, and a Review
+  section (§11) recording both the Codex pass and the author's revision.
+
+## Author PR review (2026-06-24) — what changed
+
+Four inline comments on #4830, all addressed:
+
+1. **No back-compat burden** (README ~§3.1). Dropped all "the schema must preserve the
+   model/connection split / omit-when-empty bytes" framing. This is an internal POC; any wire shape
+   may change freely. Shape notes are now described as "current serializer behavior, not a
+   constraint." (README §Status, §1, §2, §3.1, §11.)
+2. **Pydantic-as-source now + interface in the SDK + Fern** (README ~§4 recommendation). Revised the
+   recommendation: Pydantic is the source for now; the immediate goal is that the exported JSON
+   Schema interface lives **in the SDK** (the `CATALOG_TYPES` path); added §4.1 investigating Fern.
+   Explicitly **dropped using the schema in the sidecar/runner** for now (contract still brittle).
+3. **No runner ingress validation** (README ~§5). Rewrote §5 as "validation — deferred": no ajv, no
+   runner dependency, no `server.ts`/`cli.ts` request validation. The schema is used in Python tests
+   only (goldens-must-validate). A boundary guard is a deferred follow-up.
+4. **Keep `/capabilities`** (README ~§6). The probe stays; the author endorsed it. Noted his
+   endorsement inline.
+
+## Fern findings (the §4.1 investigation)
+
+- Fern in this repo generates the multi-language API **clients** (Python + TS) under `clients/` and
+  `web/packages/agenta-api-client/`. The pipeline (`clients/scripts/generate.sh`) is
+  **Pydantic → `/api/openapi.json` → Fern (`fernapi/fern-python-sdk`, `fernapi/fern-typescript-sdk`)
+  → SDKs**. There is no checked-in `.fern/` IDL; Fern's only input is the generated OpenAPI doc.
+- The SDK **already** exposes Pydantic-derived JSON interfaces: `CATALOG_TYPES` in
+  `sdks/python/agenta/sdk/utils/types.py` (~line 1265) is a dict of `model_json_schema()` outputs,
+  surfaced via `/inspect` `x-ag-type-ref` markers (`services/oss/src/agent/schemas.py`). The wire
+  contract should ship the same way.
+- **Can Fern see this interface? Yes — but only via OpenAPI, with a caveat.** `/run` is the
+  service↔runner spine, not a public FastAPI endpoint, so it is not in `openapi.json` today and Fern
+  cannot see it as-is. Making Fern see it = reference the wire Pydantic models from a FastAPI surface
+  (FastAPI then emits them into `components/schemas`, the same path `AgentConfigSchema` takes). **No
+  hard blocker** — the only reason not to now is that the contract is brittle and publishing a moving
+  target into every generated client is premature. So: SDK-resident now, Fern later.
+
+## Decisions made in the (revised) plan
+
+1. **Schema source = dedicated Pydantic *wire* models (Option B), NOT the semantic DTOs**, authored
+   against the **already-landed** renamed shape (`pi_core` / `pi_agenta`, no `backend`). Export
+   `model_json_schema()` and ship it in the SDK alongside `CATALOG_TYPES`.
+2. **`protocol.ts` stays hand-written for now.** No generated TS types this phase (only pays off once
+   the runner consumes the schema). Python goldens are the guard.
+3. **`/run` stays unified for the turn.** Promote a `GET /capabilities` probe (static **base**
+   per-harness capabilities). Rejected: a `/cancel` endpoint.
+4. **Error model `{ code, message, retryable }`** with a grounded taxonomy and a cancelled outcome
+   (terminal record for cooperative cancel; Python `CancelledError` for transport-teardown cancel).
+   Made as a **direct wire change, no version bump** (POC).
+5. **No versioning machinery.** The pi/agenta rename is not versioned. Any version field defers to
+   A1's **simple string version + if/else** (the `x-ag-messages-version: "v1"` / LLM-as-judge
+   pattern) — no `{major, minor}`, no `contractVersion` name, no upcaster/downcaster.
+6. **No runner/sidecar validation yet** (deferred follow-up).
+
+## Deferred (Section 8 follow-ups, out of scope for this POC phase)
+
+- Runner-side request validation (ajv / boundary guard).
+- Generating `protocol.ts` from the schema.
+- A version field + negotiation (A1-owned, simple string).
+- Fern generating the interface across languages (via Pydantic → OpenAPI once stable).
+
+## Coordination
+
+- **A1 (`contract-versioning`)** — sibling at `../contract-versioning/`, being simplified by another
+  agent to a plain string version + if/else per the author. This project reuses whatever string
+  convention A1 lands on; does NOT invent its own. (Did not touch A1's README — another agent owns it.)
+- **A3 (backend removal + harness rename)** — **already landed in the working tree** (`version.ts`
+  has `pi_core`/`pi_agenta`, golden renamed `run_request.pi_core.json`, `engines/pi.ts` deleted). The
+  wire models describe that shape from the start; the rename is not versioned (POC).
+- **A10 (error model)** — folded into the plan (step 6) as a direct wire change.
+- **`sidecar-trust-and-sandbox-enforcement`** flagged a stale `protocol.ts:149-150` comment; noted.
+- **DOCS-ONLY.** No edit to `protocol.ts` / `wire.py` / golden / contract tests / `interfaces/*`.
+  Composio, the tool gateway, connections, and MCP are described as existing and unchanged.
+
+## Next actions (after review)
+
+- Get sign-off on README §10 open questions (wire-model placement, where the SDK surfaces the export,
+  cancelled modeling, capability probe shape, and the deferred follow-up list).
+- Confirm with A1 the exact simple string version convention to carry (if any) on the payload.
+- Then implement step 1 (dedicated wire models with round-trip parity tests against the goldens).
diff --git a/sdks/python/agenta/sdk/agents/utils/wire.py b/sdks/python/agenta/sdk/agents/utils/wire.py
index ae0e369c70..a4fd01f6a5 100644
--- a/sdks/python/agenta/sdk/agents/utils/wire.py
+++ b/sdks/python/agenta/sdk/agents/utils/wire.py
@@ -5,6 +5,14 @@
 under ``sdks/python/oss/tests/pytest/unit/agents/golden/`` (see ``test_wire_contract.py``).
 The runner drives one engine (the sandbox-agent ACP path); the ``harness`` field selects the
 agent, so there is no engine selector on the wire.
+
+The SCHEMA source of truth for this contract is the dedicated Pydantic wire models in
+``agenta.sdk.agents.wire_models`` (``WireRunRequest`` / ``WireRunResult``). Their exported JSON
+Schema ships in the SDK through ``CATALOG_TYPES`` and is asserted to describe exactly what the
+functions below emit/parse (``test_wire_models.py``). The serializer here stays a hand-built
+dict on purpose: the omit-when-empty behavior lives in this file (and is pinned by the goldens),
+which ``model_json_schema()`` cannot express. Add or rename a wire field in BOTH places (here and
+the wire models) plus ``protocol.ts`` and the goldens — the tests catch a one-sided change.
 """
 
 from __future__ import annotations
diff --git a/sdks/python/agenta/sdk/agents/wire_models.py b/sdks/python/agenta/sdk/agents/wire_models.py
new file mode 100644
index 0000000000..cce8381926
--- /dev/null
+++ b/sdks/python/agenta/sdk/agents/wire_models.py
@@ -0,0 +1,374 @@
+"""The ``/run`` wire contract as Pydantic models — the single schema source of truth.
+
+These models describe the EXACT camelCase JSON the Python producer emits and parses in
+``utils/wire.py`` (``request_to_wire`` / ``result_from_wire``) and the TS runner mirrors in
+``services/agent/src/protocol.ts``. They are deliberately a SEPARATE set from the semantic
+DTOs in ``dtos.py``: the DTOs are snake_case and intentionally loose (``AgentEvent`` is a free
+``type: str`` + ``data`` bag), while the real wire is camelCase with a discriminated event
+union. Exporting ``model_json_schema()`` off the DTOs would produce the wrong schema, so the
+contract lives here.
+
+What these models are for in this phase (a pre-production POC):
+
+- They are the schema authority: ``run_contract_schemas()`` exports their JSON Schema, which
+  ships in the SDK through ``CATALOG_TYPES`` (the same mechanism ``AgentConfigSchema`` uses to
+  reach the SDK / clients / ``/inspect``). A test asserts the committed catalog entry matches a
+  fresh export, so the schema cannot drift from these models.
+- They validate the golden fixtures and ``request_to_wire`` output in tests, proving the schema
+  faithfully describes today's wire.
+
+What they are NOT (deferred, per the project plan):
+
+- They are NOT a runtime guard. ``request_to_wire`` still builds a plain dict and the runner
+  still parses the body as-is; nothing validates against these models on a live ``/run``.
+- They do NOT carry a contract ``version`` field, structured errors, or a ``cancelled`` outcome
+  yet — those are deferred follow-ups. The result error stays the current free string.
+
+Conventions: every field is camelCase via an alias, with ``populate_by_name=True`` so the
+models also accept the Python field name. Optional fields default to ``None`` / empty, matching
+the implicitly-all-optional wire. ``extra="allow"`` keeps the models forward-compatible (an
+unknown field is not the schema's job to reject in this POC phase).
+"""
+
+from __future__ import annotations
+
+from typing import Any, ClassVar, Dict, List, Literal, Optional, Union
+
+from pydantic import BaseModel, ConfigDict, Field
+
+
+class _WireModel(BaseModel):
+    """Base for every wire model: camelCase aliases, accept-by-name, allow extra.
+
+    ``populate_by_name=True`` lets a producer construct with the Python field names while the
+    schema and ``model_dump(by_alias=True)`` speak camelCase. ``extra="allow"`` keeps the
+    contract open/forward-compatible (matching the runner's tolerant parsing); this POC does not
+    reject unknown fields.
+
+    ``__ag_type__`` is the catalog key a top-level model carries into ``CATALOG_TYPES`` (the
+    same role :class:`~agenta.sdk.utils.types.AgSchemaMixin` plays for the other catalog types).
+    It is NOT mixed in from ``utils/types`` on purpose: ``utils/types`` imports the agents
+    package, so importing it here would create a load cycle. ``ag_type()`` reads the marker.
+    """
+
+    model_config = ConfigDict(populate_by_name=True, extra="allow")
+
+    __ag_type__: ClassVar[Optional[str]] = None
+
+    @classmethod
+    def ag_type(cls) -> str:
+        if cls.__ag_type__ is None:
+            raise ValueError(f"{cls.__name__} does not define __ag_type__")
+        return cls.__ag_type__
+
+
+# ---------------------------------------------------------------------------
+# Shared sub-objects
+# ---------------------------------------------------------------------------
+
+
+class WireEndpoint(_WireModel):
+    """Non-secret connection config (mirrors ``Endpoint.to_wire``)."""
+
+    base_url: Optional[str] = Field(default=None, alias="baseUrl")
+    api_version: Optional[str] = Field(default=None, alias="apiVersion")
+    region: Optional[str] = None
+    headers: Optional[Dict[str, str]] = None
+
+
+class WireConnection(_WireModel):
+    """The author's credential-connection intent (``{mode, slug?}``)."""
+
+    mode: Literal["agenta", "self_managed"] = "agenta"
+    slug: Optional[str] = None
+
+
+class WireContentBlock(_WireModel):
+    """One content block of a message (mirrors ``ContentBlock.to_wire``)."""
+
+    type: str
+    text: Optional[str] = None
+    data: Optional[str] = None
+    mime_type: Optional[str] = Field(default=None, alias="mimeType")
+    uri: Optional[str] = None
+    tool_call_id: Optional[str] = Field(default=None, alias="toolCallId")
+    tool_name: Optional[str] = Field(default=None, alias="toolName")
+    input: Optional[Any] = None
+    output: Optional[Any] = None
+    is_error: Optional[bool] = Field(default=None, alias="isError")
+
+
+class WireChatMessage(_WireModel):
+    """A chat message on the wire: ``{role, content}`` (string or content blocks)."""
+
+    role: str
+    content: Union[str, List[WireContentBlock]] = ""
+
+
+class WireTraceContext(_WireModel):
+    """Agenta trace context threaded into a run (mirrors ``TraceContext.to_wire``)."""
+
+    traceparent: Optional[str] = None
+    baggage: Optional[str] = None
+    endpoint: Optional[str] = None
+    authorization: Optional[str] = None
+    capture_content: bool = Field(default=True, alias="captureContent")
+
+
+class WireToolCallback(_WireModel):
+    """Where callback (gateway) tools route their calls back to."""
+
+    endpoint: Optional[str] = None
+    authorization: Optional[str] = None
+
+
+class WireRenderHint(_WireModel):
+    """How a tool's result should be rendered by a client."""
+
+    kind: Optional[str] = None
+    component: Optional[str] = None
+
+
+class WireResolvedToolSpec(_WireModel):
+    """A resolved tool the runner delivers to the harness (the three-axis tool surface).
+
+    ``kind`` is the executor axis (``callback`` / ``code`` / ``client`` / ``builtin``);
+    ``needsApproval`` / ``render`` are the orthogonal axes; ``callRef`` / ``runtime`` / ``code``
+    / ``env`` are executor-specific. Extra fields are allowed so an executor variant the schema
+    has not enumerated still validates.
+    """
+
+    name: str
+    description: Optional[str] = None
+    input_schema: Optional[Dict[str, Any]] = Field(default=None, alias="inputSchema")
+    kind: Optional[str] = None
+    call_ref: Optional[str] = Field(default=None, alias="callRef")
+    runtime: Optional[str] = None
+    code: Optional[str] = None
+    env: Optional[Dict[str, str]] = None
+    needs_approval: Optional[bool] = Field(default=None, alias="needsApproval")
+    render: Optional[WireRenderHint] = None
+    read_only: Optional[bool] = Field(default=None, alias="readOnly")
+    permission: Optional[str] = None
+
+
+class WireMcpServer(_WireModel):
+    """A user-declared MCP server (stdio or http), mirrors ``mcp_servers_to_wire``."""
+
+    name: str
+    transport: Optional[str] = None
+    command: Optional[str] = None
+    args: Optional[List[str]] = None
+    env: Optional[Dict[str, str]] = None
+    url: Optional[str] = None
+    headers: Optional[Dict[str, str]] = None
+    tools: Optional[List[str]] = None
+    permission: Optional[str] = None
+
+
+class WireSkillFile(_WireModel):
+    """One bundled file in a resolved inline skill package."""
+
+    path: str
+    content: str
+    executable: Optional[bool] = None
+
+
+class WireSkill(_WireModel):
+    """A resolved inline skill package (mirrors ``skill_to_wire``)."""
+
+    name: str
+    description: Optional[str] = None
+    body: Optional[str] = None
+    files: Optional[List[WireSkillFile]] = None
+    disable_model_invocation: Optional[bool] = Field(
+        default=None, alias="disableModelInvocation"
+    )
+    allow_executable_files: Optional[bool] = Field(
+        default=None, alias="allowExecutableFiles"
+    )
+
+
+class WireNetworkEgress(_WireModel):
+    """The sandbox outbound-network policy (mirrors ``NetworkEgress``)."""
+
+    mode: Literal["on", "off", "allowlist"] = "on"
+    allowlist: List[str] = Field(default_factory=list)
+
+
+class WireSandboxPermission(_WireModel):
+    """The declared sandbox security boundary (mirrors ``SandboxPermission.to_wire``)."""
+
+    network: WireNetworkEgress = Field(default_factory=WireNetworkEgress)
+    filesystem: Optional[Literal["on", "readonly", "off"]] = None
+    enforcement: Literal["strict", "best_effort"] = "strict"
+
+
+class WireHarnessFile(_WireModel):
+    """One file the active harness's config renders into the session cwd before a run."""
+
+    path: str
+    content: str
+
+
+class WireHarnessCapabilities(_WireModel):
+    """What a harness can do, probed by the runner (the 11 boolean flags)."""
+
+    text_messages: bool = Field(default=True, alias="textMessages")
+    images: bool = False
+    file_attachments: bool = Field(default=False, alias="fileAttachments")
+    mcp_tools: bool = Field(default=False, alias="mcpTools")
+    tool_calls: bool = Field(default=False, alias="toolCalls")
+    reasoning: bool = False
+    plan_mode: bool = Field(default=False, alias="planMode")
+    permissions: bool = False
+    usage: bool = False
+    streaming_deltas: bool = Field(default=False, alias="streamingDeltas")
+    session_lifecycle: bool = Field(default=False, alias="sessionLifecycle")
+
+
+class WireAgentUsage(_WireModel):
+    """Token / cost usage rolled onto a workflow span."""
+
+    input: Optional[int] = None
+    output: Optional[int] = None
+    total: Optional[int] = None
+    cost: Optional[float] = None
+
+
+# ---------------------------------------------------------------------------
+# The event union (open / forward-compatible)
+# ---------------------------------------------------------------------------
+
+
+class WireAgentEvent(_WireModel):
+    """One structured event from a run, keyed by ``type``.
+
+    The Python parser (``AgentEvent.from_wire``) keeps the whole event verbatim and drops a
+    typeless event, so the wire event is intentionally OPEN: ``type`` is the discriminator and
+    ``extra="allow"`` carries the rest. ``type`` is OPTIONAL on the model on purpose — a
+    typeless event is dropped, not rejected (a golden pins exactly that), and the schema must
+    describe that tolerance rather than reject it. A closed discriminated union would also reject
+    the forward-compatible event types the runner may add, which contradicts the "drop unknown"
+    guarantee. The known ``type`` values are documented for readers, not enforced: ``message``,
+    ``thought``, the ``message_*`` / ``reasoning_*`` lifecycle trios, ``tool_call``,
+    ``tool_result``, ``interaction_request``, ``data``, ``file``, ``usage``, ``error``, ``done``.
+    """
+
+    type: Optional[str] = None
+
+
+# ---------------------------------------------------------------------------
+# The request
+# ---------------------------------------------------------------------------
+
+
+class WireRunRequest(_WireModel):
+    """The ``/run`` request payload — the exact field set ``request_to_wire`` may emit.
+
+    Every field is optional on the wire (the contract is implicitly all-optional), so the schema
+    expresses "optional" while the producer's omit-when-empty behavior stays in ``wire.py`` and
+    is pinned by the golden fixtures. The harness selects the agent (``pi_core`` / ``pi_agenta``
+    / ``claude``); there is no engine selector on the wire (A3 removed the legacy backend).
+    """
+
+    __ag_type__ = "run_request"
+
+    harness: Optional[str] = None
+    sandbox: Optional[str] = None
+    session_id: Optional[str] = Field(default=None, alias="sessionId")
+    agents_md: Optional[str] = Field(default=None, alias="agentsMd")
+    # Model + connection. ``model`` stays a plain string; the structured provider/connection
+    # fields ride alongside only when a resolved connection / model ref is present.
+    model: Optional[str] = None
+    provider: Optional[str] = None
+    connection: Optional[WireConnection] = None
+    deployment: Optional[str] = None
+    endpoint: Optional[WireEndpoint] = None
+    credential_mode: Optional[str] = Field(default=None, alias="credentialMode")
+    # Turn.
+    messages: Optional[List[WireChatMessage]] = None
+    # Secrets injected as harness env (provider keys); never written to the agent filesystem.
+    secrets: Optional[Dict[str, str]] = None
+    trace: Optional[WireTraceContext] = None
+    # Tools + skills.
+    tools: Optional[List[str]] = None
+    custom_tools: Optional[List[WireResolvedToolSpec]] = Field(
+        default=None, alias="customTools"
+    )
+    tool_callback: Optional[WireToolCallback] = Field(
+        default=None, alias="toolCallback"
+    )
+    mcp_servers: Optional[List[WireMcpServer]] = Field(default=None, alias="mcpServers")
+    skills: Optional[List[WireSkill]] = None
+    # Policy + prompt overrides + files.
+    permission_policy: Optional[str] = Field(default=None, alias="permissionPolicy")
+    system_prompt: Optional[str] = Field(default=None, alias="systemPrompt")
+    append_system_prompt: Optional[str] = Field(
+        default=None, alias="appendSystemPrompt"
+    )
+    sandbox_permission: Optional[WireSandboxPermission] = Field(
+        default=None, alias="sandboxPermission"
+    )
+    harness_files: Optional[List[WireHarnessFile]] = Field(
+        default=None, alias="harnessFiles"
+    )
+
+
+# ---------------------------------------------------------------------------
+# The result
+# ---------------------------------------------------------------------------
+
+
+class WireRunResult(_WireModel):
+    """The ``/run`` result payload — what ``result_from_wire`` parses.
+
+    ``ok`` is the outcome flag; on failure ``error`` is the current free string (a structured
+    error model is a deferred follow-up, not this phase). On success the run carries ``output``,
+    ``messages``, ``events``, ``usage``, ``stopReason``, ``capabilities``, plus the resolved
+    ``sessionId`` / ``model`` / ``traceId``.
+    """
+
+    __ag_type__ = "run_result"
+
+    ok: bool
+    output: Optional[str] = None
+    messages: Optional[List[WireChatMessage]] = None
+    events: Optional[List[WireAgentEvent]] = None
+    usage: Optional[WireAgentUsage] = None
+    stop_reason: Optional[str] = Field(default=None, alias="stopReason")
+    capabilities: Optional[WireHarnessCapabilities] = None
+    session_id: Optional[str] = Field(default=None, alias="sessionId")
+    model: Optional[str] = None
+    trace_id: Optional[str] = Field(default=None, alias="traceId")
+    error: Optional[str] = None
+
+
+# ---------------------------------------------------------------------------
+# The exported JSON interface
+# ---------------------------------------------------------------------------
+
+# The top-level wire models whose JSON Schema ships in the SDK. Each is keyed by its
+# ``x-ag-type`` so ``CATALOG_TYPES`` can carry it the same way it carries ``agent_config``.
+WIRE_CONTRACT_MODELS = (WireRunRequest, WireRunResult)
+
+
+def run_contract_schemas() -> Dict[str, Dict[str, Any]]:
+    """The exported JSON Schema of the ``/run`` wire models, keyed by ``x-ag-type``.
+
+    Uses ``model_json_schema(by_alias=True)`` so the emitted property names are the camelCase
+    wire keys, and dereferences ``$defs`` (the same treatment ``CATALOG_TYPES`` gives every other
+    entry, via ``_dereference_schema``) so the catalog entries are self-contained. This is the
+    single export point: ``CATALOG_TYPES`` adds these entries, and a freshness test asserts the
+    committed catalog matches a fresh call here so the schema cannot silently drift from the
+    models.
+    """
+    # Local import to avoid a module-load cycle: ``utils/types`` imports the agents package.
+    from ..utils.types import _dereference_schema
+
+    schemas: Dict[str, Dict[str, Any]] = {}
+    for model in WIRE_CONTRACT_MODELS:
+        schema = _dereference_schema(model.model_json_schema(by_alias=True))
+        schema["x-ag-type"] = model.ag_type()
+        schemas[model.ag_type()] = schema
+    return schemas
diff --git a/sdks/python/agenta/sdk/decorators/routing.py b/sdks/python/agenta/sdk/decorators/routing.py
index e4e8543f07..b312b8cb02 100644
--- a/sdks/python/agenta/sdk/decorators/routing.py
+++ b/sdks/python/agenta/sdk/decorators/routing.py
@@ -14,6 +14,7 @@
 from agenta.sdk.models.workflows import (
     WorkflowInvokeRequest,
     WorkflowInspectRequest,
+    WorkflowInspectResponse,
     WorkflowServiceStatus,
     WorkflowBatchResponse,
     WorkflowStreamingResponse,
@@ -350,11 +351,39 @@ async def handle_invoke_failure(exception: Exception) -> Response:
     return _make_json_response(error)
 
 
+def _to_inspect_response(
+    request: WorkflowInvokeRequest,
+) -> WorkflowInspectResponse:
+    """Normalize the internally-built ``WorkflowInvokeRequest`` into the canonical response.
+
+    ``workflow.inspect()`` builds its result as a ``WorkflowInvokeRequest`` (a REQUEST model), so
+    the resolved interface lands nested at ``data.revision.data``. The public ``/inspect``
+    contract is :class:`WorkflowInspectResponse` instead, which lifts that
+    :class:`WorkflowRevisionData` up to a flat top-level ``revision`` — so a client reads schemas
+    at ``response.revision.schemas`` rather than guessing the request envelope.
+    """
+    nested = (request.data.revision or {}) if request.data else {}
+    revision_data = nested.get("data") if isinstance(nested, dict) else None
+    # Carry the resolved config so the public boundary doesn't drop it: the FE reads
+    # ``configuration.parameters`` as a fallback when ``revision.parameters`` is absent.
+    parameters = (
+        revision_data.get("parameters") if isinstance(revision_data, dict) else None
+    )
+    configuration = {"parameters": parameters} if parameters is not None else None
+    return WorkflowInspectResponse(
+        version=request.version,
+        revision=revision_data,
+        configuration=configuration,
+        meta=request.meta,
+    )
+
+
 async def handle_inspect_success(
     request: Optional[WorkflowInvokeRequest],
 ):
     if request:
-        return JSONResponse(request.model_dump(mode="json", exclude_none=True))
+        response = _to_inspect_response(request)
+        return JSONResponse(response.model_dump(mode="json", exclude_none=True))
 
     return JSONResponse({"details": {"message": "Workflow not found"}}, status_code=404)
 
@@ -544,7 +573,7 @@ def _add_agent_routes(target: Any, prefix: str) -> None:
                 self.path + "/inspect",
                 inspect_endpoint,
                 methods=["POST"],
-                response_model=WorkflowInvokeRequest,
+                response_model=WorkflowInspectResponse,
             )
             if agent_enabled:
                 _add_agent_routes(self.router_fallback, self.path)
@@ -568,7 +597,7 @@ def _add_agent_routes(target: Any, prefix: str) -> None:
                 "/inspect",
                 inspect_endpoint,
                 methods=["POST"],
-                response_model=WorkflowInvokeRequest,
+                response_model=WorkflowInspectResponse,
             )
             if agent_enabled:
                 _add_agent_routes(self.mount_root, "")
@@ -587,7 +616,7 @@ def _add_agent_routes(target: Any, prefix: str) -> None:
             "/inspect",
             inspect_endpoint,
             methods=["POST"],
-            response_model=WorkflowInvokeRequest,
+            response_model=WorkflowInspectResponse,
         )
         if agent_enabled:
             _add_agent_routes(sub_app, "")
diff --git a/sdks/python/agenta/sdk/models/workflows.py b/sdks/python/agenta/sdk/models/workflows.py
index 0779695a8e..18c212dd53 100644
--- a/sdks/python/agenta/sdk/models/workflows.py
+++ b/sdks/python/agenta/sdk/models/workflows.py
@@ -307,6 +307,37 @@ def _coerce_nested_models(cls, values: Dict[str, Any]) -> Dict[str, Any]:
 WorkflowServiceInspectRequest = WorkflowInspectRequest
 
 
+class WorkflowInspectResponse(Metadata):
+    """The canonical ``/inspect`` response: the resolved interface, flat and self-describing.
+
+    ``/inspect`` is a public edge — it tells the browser which form to render and which inputs,
+    parameters, and outputs a workflow has. The response used to be a ``WorkflowInvokeRequest``
+    (a REQUEST model carrying response semantics), which nested the schemas under
+    ``data.revision.data.schemas`` and made every client guess the envelope. This model is the
+    explicit response contract instead:
+
+    - ``revision`` is a :class:`WorkflowRevisionData` directly (it already owns ``uri`` / ``url``
+      / ``headers`` / ``schemas`` / ``parameters``), so the schemas live at the obvious
+      ``response.revision.schemas`` — no ``data.revision.data`` nesting.
+    - ``configuration`` and ``meta`` carry the resolved config and any interface metadata (the
+      agent workflow rides its per-harness connection capability in ``meta``).
+
+    Typed outputs (POC, no back-compat field): ``revision.schemas.outputs`` may be keyed per
+    output type (for example ``{"messages": {...}, "invoke": {...}}``) so a workflow with more
+    than one output surface describes each one. A single-output workflow still uses a plain
+    output schema. Consumers read the keyed shape when present and fall back to the plain one.
+    """
+
+    version: Optional[str] = "2025.07.14"
+
+    revision: Optional[WorkflowRevisionData] = None
+    configuration: Optional[Dict[str, Any]] = None
+
+
+# back-compat alias
+WorkflowServiceInspectResponse = WorkflowInspectResponse
+
+
 class WorkflowBaseResponse(TraceID, SpanID):
     version: Optional[str] = "2025.07.14"
 
diff --git a/sdks/python/agenta/sdk/utils/types.py b/sdks/python/agenta/sdk/utils/types.py
index d2536917de..70b89613a3 100644
--- a/sdks/python/agenta/sdk/utils/types.py
+++ b/sdks/python/agenta/sdk/utils/types.py
@@ -11,6 +11,7 @@
 from agenta.sdk.agents.dtos import HARNESS_IDENTITIES, SandboxPermission
 from agenta.sdk.agents.mcp import MCPServerConfig
 from agenta.sdk.agents.tools import ToolConfig
+from agenta.sdk.agents.wire_models import run_contract_schemas
 from agenta.sdk.utils.assets import supported_llm_models, model_metadata
 from agenta.sdk.utils.helpers import _PLACEHOLDER_RE
 from agenta.sdk.utils.rendering import (
@@ -1360,4 +1361,9 @@ class _SkillEmbedRefSchema(BaseModel):
     SkillConfigSchema.ag_type(): _dereference_schema(
         SkillConfigSchema.model_json_schema()
     ),
+    # The `/run` wire contract (request + result), exported from the dedicated Pydantic wire
+    # models in `agenta.sdk.agents.wire_models`. This puts the service<->runner wire interface in
+    # the SDK the same way the other catalog types are exposed; a freshness test asserts these
+    # entries match a fresh export so the schema cannot drift from the models.
+    **run_contract_schemas(),
 }
diff --git a/sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py b/sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py
new file mode 100644
index 0000000000..095c509586
--- /dev/null
+++ b/sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py
@@ -0,0 +1,121 @@
+"""The ``/run`` wire models are the single schema source of truth.
+
+These tests prove the dedicated Pydantic wire models in ``agenta.sdk.agents.wire_models``
+faithfully describe the wire that ``request_to_wire`` / ``result_from_wire`` (in ``utils/wire.py``)
+produce and parse, and that the exported JSON Schema is the one shipped in the SDK through
+``CATALOG_TYPES``.
+
+This is the Python-side guard the project plan calls for:
+
+- The exported schema is committed into the SDK (``CATALOG_TYPES['run_request' | 'run_result']``)
+  and a freshness test asserts the catalog entry equals a fresh export, so the schema cannot
+  silently drift from the models.
+- The golden fixtures (the cross-language anchor) validate against the exported schema — an
+  "example must validate" check that proves the schema describes today's wire.
+- ``request_to_wire`` output validates against the schema, so the producer and the schema agree.
+- The request schema's property set equals the hand-kept ``KNOWN_REQUEST_KEYS`` in
+  ``test_wire_contract.py``, so a new wire field cannot land in one place and not the other.
+
+There is NO runtime validation in this phase (per the project plan): nothing here gates a live
+``/run``. These models are an SDK-side schema artifact and a test guard only.
+"""
+
+from __future__ import annotations
+
+import jsonschema
+import pytest
+
+from agenta.sdk.agents.wire_models import (
+    WireRunRequest,
+    WireRunResult,
+    run_contract_schemas,
+)
+from agenta.sdk.utils.types import CATALOG_TYPES
+
+from .test_wire_contract import (
+    KNOWN_REQUEST_KEYS,
+    _agenta_payload,
+    _claude_payload,
+    _pi_payload,
+)
+
+
+def test_run_contract_ships_in_the_sdk_catalog():
+    # The exported JSON interface lives in the SDK alongside the other catalog types, so a
+    # client / the playground / `/inspect` can resolve it the same way as `agent_config`.
+    assert "run_request" in CATALOG_TYPES
+    assert "run_result" in CATALOG_TYPES
+    assert CATALOG_TYPES["run_request"]["x-ag-type"] == "run_request"
+    assert CATALOG_TYPES["run_result"]["x-ag-type"] == "run_result"
+
+
+def test_committed_catalog_matches_a_fresh_export():
+    # Freshness: regenerate the schema in-memory and assert the committed catalog entry equals
+    # it (drift -> fail), the same discipline the goldens already use. If the wire models change,
+    # this fails until the export is regenerated.
+    fresh = run_contract_schemas()
+    assert CATALOG_TYPES["run_request"] == fresh["run_request"]
+    assert CATALOG_TYPES["run_result"] == fresh["run_result"]
+
+
+def test_exported_schema_is_dereferenced_and_camelcase():
+    # The exported schema is self-contained (no `$defs`/`$ref`, like every catalog entry) and
+    # speaks the camelCase wire keys, not the snake_case Python field names.
+    req = CATALOG_TYPES["run_request"]
+    assert "$defs" not in req
+    props = req["properties"]
+    assert "sessionId" in props and "session_id" not in props
+    assert "customTools" in props and "custom_tools" not in props
+
+
+def test_request_schema_properties_equal_known_request_keys():
+    # The schema-derived property set is exactly the hand-kept guard in `test_wire_contract.py`.
+    # This is the schema-derived key guard: a new wire field cannot be added to one without the
+    # other, so the two cannot silently fall out of step.
+    assert set(CATALOG_TYPES["run_request"]["properties"]) == KNOWN_REQUEST_KEYS
+
+
+@pytest.mark.parametrize(
+    "golden_name, model",
+    [
+        ("run_request.pi_core.json", WireRunRequest),
+        ("run_request.claude.json", WireRunRequest),
+        ("run_result.ok.json", WireRunResult),
+        ("run_result.error.json", WireRunResult),
+    ],
+)
+def test_goldens_parse_into_the_wire_models(golden, golden_name, model):
+    # Every golden parses cleanly into its wire model (by camelCase alias), proving the models
+    # accept the real wire. The ok-result golden includes a deliberately typeless event; the
+    # open `WireAgentEvent` (type optional) tolerates it, mirroring the parser's drop behavior.
+    model.model_validate(golden(golden_name))
+
+
+@pytest.mark.parametrize(
+    "golden_name, ag_type",
+    [
+        ("run_request.pi_core.json", "run_request"),
+        ("run_request.claude.json", "run_request"),
+        ("run_result.ok.json", "run_result"),
+        ("run_result.error.json", "run_result"),
+    ],
+)
+def test_goldens_validate_against_the_exported_schema(golden, golden_name, ag_type):
+    # "Examples must validate": each golden validates against the exported JSON Schema shipped in
+    # the SDK. This proves the schema describes today's wire. It is a TEST, not a runtime guard.
+    jsonschema.validate(golden(golden_name), CATALOG_TYPES[ag_type])
+
+
+def test_request_to_wire_output_validates_against_the_schema():
+    # The producer and the schema agree: the dict `request_to_wire` builds for each harness
+    # validates against the exported request schema and round-trips through the wire model.
+    for payload in (_pi_payload(), _claude_payload(), _agenta_payload()):
+        jsonschema.validate(payload, CATALOG_TYPES["run_request"])
+        WireRunRequest.model_validate(payload)
+
+
+def test_minimal_result_validates():
+    # A bare success result (the `result_from_wire` minimal case) is valid against the schema.
+    payload = {"ok": True}
+    jsonschema.validate(payload, CATALOG_TYPES["run_result"])
+    assert WireRunResult.model_validate(payload).ok is True
diff --git a/sdks/python/oss/tests/pytest/unit/test_inspect_response.py b/sdks/python/oss/tests/pytest/unit/test_inspect_response.py
new file mode 100644
index 0000000000..b4fdff8766
--- /dev/null
+++ b/sdks/python/oss/tests/pytest/unit/test_inspect_response.py
@@ -0,0 +1,106 @@
+"""The ``/inspect`` response is the canonical :class:`WorkflowInspectResponse`.
+
+Architecture-followups issue 1: ``/inspect`` used to return a ``WorkflowInvokeRequest`` (a
+REQUEST model carrying response semantics), nesting the resolved interface at
+``data.revision.data.schemas`` so every client had to guess the envelope. ``handle_inspect_success``
+now normalizes that internally-built request into a flat :class:`WorkflowInspectResponse` whose
+``revision`` IS the :class:`WorkflowRevisionData`, so schemas live at the obvious
+``response["revision"]["schemas"]``.
+
+These are the acceptance criteria from
+``docs/design/agent-workflows/interfaces/architecture-followups.md`` issue 1:
+
+- The response exposes schemas at ``response["revision"]["schemas"]`` (not ``data.revision.data``).
+- The frontend can resolve schemas from the new shape.
+"""
+
+from __future__ import annotations
+
+import json
+
+from agenta.sdk.decorators.routing import _to_inspect_response
+from agenta.sdk.models.workflows import (
+    WorkflowInspectResponse,
+    WorkflowInvokeRequest,
+    WorkflowRequestData,
+    WorkflowRevision,
+    WorkflowRevisionData,
+)
+
+_RESOLVED_REVISION = WorkflowRevisionData(
+    uri="agenta:builtin:agent:v0",
+    schemas={
+        "inputs": {"type": "object", "properties": {"messages": {"type": "array"}}},
+        "parameters": {"type": "object"},
+        # Typed outputs keyed per output surface (issue 4): the POC shape, no flat field.
+        "outputs": {
+            "invoke": {"x-ag-type-ref": "message", "type": "object"},
+            "messages": {"x-ag-type-ref": "messages", "type": "array"},
+        },
+    },
+    parameters={"agent": {"model": "gpt-5.5"}},
+)
+
+
+def _built_invoke_request() -> WorkflowInvokeRequest:
+    """The internally-built inspect result (what ``workflow.inspect()`` returns today)."""
+    return WorkflowInvokeRequest(
+        meta={"harness_capabilities": {"pi_core": {}}},
+        data=WorkflowRequestData(
+            revision=WorkflowRevision(
+                id=None,
+                slug="agent",
+                version="v0",
+                name="Agent",
+                data=_RESOLVED_REVISION,
+            ).model_dump(mode="json", exclude_none=True),
+        ),
+    )
+
+
+def test_inspect_response_lifts_revision_to_top_level():
+    response = _to_inspect_response(_built_invoke_request())
+
+    assert isinstance(response, WorkflowInspectResponse)
+    assert response.revision is not None
+    # Schemas live at response.revision.schemas — not nested under data.revision.data.
+    assert response.revision.schemas is not None
+    assert response.revision.schemas.inputs == _RESOLVED_REVISION.schemas.inputs
+    assert response.revision.uri == "agenta:builtin:agent:v0"
+    assert response.revision.parameters == {"agent": {"model": "gpt-5.5"}}
+    # Resolved config is preserved at the public boundary, not dropped.
+    assert response.configuration == {"parameters": {"agent": {"model": "gpt-5.5"}}}
+    # Interface metadata rides top-level meta.
+    assert response.meta == {"harness_capabilities": {"pi_core": {}}}
+
+
+def test_inspect_response_serializes_schemas_at_revision_schemas():
+    # The acceptance criterion in the words of a client: post /inspect, read response body,
+    # find schemas at body["revision"]["schemas"]. This is the exact path the frontend reads.
+    response = _to_inspect_response(_built_invoke_request())
+    body = json.loads(response.model_dump_json(exclude_none=True))
+
+    assert "revision" in body
+    assert "schemas" in body["revision"]
+    assert "inputs" in body["revision"]["schemas"]
+    # No request-envelope leakage: there is no top-level `data.revision.data` nesting.
+    assert "data" not in body
+
+
+def test_inspect_response_outputs_are_keyed_per_surface():
+    # Issue 4: outputs carry the typed shape keyed per output surface (messages / invoke).
+    response = _to_inspect_response(_built_invoke_request())
+    outputs = response.revision.schemas.outputs
+
+    assert set(outputs) == {"invoke", "messages"}
+    assert outputs["invoke"]["x-ag-type-ref"] == "message"
+    assert outputs["messages"]["x-ag-type-ref"] == "messages"
+
+
+def test_inspect_response_handles_a_request_with_no_revision():
+    # A built request with no resolved revision normalizes to an empty-revision response, not a
+    # crash (the inspect path can resolve nothing for an unknown URI).
+    response = _to_inspect_response(WorkflowInvokeRequest())
+    assert isinstance(response, WorkflowInspectResponse)
+    assert response.revision is None
+    assert response.configuration is None
diff --git a/services/oss/src/agent/schemas.py b/services/oss/src/agent/schemas.py
index 5a49a38a93..ccd2bb1f7b 100644
--- a/services/oss/src/agent/schemas.py
+++ b/services/oss/src/agent/schemas.py
@@ -67,12 +67,28 @@
     "properties": {"agent": AGENT_CONFIG_SCHEMA},
 }
 
-# Outputs: the final assistant message.
+# Outputs, keyed per output surface (the agent has two): `invoke` returns the single final
+# assistant message (the batch `/invoke` shape, `x-ag-type-ref: message`); `messages` returns the
+# ordered conversation the `/messages` route streams (`x-ag-type-ref: messages`). Keying outputs by
+# surface lets the playground render the right output view per route. POC, so no flat back-compat
+# output field: a consumer reads the keyed shape directly. Both refs already appear elsewhere in
+# AGENT_SCHEMAS, so this adds no new catalog marker.
 AGENT_OUTPUTS_SCHEMA = {
     "$schema": _SCHEMA,
-    "x-ag-type-ref": "message",
     "type": "object",
-    "description": "Final assistant message returned by the agent.",
+    "description": "Agent outputs, keyed per output surface (invoke / messages).",
+    "properties": {
+        "invoke": {
+            "x-ag-type-ref": "message",
+            "type": "object",
+            "description": "Final assistant message returned by a batch /invoke.",
+        },
+        "messages": {
+            "x-ag-type-ref": "messages",
+            "type": "array",
+            "description": "The ordered conversation the /messages route returns.",
+        },
+    },
 }
 
 AGENT_SCHEMAS = {
diff --git a/web/packages/agenta-entities/src/workflow/api/api.ts b/web/packages/agenta-entities/src/workflow/api/api.ts
index 44ee390dae..c8b5a4be47 100644
--- a/web/packages/agenta-entities/src/workflow/api/api.ts
+++ b/web/packages/agenta-entities/src/workflow/api/api.ts
@@ -415,12 +415,19 @@ export async function fetchWorkflowRevisionById(
 // ============================================================================
 
 /**
- * Response shape from the inspect endpoint.
- * Returns a WorkflowServiceRequest with resolved interface.
+ * Response shape from the `/inspect` endpoint.
+ *
+ * The canonical backend model is `WorkflowInspectResponse`
+ * (sdks/python/agenta/sdk/models/workflows.py): a flat response whose `revision` IS the
+ * resolved `WorkflowRevisionData`, so schemas live at `revision.schemas`. The endpoint no
+ * longer returns the old `WorkflowInvokeRequest` envelope that nested them under
+ * `data.revision.data.schemas`.
+ *
+ * `outputs` is typed per output surface (POC): `{invoke, messages}` for the agent workflow,
+ * or a single schema for a one-output workflow. The store reads either shape.
  */
 export interface InspectWorkflowResponse {
     version?: string
-    /** New shape (feat/extend-runnables): revision contains the resolved data */
     revision?: {
         uri?: string
         url?: string
@@ -432,7 +439,14 @@ export interface InspectWorkflowResponse {
         }
         parameters?: Record<string, unknown>
     }
-    /** @deprecated Old shape — kept for backward compat during migration */
+    configuration?: Record<string, unknown>
+    meta?: Record<string, unknown>
+    /**
+     * @deprecated Migration bridge for the old `WorkflowInvokeRequest` inspect envelope. The
+     * canonical response puts schemas at `revision.schemas`; read that first. Remove this once
+     * every reader (appUtils / evaluatorUtils) no longer needs the `?? interface?.schemas`
+     * fallback — i.e. once no deployed service returns the old envelope.
+     */
     interface?: {
         version?: string
         uri?: string
@@ -444,10 +458,6 @@ export interface InspectWorkflowResponse {
             outputs?: Record<string, unknown>
         }
     }
-    configuration?: {
-        script?: Record<string, unknown>
-        parameters?: Record<string, unknown>
-    }
 }
 
 /**
diff --git a/web/packages/agenta-entities/src/workflow/state/store.ts b/web/packages/agenta-entities/src/workflow/state/store.ts
index f7fc3d344b..83531a1cb8 100644
--- a/web/packages/agenta-entities/src/workflow/state/store.ts
+++ b/web/packages/agenta-entities/src/workflow/state/store.ts
@@ -1535,11 +1535,13 @@ export const workflowEntityAtomFamily = atomFamily((workflowId: string) =>
         let resolvedParams: Record<string, unknown> | null | undefined = null
 
         // (a) Inspect — primary source for any workflow with a URI.
-        // Returns interface.schemas.{inputs, parameters, outputs} directly.
+        // The canonical `WorkflowInspectResponse` puts the resolved interface at
+        // `revision.schemas.{inputs, parameters, outputs}` (revision IS the WorkflowRevisionData),
+        // so we read it directly. `outputs` may be typed per output surface ({invoke, messages}).
         const inspectQuery = get(workflowInspectAtomFamily(workflowId))
         const inspectData = inspectQuery.data ?? null
         if (inspectData) {
-            const inspectSchemas = inspectData.revision?.schemas ?? inspectData.interface?.schemas
+            const inspectSchemas = inspectData.revision?.schemas
             if (inspectSchemas) {
                 resolvedInputs = inspectSchemas.inputs
                 resolvedOutputs = inspectSchemas.outputs
diff --git a/web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts b/web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts
new file mode 100644
index 0000000000..411676de6c
--- /dev/null
+++ b/web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts
@@ -0,0 +1,83 @@
+/**
+ * Schema resolution from the canonical `/inspect` response shape.
+ *
+ * Architecture-followups issue 1: `/inspect` now returns the canonical `WorkflowInspectResponse`
+ * (sdks/python/agenta/sdk/models/workflows.py) whose `revision` IS the resolved
+ * `WorkflowRevisionData`, so schemas live at `revision.schemas`. The store read
+ * (`web/packages/agenta-entities/src/workflow/state/store.ts`, the inspect branch) reads exactly
+ * that path. These tests pin that read against the real response shape so it cannot silently
+ * regress to resolving `undefined` (the latent break this fix closes).
+ *
+ * The store's read is an inline expression over the query data, not an exported function, so we
+ * reproduce that exact expression here over a typed `InspectWorkflowResponse`. The point is the
+ * CONTRACT: the canonical body resolves schemas; the old nested envelope does not.
+ */
+
+import {describe, expect, it} from "vitest"
+
+import type {InspectWorkflowResponse} from "../../src/workflow/api/api"
+
+// The exact read the store performs in its inspect branch (store.ts).
+function resolveInspectSchemas(inspectData: InspectWorkflowResponse | null) {
+    if (!inspectData) return null
+    const inspectSchemas = inspectData.revision?.schemas
+    if (!inspectSchemas) return null
+    return {
+        inputs: inspectSchemas.inputs,
+        outputs: inspectSchemas.outputs,
+        parameters: inspectSchemas.parameters,
+    }
+}
+
+describe("inspect response schema resolution", () => {
+    it("resolves schemas from the canonical revision.schemas shape", () => {
+        const body: InspectWorkflowResponse = {
+            version: "2025.07.14",
+            revision: {
+                uri: "agenta:builtin:agent:v0",
+                schemas: {
+                    inputs: {type: "object", properties: {messages: {type: "array"}}},
+                    parameters: {type: "object"},
+                    outputs: {
+                        invoke: {"x-ag-type-ref": "message", type: "object"},
+                        messages: {"x-ag-type-ref": "messages", type: "array"},
+                    },
+                },
+                parameters: {agent: {model: "gpt-5.5"}},
+            },
+            meta: {harness_capabilities: {}},
+        }
+
+        const resolved = resolveInspectSchemas(body)
+        expect(resolved).not.toBeNull()
+        expect(resolved?.inputs).toEqual({
+            type: "object",
+            properties: {messages: {type: "array"}},
+        })
+        expect(resolved?.parameters).toEqual({type: "object"})
+    })
+
+    it("exposes outputs keyed per output surface (invoke / messages)", () => {
+        const body: InspectWorkflowResponse = {
+            revision: {
+                schemas: {
+                    outputs: {
+                        invoke: {"x-ag-type-ref": "message"},
+                        messages: {"x-ag-type-ref": "messages"},
+                    },
+                },
+            },
+        }
+
+        const resolved = resolveInspectSchemas(body)
+        const outputs = resolved?.outputs as Record<string, Record<string, unknown>> | undefined
+        expect(outputs && Object.keys(outputs).sort()).toEqual(["invoke", "messages"])
+        expect(outputs?.invoke["x-ag-type-ref"]).toBe("message")
+        expect(outputs?.messages["x-ag-type-ref"]).toBe("messages")
+    })
+
+    it("resolves nothing when there is no revision (no crash, no stale schemas)", () => {
+        expect(resolveInspectSchemas({})).toBeNull()
+        expect(resolveInspectSchemas(null)).toBeNull()
+    })
+})