# Project: A schema-driven `/run` contract

| | |
| --- | --- |
| **Status** | Plan. Revised per author PR review on #4830 (2026-06-24). Pre-production POC: any wire shape may change freely; no back-compat burden. |
| **Type** | Engineering project (a sequenced, test-driven change), not a one-shot change. |
| **Scope** | Replace the hand-mirrored `/run` wire contract with a single schema source (Pydantic for now); **ship the exported JSON interface in the SDK** and investigate whether Fern can see it; fold in a structured error model and a carried contract version. **No sidecar/runner validation yet**: the contract is still brittle. |
| **Owner files (today)** | `services/agent/src/protocol.ts` (TS types), `sdks/python/agenta/sdk/agents/utils/wire.py` (Python mirror), `sdks/python/oss/tests/pytest/unit/agents/golden/` (fixtures), `sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py` + `services/agent/tests/unit/wire-contract.test.ts` (the two contract tests). |
| **Reference** | The deep spec of the contract as built: [`../runner-interface/README.md`](../runner-interface/README.md). Its Section 12 ("Known gaps") names the gaps this project targets. The inventory page: [`../../interfaces/cross-service/service-to-agent-runner.md`](../../interfaces/cross-service/service-to-agent-runner.md). |
| **Mirroring rule today** | `services/agent/CLAUDE.md` ("The wire contract is mirrored — change both sides"). |

## 1. The problem, precisely

The `/run` contract is the spine of the agent stack: the Python agent service builds a request,
and the Node runner executes a turn and returns either a result or a stream of events.
The contract is **defined twice** and kept in sync **by hand**:

- TypeScript: `services/agent/src/protocol.ts` declares `AgentRunRequest`, `AgentRunResult`,
  the `AgentEvent` union, `HarnessCapabilities`, and the sub-objects (`ResolvedToolSpec`,
  `ToolCallbackContext`, `McpServerConfig`, `SandboxPermission`, `TraceContext`, `WireSkill`,
  `ContentBlock`, `ChatMessage`, `AgentUsage`, `RenderHint`, `StreamRecord`).
- Python: `sdks/python/agenta/sdk/agents/utils/wire.py` (`request_to_wire` / `result_from_wire`)
  plus the BaseModels in `sdks/python/agenta/sdk/agents/dtos.py` (`Message`, `AgentEvent`,
  `AgentResult`, `HarnessCapabilities`, `TraceContext`, `SandboxPermission`, ...) re-create the
  same field names by hand.

The **only** guard against the two drifting is four golden fixtures
(`golden/run_request.{pi,claude}.json`, `golden/run_result.{ok,error}.json`) asserted by two
tests. The TS test adds a compile-time key guard (`KNOWN_REQUEST_KEYS` assigned to
`(keyof AgentRunRequest)[]`), and the Python test holds a parallel `KNOWN_REQUEST_KEYS` set.

This is brittle for concrete, observed reasons:

1. **Two hand-kept key lists.** `KNOWN_REQUEST_KEYS` is duplicated in
   `test_wire_contract.py` and `wire-contract.test.ts`. A new field means deliberately editing
   five places (the golden, `protocol.ts`, `wire.py`, and both key lists), per the CLAUDE.md rule.
2. **No runtime validation at the boundary.** `POST /run` JSON-parses the body and runs with
   whatever fields are present; an empty body becomes `{}` (`server.ts`). A malformed or
   misspelled field is silently ignored, not rejected. The contract is *implicitly all-optional*
   (every TS field is `?`, every Python field defaults), so a typo like `sandboxPermision` is
   dropped on the floor with no error. This is `runner-interface/README.md` §12 gap
   "No schema validation on the runner". (An observed gap, but **not** fixed in this POC phase; a
   boundary guard is a deferred follow-up, see Sections 5 and 8.)
3. **The version skew guard is exposed but unconsumed.** `version.ts` exports
   `PROTOCOL_VERSION = 1` and `/health` returns it, but **no Python caller probes `/health`**
   (verified: no reference to `runnerInfo`/`PROTOCOL_VERSION`/`/health` in the runner-calling
   path). A client and runner can silently disagree across a major bump. This is §12 gap "The
   version skew guard is not consumed".
4. **The error model is a free string.** `AgentRunResult.error?: string` has no taxonomy and
   no machine-readable code; `result_from_wire` turns any `ok:false` into a generic
   `RuntimeError(f"Agent run failed: {error}")`. There is also **no distinct cancelled outcome**:
   a user/client abort surfaces (if at all) as a transport teardown or a generic error, not as a
   first-class result. §12 names neither gap, but the user has scoped this as the A10 cleanup.

The fix for now is a **single source of truth** (Pydantic wire models) whose **JSON interface
ships in the SDK**, plus the A10 error model and a `/capabilities` probe, sequenced so each step
is a small change with a test that proves it. Boundary validation, generated TS types, and
versioning are **deferred** (the contract is still brittle; this is a pre-production POC). The
A3 rename (backend removal + `pi`->`pi_core` / `agenta`->`pi_agenta`) has already landed in the
working tree, so the wire models describe that current shape from the start.

## 2. What this project changes vs leaves alone

**In scope:**

- One schema as the source of truth for the `/run` request, result, event union, capabilities,
  and the sub-objects listed above. **Source = Pydantic for now** (Section 4).
- **The exported JSON Schema interface lives in the SDK** (alongside the existing `CATALOG_TYPES`
  JSON interfaces), plus an investigation of whether Fern can see/generate it across languages
  (Section 4.1).
- A structured error object `{ code, message, retryable }` and a distinct `cancelled` outcome.
- A contract version carried in the payload (not only on `/health`), in whatever simple string
  form A1 specifies (optional in this POC phase; Section 7), and a probe that consumes it.
- A decision on splitting `/run` (verdict: keep `/run` unified; promote a `/capabilities` probe).
- Schema-derived checks **on the Python side** in place of the hand-kept Python key list. The four
  golden fixtures stay, but as *examples that must validate* against the schema rather than as the
  only guard. (The TS key list stays hand-written in this phase; Section 8, step 5.)

**Deliberately NOT in scope for now (the contract is still brittle):**

- **No request validation in the runner** (`server.ts` / `cli.ts`). We do not gate `/run` on the
  schema yet. The runner keeps parsing the body as it does today.
- **No use of the schema in the sidecar/runner at all.** No ajv, no new runner dependency, no
  runtime validation step on the Node side. The schema is an SDK-side artifact for now.
- These are deferred until the contract stabilizes; revisit when we want a hard boundary guard.

**Explicitly unchanged by this work** (called out so reviewers do not expect movement):

- **Composio, the tool gateway, connections, and MCP** all continue to work as today. They are
  inputs the service already resolves: `customTools` (gateway callback / code / client),
  `toolCallback`, `mcpServers`, `connection` / `provider` / `endpoint` / `credentialMode`. This
  project re-expresses their **shapes** in a schema; it does not change how any of them resolve,
  route, or authenticate. The Composio key stays server-side; the gateway callback still POSTs to
  `/tools/call`; the MCP `stdio`/`http` shapes are unchanged. (We may still adjust any of these
  shapes if the schema work surfaces a better one; this is a POC, not a frozen contract.)
- The transports (HTTP + subprocess CLI) and the two modes (one-shot JSON + NDJSON streaming).
- The harness shaping logic (`config.wire_tools()` etc.): the schema describes the *output*
  of that shaping; it does not move the shaping.
- Tracing (`trace` / `TraceContext`) and the trace-export boundary.

## 3. Current contract surface (assessment)

The contract has four families (request, result, event union, sub-objects). This is what a
single schema has to cover.

### 3.1 Request (`AgentRunRequest`)

~30 top-level fields, **all optional on the wire**, grouped by job:

| Group | Fields |
| --- | --- |
| engine + placement | `backend` (removed by A3; see Section 7), `harness`, `sandbox`, `sessionId` |
| instructions | `agentsMd`, `systemPrompt`, `appendSystemPrompt` |
| model + connection | `model`, `provider`, `connection {mode, slug?}`, `deployment`, `endpoint {baseUrl?, apiVersion?, region?, headers?}`, `credentialMode`, `secrets` |
| turn | `prompt`, `messages` (`ChatMessage[]`) |
| tools + skills | `tools` (string[]), `customTools` (`ResolvedToolSpec[]`), `toolCallback`, `mcpServers`, `skills` (`WireSkill[]`) |
| policy + files | `permissionPolicy`, `sandboxPermission`, `harnessFiles` (`[{path, content}]`) |
| tracing | `trace` (`TraceContext`) |

Shape notes (the current serializer behavior, **not** a back-compat constraint; this is a
pre-production POC and any of these may change freely): a plain-string `model` keeps `provider` /
`connection` / `deployment` / `endpoint` / `credentialMode` off the wire; `mcpServers`, `skills`,
`sandboxPermission`, and `harnessFiles` are omitted (not null) when empty. The schema describes
whatever shape we settle on; it does not exist to freeze today's bytes.

### 3.2 Result (`AgentRunResult`)

`ok` (bool), `output?`, `messages?`, `events?`, `usage?` (`AgentUsage`), `stopReason?`,
`capabilities?` (`HarnessCapabilities`), `sessionId?`, `model?`, `traceId?`, `error?` (the free
string this project replaces). `ok:false` raises in Python (`result_from_wire`).

### 3.3 The event union (`AgentEvent`)

A discriminated union on `type`: `message`, `thought`, the `message_*` / `reasoning_*` lifecycle
trios, `tool_call`, `tool_result`, `interaction_request`, `data`, `file`, `usage`, `error`,
`done`. Plus `StreamRecord = {kind:"event",event} | {kind:"result",result}` for NDJSON framing.
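
The `StreamRecord` framing above can be read with a small loop. This is a stdlib-only sketch, not the SDK's transport code. `parse_stream` is a hypothetical name, and the event payload fields in the example are invented. Skipping blank lines and ignoring unknown record kinds are also assumptions. The sketch drops typeless events, as the Python side does today, and requires exactly one terminal `result` record.

```python
import json
from typing import Any, Iterable, Optional


def parse_stream(lines: Iterable[str]) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Split NDJSON StreamRecord lines into (events, terminal result). Sketch only."""
    events: list[dict[str, Any]] = []
    result: Optional[dict[str, Any]] = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # assumption: tolerate blank lines between records
        record = json.loads(raw)
        kind = record.get("kind")
        if kind == "event":
            event = record.get("event") or {}
            if "type" not in event:
                continue  # typeless events are dropped, matching the Python side
            events.append(event)
        elif kind == "result":
            if result is not None:
                raise RuntimeError("more than one terminal result record")
            result = record["result"]
        # assumption: any other record kind is ignored in this sketch
    if result is None:
        raise RuntimeError("stream ended without a terminal result")
    return events, result


if __name__ == "__main__":
    demo = [
        '{"kind":"event","event":{"type":"message","text":"hi"}}',
        '{"kind":"event","event":{"text":"no type, dropped"}}',
        '{"kind":"result","result":{"ok":true,"events":[]}}',
    ]
    evts, res = parse_stream(demo)
    print(len(evts), res["ok"])
```

Raising on a missing terminal record mirrors how the Python streaming transports treat a stream with no terminal `result` as an error (`ts_runner.py`). The duplicate-result check is an added assumption.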

Note: the Python side intentionally **drops unknown event types** on parse
(`AgentEvent.from_wire` returns `None` for a typeless event), and a golden pins that. The schema
must keep events **open/forward-compatible**, not closed.

### 3.4 Sub-objects

`ResolvedToolSpec` (the three-axis tool surface: `kind`/`runtime`/`code`/`env`/`callRef`,
`needsApproval`, `render`, `readOnly`, `permission`), `ToolCallbackContext`, `McpServerConfig`,
`SandboxPermission` (nested `network`, `filesystem`, `enforcement`), `HarnessCapabilities` (11
boolean flags), `TraceContext`, `WireSkill` + `WireSkillFile`, `ContentBlock`, `ChatMessage`,
`AgentUsage`, `RenderHint`.

### 3.5 The existing golden/test machinery

- `golden/run_request.pi.json` (full Pi shape: tools, skills, sandboxPermission, prompt overrides;
  renamed `run_request.pi_core.json` by A3, see Section 7), `golden/run_request.claude.json`
  (Claude shape: empty `tools`, `permissionPolicy:"deny"`, `harnessFiles` with rendered
  `.claude/settings.json`).
- `golden/run_result.ok.json` (includes a typeless event to pin the drop behavior),
  `golden/run_result.error.json` (`{"ok": false, "error": "model exploded"}`).
- Python `test_wire_contract.py`: builds payloads via the real configs and asserts `== golden`,
  plus `set(payload) <= KNOWN_REQUEST_KEYS`.
- TS `wire-contract.test.ts`: loads the goldens, asserts shapes through the runner helpers
  (`resolvePromptText`, `messageText`, `resolveRunSessionId`), and holds the two compile-time
  guards (`KNOWN_REQUEST_KEYS` / `CAPABILITY_KEYS` assigned to `keyof` types).

The machinery is **good** and we keep its spirit: the goldens become "examples that must validate
against the schema", and the duplicated key lists give way to schema-derived assertions (on the
Python side first; Section 8).

## 4. Design options for a single source of truth

Three candidates, judged against this stack: a **Python Pydantic 2 SDK** and a **standalone Node
ESM runner package** (`services/agent`) that has its own `pnpm-lock.yaml`, runs through `tsx` with
**no app compile step and no codegen toolchain today**, and is deliberately decoupled from the
`web/` dependency graph. There is no JSON-Schema codegen, no `quicktype`,
no `datamodel-code-generator`, and **no zod** anywhere in the runner or web (verified).

### Option A: JSON Schema as source, codegen both sides

Author the contract as hand-written JSON Schema files; generate TS types
(`json-schema-to-typescript`) and Pydantic models (`datamodel-code-generator`) from them.

- **Pros:** language-neutral source; one artifact; both sides are generated, so neither drifts.
- **Cons:** introduces **two new codegen toolchains** into a repo that has none for this, and a
  build step into a package that intentionally has none (`services/agent/CLAUDE.md`: "no app
  compile step"). Hand-writing JSON Schema is verbose and error-prone for a union as rich as
  `AgentEvent` + `RenderHint`. The Python SDK already has hand-written BaseModels with custom
  `to_wire`/`from_wire` (camelCase aliasing, the `model`-string split, the drop-unknown-event
  behavior); regenerating them from schema would either lose that behavior or require post-gen
  patching. High blast radius; it fights the existing grain. It also does not put the interface
  in the SDK, where the Pydantic-derived `CATALOG_TYPES` schemas already live (Section 4.1).

### Option B: Pydantic as source, export the JSON interface into the SDK (RECOMMENDED)

Make **Python Pydantic models the source of truth**, but a **dedicated set of *wire* models**,
NOT the existing semantic DTOs.
This distinction is load-bearing (it was the sharpest review
finding): the real contract today does not live in `dtos.py`'s classes; it lives in the **hand
serializers** (`request_to_wire` builds a raw dict; `Message.to_wire`, `TraceContext.to_wire`,
`AgentEvent.from_wire`, etc. do the camelCase + omit + drop-unknown work). The semantic DTOs use
**snake_case** fields (`text_messages`, `mime_type`, `capture_content`) and an intentionally
loose `AgentEvent` (`type: str` + free `data` dict, vs the real discriminated union in
`protocol.ts`). Exporting `model_json_schema()` straight off those DTOs would produce the *wrong*
schema (snake_case keys, a non-discriminated event). So:

- Author new wire models in the SDK (e.g. `agents/wire_models.py`): `WireRunRequest`,
  `WireRunResult`, and an **explicit discriminated `WireAgentEvent` union** (real variants on
  `type`, plus an open fallback variant so unknown event types still validate, matching the
  current drop-unknown tolerance). Give them camelCase aliases (with `populate_by_name=True` so
  they can also be built from Python field names, as `AgentConfig` already does), explicit
  nullability, and the exact field set the serializers emit.
- These wire models become the single producer: `request_to_wire` / `result_from_wire` are
  reimplemented in terms of them. The omit-when-empty behavior stays as serializer logic + golden
  checks, because `model_json_schema()` expresses "optional", not "omit when empty".
- Pydantic 2's `model_json_schema()` exports the JSON Schema artifact **for free**, with no new
  toolchain. **This exported JSON interface ships in the SDK**, exactly the way the SDK already
  exposes Pydantic-derived JSON Schemas through `CATALOG_TYPES` (Section 4.1). The immediate goal
  is that the interface (the JSON) lives in the SDK; the runner does **not** consume it yet
  (Section 5).

- **Pros:** fits the stack: Pydantic 2 is already the SDK's modeling layer (`pydantic>=2,<3`);
  the producer (Python) is the natural source since it builds the request.
  Schema export is a built-in, not a new tool. It puts the interface in the SDK alongside the
  existing `CATALOG_TYPES` JSON interfaces (one consistent mechanism). The omit-when-empty
  behavior stays in Python, where it already lives and is tested. The exported schema becomes a
  **CI-checked artifact**: a test fails if the committed schema drifts from the wire models.
- **Cons:** requires writing dedicated wire models (a real cost, but the honest cost of a single
  source; the alternative is the current double maintenance). The TS `protocol.ts` stays
  **hand-written for now**: we do *not* generate it from the schema yet, because the runner does
  not consume the schema yet (Section 5) and the contract is still brittle. Keeping the schema and
  `protocol.ts` aligned stays a Python-side discipline for the moment (the Python goldens are the
  guard). Generating `protocol.ts` from the schema is a later option once the contract settles.

#### 4.1 The interface in the SDK, and whether Fern can see it

The author's direction: get this interface (the JSON Schema) **into the SDK** now, and find out
whether **Fern** can also see/generate it across languages. Findings, with concrete paths:

- **The SDK already exposes Pydantic-derived JSON interfaces.** `CATALOG_TYPES` in
  `sdks/python/agenta/sdk/utils/types.py` (line ~1265) is a dict of
  `model_json_schema()` outputs for `Message`, `Messages`, `AgentConfigSchema`,
  `SkillConfigSchema`, `PromptTemplate`, etc., each dereferenced. The agent workflow surfaces
  them through `/inspect` via thin `x-ag-type-ref` markers (`services/oss/src/agent/schemas.py`),
  and the playground resolves them against `GET /workflows/catalog/types/{type}`. **The wire
  contract should ship the same way:** add the exported `WireRunRequest` / `WireRunResult` JSON
  Schema next to `CATALOG_TYPES` (or as a sibling export) so the SDK is the single home of the
  JSON interface. This is the immediate, low-risk goal.

- **How Fern is used here.** Fern in this repo generates the multi-language API clients (Python
  and TypeScript) under `clients/` and `web/packages/agenta-api-client/`. The pipeline
  (`clients/scripts/generate.sh`) runs as follows: the FastAPI app (Pydantic models) emits
  **`/api/openapi.json`**, then the script writes an ephemeral `fern.config.json` +
  `generators.yml` and runs the `fernapi/fern-python-sdk` and `fernapi/fern-typescript-sdk`
  generators against that OpenAPI spec. There is **no `.fern/` API-definition directory checked
  in** and no Fern IDL; Fern's only input is the generated OpenAPI document. So the chain is
  **Pydantic → OpenAPI → Fern → SDKs**.

- **Can Fern see this interface? Yes, but only via OpenAPI, with one real caveat.** Fern reads
  the OpenAPI spec, and that spec is built from the FastAPI/Pydantic models the *public API*
  exposes. The `/run` contract is the **service ↔ runner spine**, not a public FastAPI endpoint,
  so it does **not** appear in `openapi.json` today, and Fern therefore cannot see it as-is. There
  are two ways to make Fern see it, neither needed for the immediate goal:
  - **(a) Reference the wire models from a FastAPI surface.** If any endpoint (even an internal or
    `/inspect`-style descriptor) types a field with the wire Pydantic models, FastAPI emits their
    JSON Schema into `components/schemas` of `openapi.json`, and Fern then generates them in every
    client language. This is the same path `AgentConfigSchema` already takes to reach the clients.
  - **(b) Add a standalone OpenAPI fragment as a second Fern spec.** `generators.yml` takes a list
    under `api.specs`; a hand-authored fragment that `$ref`s the exported `run-contract.schema.json`
    could be added. Heavier and not worth it now.
  - **Blocker / reason not to do it yet:** the contract is still brittle (it changes often as the
    POC evolves), and putting it on the public OpenAPI surface would publish a moving target into
    every generated client.
    So **for now**: export the JSON interface into the SDK (the
    `CATALOG_TYPES`-style path), keep it out of the public OpenAPI spec, and let Fern pick it up
    later once it stabilizes. The path is clear and there is no hard blocker, only a timing call.

### Option C: A shared IDL (`.proto`, Smithy, etc.)

Define the contract in a neutral IDL and generate both sides.

- **Pros:** strongest neutrality; mature codegen.
- **Cons:** the heaviest option for an internal JSON-over-HTTP/stdio boundary. The wire is JSON,
  not protobuf; adopting proto means either proto-over-JSON (awkward) or changing the wire format
  (out of scope and risky). It brings a build toolchain and a new language into a two-language
  repo that wants fewer moving parts. The `AgentEvent` open-union + "drop unknown" semantics fit
  JSON Schema's `additionalProperties`/`oneOf` better than proto's closed messages. Overkill.

### Recommendation: Option B (Pydantic-as-source → exported JSON interface in the SDK)

Use **Pydantic as the source for now**. It fits the Pydantic 2 stack, keeps the custom
serialization semantics where they are tested, exports the JSON Schema for free, and **puts the
interface in the SDK** the same way `CATALOG_TYPES` already does, which is exactly the immediate
goal. Source of truth = dedicated Pydantic **wire** models (not the semantic DTOs); the exported
schema ships in the SDK as a CI-checked artifact (a test fails if it drifts from the wire models).

Two deliberate constraints from the author's review:

- **No runner/sidecar validation yet.** The runner does not load or validate against the schema;
  there is no ajv, no new runner dependency, no build step. The contract is still brittle, so we
  hold off on a hard boundary guard (Section 5).
- **`protocol.ts` stays hand-written for now.** We do not generate TS types from the schema yet
  (that only pays off once the runner consumes the schema). The Python goldens remain the guard.
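
The CI-checked artifact can be sketched with the standard library alone. The same applies to the schema-derived key guard that Section 8 (step 5) builds on it. In this sketch, `REQUEST_SCHEMA` is a tiny hand-written stand-in for what `WireRunRequest.model_json_schema()` would return. The helper names are hypothetical. The file name `run-contract.schema.json` is borrowed from Section 4.1 only for illustration.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical stand-in for WireRunRequest.model_json_schema(); the real export
# would be produced by Pydantic and list every request field.
REQUEST_SCHEMA: dict[str, Any] = {
    "title": "WireRunRequest",
    "type": "object",
    "properties": {
        "harness": {"type": "string", "enum": ["pi_core", "pi_agenta", "claude"]},
        "prompt": {"type": "string"},
        "sessionId": {"type": "string"},
    },
}


def render_schema(schema: dict[str, Any]) -> str:
    """Serialize deterministically (sorted keys, fixed indent, trailing newline)."""
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def schema_is_fresh(schema: dict[str, Any], committed: Path) -> bool:
    """Freshness check: the committed artifact must equal the in-memory export."""
    if not committed.exists():
        return False
    return committed.read_text(encoding="utf-8") == render_schema(schema)


def unknown_request_keys(payload: dict[str, Any], schema: dict[str, Any]) -> set[str]:
    """Schema-derived key guard: top-level payload keys the schema does not declare."""
    return set(payload) - set(schema.get("properties", {}))
```

Serializing with `sort_keys=True` and a fixed indent keeps the committed file stable, so a diff appears only when the wire models actually change. Deriving the key set from `properties` means a new field needs no second hand-kept list on the Python side.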

Fern can reach this interface later through the existing **Pydantic → OpenAPI → Fern → SDKs**
pipeline once the contract stabilizes (Section 4.1); for now the interface lives in the SDK only.

## 5. Validation: deferred (no runtime guard yet)

Author's direction (PR review): **do not validate for the moment.** The contract is still
brittle, so this project does **not** add a runtime boundary guard on either side yet.

- **No runner ingress validation.** `server.ts` / `cli.ts` keep parsing the `/run` body exactly
  as today (empty body → defaults, unknown fields ignored). No ajv, no new runner dependency, no
  schema loaded on the Node side. A present-but-malformed body is still tolerated for now.
- **No runtime Python validation either.** `request_to_wire` / `result_from_wire` are not gated
  on the schema at runtime.

In this phase the schema is used for **Python-side tests only**: the exported schema validates
the existing goldens (an example-must-validate check) and can validate `request_to_wire` output in
a unit test. That proves the schema is faithful without changing any production code path, and it
is the full extent of validation for now.

When the contract stabilizes, a real boundary guard (runner ingress validation + a symmetric
Python result check) is a natural follow-up; see Section 8 / Open questions. Until then it is
explicitly out of scope.

## 6. The `/run` split decision

The user agrees `/run` does too much. Today `/run` conflates (a) a one-shot turn and (b) a
streaming turn (same route, switched by `Accept`), and (c) there is no separate way to ask "what
can this runner do" except the unconsumed `/health`. Evaluated splits:

### Keep as one endpoint: single-turn vs streaming

**Do NOT split** one-shot and streaming into two endpoints. They share the identical
`AgentRunRequest` and return the identical `AgentRunResult` (the streaming terminal `result`
record is the same object with `events` emptied).
The only difference is the `Accept` header
selecting the framing. The `runner-interface` RFC §6 calls this the "symmetry guarantee", and
both Python transports already parse both with the same `result_from_wire`. Splitting would
duplicate the request schema and the dispatch for no contract benefit. Content negotiation
(`Accept: application/x-ndjson`) is the right axis and is already in place. **Verdict: keep.**

### Split out: a capability / contract probe

**DO formalize the probe; the author endorsed this in review ("that's a good idea with
capabilities").** `/health` already returns `{status, runner, protocol, engines, harnesses}` but
nothing consumes it, and `HarnessCapabilities` (per-harness, 11 flags) is only discoverable by
doing a full run. Recommendation:

- Keep `GET /health` as the cheap liveness + identity + **contract version** probe (it already
  carries `protocol`). This is what the A1 version check consumes (Section 7).
- Add `GET /capabilities` (or `GET /capabilities?harness=pi_core`) that returns the static **base**
  `HarnessCapabilities` per harness **without running a turn**. Today capabilities are probed
  per-run and returned in the result; a static probe lets the service/playground render UI and
  pre-validate a request (e.g. reject `images` for a harness that lacks `fileAttachments`) before
  spending a run. The probe must state base-vs-effective explicitly: some flags are
  mode-dependent (`streamingDeltas` is derived at run time in `engines/sandbox_agent.ts`), so the
  static probe returns **base** capabilities and the run result stays authoritative for
  mode-dependent flags. This is additive, not a split of `/run`'s job.

**Verdict: keep `/run` unified for the turn; promote a `/capabilities` probe and actually consume
`/health`.** This removes work from the run path (capability discovery) without fragmenting the
turn contract.

### Considered and rejected

- A separate `/cancel` endpoint: rejected.
  Cancellation is correctly modeled as transport
  teardown (close the NDJSON connection / kill the subprocess), already wired for
  `runSandboxAgent` over HTTP. A `/cancel` would need session affinity the cold runtime does not
  have. The A10 change adds a *cancelled outcome* (Section 7), not a cancel endpoint.
- A separate tool-callback or MCP endpoint on the runner: out of scope and unchanged. Those are
  the runner *calling out* (`/tools/call`) and the gateway/MCP surfaces, which this work does not
  touch.

## 7. Folding in the sibling projects (A1, A3, A10)

This project assumes and coordinates with three parallel efforts. The schema is where they meet.

### A3: backend removal + harness rename (already landed in the working tree)

A3 removed the legacy in-process backend and the `backend` field, and renamed harness values
`pi -> pi_core` and `agenta -> pi_agenta`. This is **no longer "assumed end state"**; it is
already in the working tree (`version.ts` now declares `HARNESSES = ["pi_core","claude",
"pi_agenta"]`; the pi golden is renamed `run_request.pi_core.json`; `engines/pi.ts` is deleted).
So the schema simply describes that current shape:

- No `backend` field.
- `harness` is `pi_core` | `pi_agenta` | `claude`.

Because this is a **pre-production POC, we do NOT version the pi/agenta rename.** There is no
v1→v2 cut for it, no downcaster, and no `PROTOCOL_VERSION` bump tied to the rename; the wire just
changes. The wire models are authored against today's renamed shape from the start.

### A1: versioning (coordinate on a simple string version, the LLM-as-judge style)

A1 is the sibling project [`../contract-versioning/`](../contract-versioning/) (it owns the
versioning strategy).
Per the author's review, A1 is being simplified to **a plain string version
plus an if/else branch**, the same pattern the codebase already uses elsewhere (the
`x-ag-messages-version: "v1"` header and `VERCEL_MESSAGE_PROTOCOL_VERSION` string; the LLM-as-judge
string-version + if/else dispatch). **No `{major, minor}` struct, no `contractVersion` field name,
no upcaster/downcaster machinery.** This project defers to whatever simple string convention A1
lands on and reuses it verbatim (do NOT invent a new scheme).

It is still true that the runner advertises `protocol: 1` on `/health` (`version.ts`), but the
Python client (`ts_runner.py`) never reads it. If A1 wants the version carried on the payload, it
rides as the same simple string A1 chooses, stamped by the producer and branched on with a plain
if/else on the consumer. Skew handling and any negotiation are A1's call; this project only agrees
to carry the field A1 specifies in the wire models. Given the POC framing, even this is optional
for now.

### A10: error model cleanup (in scope here)

Replace `AgentRunResult.error?: string` with a structured error and add a distinct cancelled
outcome:

```jsonc
// AgentRunResult, error branch
{
  "ok": false,
  "error": {
    "code": "model_error",        // taxonomy, see below
    "message": "model exploded",  // human-readable, what today's string held
    "retryable": false            // does a naive retry have a chance?
  }
}
```

- **Error taxonomy (`code`)**, a closed-ish enum the runner sets and the service can branch on:
  `unsupported_harness`, `auth_error`, `quota_exceeded`, `rate_limited`, `configuration_error`,
  `permission_denied`, `model_error`, `tool_error`, `mcp_error`, `sandbox_error`, `timeout`,
  `cancelled`, `internal`.
  The `auth_error` / `quota_exceeded` / `rate_limited` codes are not
  speculative: the runner already pattern-classifies these from provider error text in
  `services/agent/src/engines/sandbox_agent/errors.ts`, and the schema just gives that
  classification a stable wire code. Keep the enum forward-compatible (an unknown code -> treat as
  `internal`), mirroring the event "drop unknown" tolerance. (No `invalid_request` /
  `unsupported_contract_version` codes for now: we are not validating requests or enforcing a
  version at the boundary in this phase.)
- **`retryable`** lets the caller distinguish a transient `timeout` / `rate_limited` / `mcp_error`
  from a permanent `unsupported_harness` / `auth_error` / `configuration_error`.
- **Distinct cancelled outcome, but only where it is actually deliverable.** A user/client abort
  is **not** a failure. The subtlety (a real review catch) is that a *client disconnect* mid-stream
  cannot reliably receive a terminal record, because the disconnect is exactly what tears the
  transport down: `server.ts` aborts the run *on* response `close`, and the Python streaming
  transports treat a stream with no terminal `result` as an error (`ts_runner.py`). So:
  - **Cooperative cancellation while the transport is still open** (e.g. an in-band stop signal,
    or a future `/cancel`-style affordance): emit the terminal `{ ok:false, error:{code:
    "cancelled"} }` record. The runner-interface §8b "exactly one terminal result" invariant
    holds, and the result stays authoritative. Set `retryable:false` (or omit it), since a cancel
    is intentional, not a transient fault.
  - **Transport teardown (the disconnect case we have today)**: the terminal record cannot be
    delivered, so the Python side must map "generator cancelled / connection closed by us" to a
    distinct **`CancelledError`-style outcome**, NOT the generic "stream ended without a terminal
    result" `RuntimeError`. This is a Python-side parsing/exception change, not a wire record.
  - Optionally also emit a `done` event with `stopReason:"cancelled"` for streams (useful as a
    live signal), but the terminal result remains authoritative when the connection is alive.
- **Migration:** `result_from_wire` must accept **both** the old free-string `error` and the new
  structured object (parse a string into `{code:"internal", message:str, retryable:false}`). This
  read-compat is cheap and avoids a hard flag day, but because this is a POC we do **not** treat
  the new error shape as a versioned cut; the wire just changes to the structured form.

This is a wire-shape change (the new structured error), made directly. No version bump is tied to
it (POC).

## 8. Incremental, test-at-each-step plan (POC-framed)

No big bang, but no versioning machinery either: this is a pre-production POC, so the wire just
changes when it needs to. Each step is a small change plus the test that proves it. The
heaviest items (runner-side validation, generating `protocol.ts`, version negotiation) are
**deferred** until the contract stabilizes; they are listed at the end as follow-ups, not steps.

The sequence respects the shared-surface rule (`agent-coordination.md`): any change to
`protocol.ts` / `wire.py` / golden / the two contract tests is coordinated in a single PR that
updates both sides and the golden together.

1. **Add the dedicated Pydantic wire models in the SDK (no wire change).**
   Add `WireRunRequest` / `WireRunResult` (and the discriminated `WireAgentEvent`) wire models in
   the SDK, with camelCase aliases, reproducing exactly what `request_to_wire` / `result_from_wire`
   emit/parse today (against the *current* renamed shape: `pi_core` / `pi_agenta`, no `backend`).
   *Test:* a unit test asserts that
   `WireRunRequest(...).model_dump(by_alias=True, exclude_none=True)`, with the serializer's
   omit-when-empty rules applied, equals `request_to_wire(...)` for the pi_core, claude, and
   pi_agenta payloads (round-trip parity with the goldens). This must be green before anything
   else.

2. **Export the JSON interface into the SDK + a freshness test.**
   Export `model_json_schema()` for the wire models and ship it in the SDK alongside the existing
   `CATALOG_TYPES` JSON interfaces (Section 4.1). Commit the artifact.
   *Test:* a test regenerates the schema in memory and asserts it equals the committed export
   (drift -> fail), the same discipline the goldens already use.

3. **Assert the existing goldens validate against the exported schema (Python side, tests only).**
   *Test:* load each golden and validate it against the exported schema (`jsonschema`); all must
   pass. This proves the schema faithfully describes today's wire. **No production code path
   changes, and nothing changes on the runner side**: validation here is a test, not a runtime
   guard (Section 5).

4. **Make the wire models the single producer.**
   Reimplement `request_to_wire` / `result_from_wire` in terms of the wire models, keeping the
   omit-when-empty serializer behavior. The goldens stay byte-identical (this is a refactor; the
   models already match the wire from step 1).
   *Test:* the existing golden wire-contract test stays green unchanged; add a parity test that the
   reimplemented serializers equal the old output.

5. **Replace the Python key list with a schema-derived guard.**
   Swap the hand-kept Python `KNOWN_REQUEST_KEYS` for a set derived from the exported schema's
   `properties`, so the Python guard cannot silently fall behind. The TS `KNOWN_REQUEST_KEYS` guard
   in `wire-contract.test.ts` stays hand-written for now (we are not generating `protocol.ts` or
   touching the runner this phase).
   *Test:* at the swap, assert once that `set(schema["properties"])` equals the old hand-kept
   `KNOWN_REQUEST_KEYS` (so nothing is lost); from then on, `set(payload) <= KNOWN_REQUEST_KEYS`
   runs against the schema-derived set.

6. **Structured error model + cancelled outcome (A10).**
   Result `error` becomes `{code, message, retryable}`; `result_from_wire` also reads the old free
   string for read-compat (string -> `{code:"internal", message:str, retryable:false}`).
Cancellation: cooperative + cancel emits the terminal `{ok:false, error:{code:"cancelled"}}`; transport-teardown cancel maps + to a distinct Python `CancelledError` (per §7 A10). This is a direct wire change — **no version + bump** (POC). + *Test:* `test_wire_contract.py` parses an old-string-error golden and a new-structured golden; a + transport test asserts a disconnect yields the Python `CancelledError`. New goldens: + `run_result.cancelled.json`, `run_result.error_structured.json`. + +7. **Promote the capability probe: `GET /capabilities` (additive, the author endorsed it).** + Add the static per-harness `HarnessCapabilities` route to the runner. It returns **base** + capabilities (what the harness supports at all); mode-dependent flags (`streamingDeltas`, derived + at run time in `engines/sandbox_agent.ts`) stay authoritative only in a run result. The service + can pre-render UI / pre-check a request against the base set. + *Test:* `server.test.ts` asserts `GET /capabilities` returns the base capability map per harness + without running a turn. + +### Deferred follow-ups (only once the contract stabilizes) + +These are explicitly **not** in this phase, per the author's review: + +- **Runner-side request validation.** Loading the schema in `server.ts` / `cli.ts` and rejecting a + malformed `/run` (with ajv or similar). The contract is too brittle to gate on yet. +- **Generating `protocol.ts` from the schema.** Pays off only once the runner consumes the schema; + until then `protocol.ts` stays hand-written and the Python goldens are the guard. +- **A version field + negotiation.** Owned by A1; if/when it lands it is a simple string version + + if/else (Section 7 A1), not a `{major, minor}` or upcaster/downcaster scheme. +- **Fern generating the interface across languages.** Reachable later via Pydantic → OpenAPI → Fern + once the contract is stable enough to publish into the clients (Section 4.1). 
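Step 6's read-compat rule, together with the forward-compatible enum from §7 (an unknown code is treated as `internal`), can be sketched as one small pure function. This is a minimal sketch, not `result_from_wire` itself: the name `parse_wire_error` is hypothetical, and `KNOWN_CODES` lists only the codes this plan names, which may be a subset of the final enum.

```python
from typing import Any, Dict, Optional, Union

# Codes named in this plan's A10 taxonomy (assumption: possibly a subset of the final enum).
KNOWN_CODES = {
    "internal", "timeout", "rate_limited", "mcp_error", "unsupported_harness",
    "auth_error", "configuration_error", "quota_exceeded", "cancelled",
}


def parse_wire_error(
    raw: Union[str, Dict[str, Any], None],
) -> Optional[Dict[str, Any]]:
    """Hypothetical read-compat parser for a ``/run`` result's ``error`` field."""
    if raw is None:
        return None
    if isinstance(raw, str):
        # Old free-string shape: lift it into the structured form.
        return {"code": "internal", "message": raw, "retryable": False}
    code = raw.get("code")
    if code not in KNOWN_CODES:
        # Forward-compatible enum: an unrecognized code degrades to ``internal``.
        code = "internal"
    # A cancel is intentional, never transient, so it is never retryable.
    retryable = False if code == "cancelled" else bool(raw.get("retryable", False))
    return {"code": code, "message": str(raw.get("message", "")), "retryable": retryable}
```

Forcing `retryable` to `False` for `cancelled` follows the "a cancel is intentional" rule; whether the real parser should silently override a sender's `retryable:true` or flag it is a choice this sketch does not settle.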
+ +After this phase: one Pydantic wire-model source -> the JSON interface shipped in the SDK -> +structured errors + a correctly-modeled cancelled outcome -> a real capability probe. No runner +validation, no version machinery, no generated TS types — those are deferred until the contract +settles. + +## 9. Risks and mitigations + +- **Drift between `protocol.ts` types and the schema.** While `protocol.ts` stays hand-written and + the runner does not consume the schema, this drift is tolerated as a POC trade-off. The Python + goldens + the schema-derived Python key guard (step 5) catch Python-side drift; aligning the TS + types is a manual discipline for now. Generating `protocol.ts` is the deferred fix. +- **The committed schema export going stale.** Mitigated by step 2's freshness test (regenerate == + committed), the same discipline the goldens already use. +- **Sequencing against A1.** A1 owns the version convention (a simple string + if/else); this + project only carries whatever field A1 specifies. The error model (step 6) and capability probe + (step 7) do not depend on A1. +- **No boundary guard means typos still pass silently.** Accepted for now — the contract is too + brittle to gate on. The runner keeps today's behavior. Revisit with the deferred runner-side + validation once the contract stabilizes. + +## 10. Open questions for review + +1. **Wire models placement.** A new `agents/wire_models.py` next to `dtos.py` (proposed) vs a + dedicated contract package. The exported JSON interface ships in the SDK alongside + `CATALOG_TYPES`. +2. **Where exactly the exported interface is surfaced in the SDK.** As an entry in (or sibling of) + `CATALOG_TYPES` in `sdks/python/agenta/sdk/utils/types.py`, vs a standalone export. Either keeps + it SDK-resident; the `CATALOG_TYPES` path also makes it `/inspect`-discoverable. +3. 
**Cancelled modeling.** Cooperative cancel -> terminal `error.code:"cancelled"`; transport + teardown -> distinct Python `CancelledError` (proposed). Optionally also a `done` + `stopReason:"cancelled"`. `retryable` for cancel: `false`/omit. +4. **Capability probe shape + base-vs-effective.** Return all harnesses (proposed) vs `?harness=`; + and the probe returns **base** capabilities (proposed), with mode-dependent flags + (`streamingDeltas`) authoritative only in a run result. +5. **The deferred follow-ups (Section 8).** Confirm runner-side validation, generated `protocol.ts`, + the version field, and Fern publication are all out of scope for this POC phase. + +## 11. Review + +This plan was reviewed by Codex (gpt-5.5, xhigh, read-only) on 2026-06-24, then revised on +2026-06-24 per the author's PR review on #4830. The author's direction simplified it toward the POC +reality: + +- **No back-compat burden** — this is still an internal POC, so any wire shape may change freely + (the "must preserve the model/connection split" framing was dropped). +- **Pydantic as the source for now**, with the immediate goal that the exported JSON interface + lives **in the SDK** (the `CATALOG_TYPES` path), plus a Fern investigation (Section 4.1): Fern + here is driven by Pydantic → OpenAPI → Fern → SDKs, so it can see this interface later via the + OpenAPI surface once the contract stabilizes — no hard blocker, only a timing call. +- **No sidecar/runner validation yet** (no ajv, no new runner dependency) — the contract is still + brittle (Section 5); `protocol.ts` stays hand-written for now. +- **No versioning machinery** — the pi/agenta rename (already landed) is not versioned, and any + version field defers to A1's simple string + if/else convention. +- **Keep `/capabilities`** — the author endorsed the probe. 
+ +Codex's earlier structural catches that survive the simplification: source from dedicated **wire** +models (not the snake_case semantic DTOs); cancellation via a terminal record only works for +**cooperative** cancel (a disconnect maps to a Python `CancelledError`); the error taxonomy is +grounded in what `engines/sandbox_agent/errors.ts` already classifies; capabilities are +base-vs-effective. The corrections that were about versioning/validation (two-breaking-changes-one-cut, +the step-5 error-shape ordering, the both-transport version probe) are **moot** now that +versioning and runner validation are deferred. diff --git a/docs/design/agent-workflows/projects/wire-contract-schema/status.md b/docs/design/agent-workflows/projects/wire-contract-schema/status.md new file mode 100644 index 0000000000..12420c086a --- /dev/null +++ b/docs/design/agent-workflows/projects/wire-contract-schema/status.md @@ -0,0 +1,142 @@ +# Status: wire-contract-schema + +| | | +| --- | --- | +| **Phase** | **Implemented** (2026-06-24). Pydantic wire models are the schema source of truth, exported into the SDK via `CATALOG_TYPES`; the `/inspect` canonical response + typed outputs landed. No runner/validation work (deferred). | +| **Owner** | wire-contract-schema (A2 in the A1/A2/A3/A10 cohort) | +| **Lane** | `feat/agent-wire-contract-schema-plan` (PR #4830), re-stacked on `feat/agent-contract-versioning-docs` (#4829). One PR = plan doc + impl. | +| **Created** | 2026-06-24 | +| **Revised** | 2026-06-24 (author PR review) | +| **Implemented** | 2026-06-24 | + +## What shipped (the implementation) + +The plan's source-of-truth slice plus the folded `/inspect` follow-ups (architecture-followups +issue 1 + typed outputs).
Resolved every open question with the least-code option: + +- **Wire models as the single schema source of truth** — + `sdks/python/agenta/sdk/agents/wire_models.py`: dedicated camelCase Pydantic models + (`WireRunRequest`, `WireRunResult`, sub-objects, and an OPEN `WireAgentEvent` whose `type` is + optional so a typeless event is tolerated, mirroring the parser's drop behavior). NOT the + snake_case semantic DTOs. `run_contract_schemas()` exports their dereferenced, camelCase JSON + Schema. +- **The JSON interface ships in the SDK** via `CATALOG_TYPES` (`run_request` / `run_result`), the + same path `agent_config` takes — so it is `/inspect`-discoverable through + `GET /workflows/catalog/types/{type}`. No new endpoint. +- **Tests, no runtime validation** (`test_wire_models.py`): the committed catalog matches a fresh + export (freshness guard), all four goldens validate against the exported schema and parse into + the models, `request_to_wire` output validates, and the schema's property set equals + `KNOWN_REQUEST_KEYS` (the schema-derived key guard). Nothing gates a live `/run`. +- **`wire.py` stays the dict producer** — least-code: the omit-when-empty behavior lives there and + is pinned by the goldens (a thing `model_json_schema()` cannot express). The models are the + *schema* authority and a docstring in `wire.py` points to them. No serializer rewrite. +- **Issue 1 — canonical `/inspect` response**: `WorkflowInspectResponse` in + `sdks/python/agenta/sdk/models/workflows.py`; `handle_inspect_success` normalizes the + internally-built `WorkflowInvokeRequest` into it (`_to_inspect_response`), lifting the resolved + `WorkflowRevisionData` to a flat top-level `revision`, so schemas live at + `response.revision.schemas` (was the latent-broken `data.revision.data.schemas` nesting). The + three `/inspect` routes' `response_model` is now `WorkflowInspectResponse`. 
FE: the + `InspectWorkflowResponse` type and the `store.ts` read now resolve against the real body + (`revision.schemas`); the deprecated `interface?.schemas` fallback is kept on the type as a + migration bridge (two sibling readers still use it). +- **Issue 4 — typed `/inspect` outputs**: `services/oss/src/agent/schemas.py` `AGENT_OUTPUTS_SCHEMA` + is keyed per output surface (`invoke` -> `message`, `messages` -> `messages`). Reuses existing + catalog markers, so the catalog-refs guard is unchanged. POC: no flat back-compat output field. + +### Deferred (noted in the PR body; NOT built) + +- The `/run` `version` field + dispatch (A1 already deferred it). +- Runner-side request validation (no ajv, no runner dependency). +- The `GET /capabilities` probe. +- Generating `protocol.ts` from the schema; the structured-error / cancelled outcome; Fern + publication across languages. +- `services/agent/CLAUDE.md`'s mirroring rule should mention the Pydantic wire models are now the + schema source — left for the runner owner (`services/agent/*` is their surface, not touched here). + +## What exists + +- `README.md` — the plan, revised to the author's POC framing: current-state assessment, the three + source-of-truth options with the Option B recommendation (Pydantic-as-source **for now**, JSON + interface **in the SDK**, a Fern investigation in §4.1), the `/run` split decision (keep unified, + promote `/capabilities`), the A10 structured-error + cancelled change, A1 coordination on a + **simple string version**, a 7-step POC-framed plan with the heavy items deferred, and a Review + section (§11) recording both the Codex pass and the author's revision. + +## Author PR review (2026-06-24) — what changed + +Four inline comments on #4830, all addressed: + +1. **No back-compat burden** (README ~§3.1). Dropped all "the schema must preserve the + model/connection split / omit-when-empty bytes" framing. This is an internal POC; any wire shape + may change freely. 
Shape notes are now described as "current serializer behavior, not a + constraint." (README §Status, §1, §2, §3.1, §11.) +2. **Pydantic-as-source now + interface in the SDK + Fern** (README ~§4 recommendation). Revised the + recommendation: Pydantic is the source for now; the immediate goal is that the exported JSON + Schema interface lives **in the SDK** (the `CATALOG_TYPES` path); added §4.1 investigating Fern. + Explicitly **dropped using the schema in the sidecar/runner** for now (contract still brittle). +3. **No runner ingress validation** (README ~§5). Rewrote §5 as "validation — deferred": no ajv, no + runner dependency, no `server.ts`/`cli.ts` request validation. The schema is used in Python tests + only (goldens-must-validate). A boundary guard is a deferred follow-up. +4. **Keep `/capabilities`** (README ~§6). The probe stays; the author endorsed it. Noted his + endorsement inline. + +## Fern findings (the §4.1 investigation) + +- Fern in this repo generates the multi-language API **clients** (Python + TS) under `clients/` and + `web/packages/agenta-api-client/`. The pipeline (`clients/scripts/generate.sh`) is + **Pydantic → `/api/openapi.json` → Fern (`fernapi/fern-python-sdk`, `fernapi/fern-typescript-sdk`) + → SDKs**. There is no checked-in `.fern/` IDL; Fern's only input is the generated OpenAPI doc. +- The SDK **already** exposes Pydantic-derived JSON interfaces: `CATALOG_TYPES` in + `sdks/python/agenta/sdk/utils/types.py` (~line 1265) is a dict of `model_json_schema()` outputs, + surfaced via `/inspect` `x-ag-type-ref` markers (`services/oss/src/agent/schemas.py`). The wire + contract should ship the same way. +- **Can Fern see this interface? Yes — but only via OpenAPI, with a caveat.** `/run` is the + service↔runner spine, not a public FastAPI endpoint, so it is not in `openapi.json` today and Fern + cannot see it as-is. 
Making Fern see it = reference the wire Pydantic models from a FastAPI surface + (FastAPI then emits them into `components/schemas`, the same path `AgentConfigSchema` takes). **No + hard blocker** — the only reason not to now is that the contract is brittle and publishing a moving + target into every generated client is premature. So: SDK-resident now, Fern later. + +## Decisions made in the (revised) plan + +1. **Schema source = dedicated Pydantic *wire* models (Option B), NOT the semantic DTOs**, authored + against the **already-landed** renamed shape (`pi_core` / `pi_agenta`, no `backend`). Export + `model_json_schema()` and ship it in the SDK alongside `CATALOG_TYPES`. +2. **`protocol.ts` stays hand-written for now.** No generated TS types this phase (only pays off once + the runner consumes the schema). Python goldens are the guard. +3. **`/run` stays unified for the turn.** Promote a `GET /capabilities` probe (static **base** + per-harness capabilities). Rejected: a `/cancel` endpoint. +4. **Error model `{ code, message, retryable }`** with a grounded taxonomy and a cancelled outcome + (terminal record for cooperative cancel; Python `CancelledError` for transport-teardown cancel). + Made as a **direct wire change, no version bump** (POC). +5. **No versioning machinery.** The pi/agenta rename is not versioned. Any version field defers to + A1's **simple string version + if/else** (the `x-ag-messages-version: "v1"` / LLM-as-judge + pattern) — no `{major, minor}`, no `contractVersion` name, no upcaster/downcaster. +6. **No runner/sidecar validation yet** (deferred follow-up). + +## Deferred (Section 8 follow-ups, out of scope for this POC phase) + +- Runner-side request validation (ajv / boundary guard). +- Generating `protocol.ts` from the schema. +- A version field + negotiation (A1-owned, simple string). +- Fern generating the interface across languages (via Pydantic → OpenAPI once stable). 
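The two test-only guards listed under "What shipped" (the freshness check and the schema-derived key guard) reduce to plain set and dict comparisons. A minimal stdlib-only sketch, assuming a dereferenced schema with a top-level `properties` map of the kind `run_contract_schemas()` returns; `FRESH` is a hand-written stand-in rather than the real export, and the helper names are hypothetical.

```python
from typing import Any, Dict, Set, Tuple


def schema_keys(schema: Dict[str, Any]) -> Set[str]:
    """The top-level camelCase wire keys a dereferenced JSON Schema declares."""
    return set(schema.get("properties", {}))


def key_drift(schema: Dict[str, Any], known_keys: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (keys only in the schema, keys only in the hand-kept list)."""
    declared = schema_keys(schema)
    return declared - known_keys, known_keys - declared


def is_fresh(committed: Dict[str, Any], fresh: Dict[str, Any]) -> bool:
    """Freshness guard: the committed catalog entry must equal a fresh export."""
    # Dict equality ignores key order, so export ordering cannot cause false drift.
    return committed == fresh


# Stand-in for run_contract_schemas()["run_request"]; the real export has many more
# properties, and Optional fields would render as ``anyOf`` rather than a bare type.
FRESH = {
    "type": "object",
    "x-ag-type": "run_request",
    "properties": {
        "harness": {"type": "string"},
        "sessionId": {"type": "string"},
        "sandboxPermission": {"type": "object"},
    },
}

if __name__ == "__main__":
    # A misspelled hand-kept key shows up on both sides of the drift report.
    print(key_drift(FRESH, {"harness", "sessionId", "sandboxPermision"}))
```

Returning both directions of the difference, instead of a bare `==`, makes a failing guard say which side fell behind.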
+ +## Coordination + +- **A1 (`contract-versioning`)** — sibling at `../contract-versioning/`, being simplified by another + agent to a plain string version + if/else per the author. This project reuses whatever string + convention A1 lands on; does NOT invent its own. (Did not touch A1's README — another agent owns it.) +- **A3 (backend removal + harness rename)** — **already landed in the working tree** (`version.ts` + has `pi_core`/`pi_agenta`, golden renamed `run_request.pi_core.json`, `engines/pi.ts` deleted). The + wire models describe that shape from the start; the rename is not versioned (POC). +- **A10 (error model)** — folded into the plan (step 6) as a direct wire change. +- **`sidecar-trust-and-sandbox-enforcement`** flagged a stale `protocol.ts:149-150` comment; noted. +- **DOCS-ONLY.** No edit to `protocol.ts` / `wire.py` / golden / contract tests / `interfaces/*`. + Composio, the tool gateway, connections, and MCP are described as existing and unchanged. + +## Next actions (after review) + +- Get sign-off on README §10 open questions (wire-model placement, where the SDK surfaces the export, + cancelled modeling, capability probe shape, and the deferred follow-up list). +- Confirm with A1 the exact simple string version convention to carry (if any) on the payload. +- Then implement step 1 (dedicated wire models with round-trip parity tests against the goldens). diff --git a/sdks/python/agenta/sdk/agents/utils/wire.py b/sdks/python/agenta/sdk/agents/utils/wire.py index ae0e369c70..a4fd01f6a5 100644 --- a/sdks/python/agenta/sdk/agents/utils/wire.py +++ b/sdks/python/agenta/sdk/agents/utils/wire.py @@ -5,6 +5,14 @@ under ``sdks/python/oss/tests/pytest/unit/agents/golden/`` (see ``test_wire_contract.py``). The runner drives one engine (the sandbox-agent ACP path); the ``harness`` field selects the agent, so there is no engine selector on the wire. 
+ +The SCHEMA source of truth for this contract is the dedicated Pydantic wire models in +``agenta.sdk.agents.wire_models`` (``WireRunRequest`` / ``WireRunResult``). Their exported JSON +Schema ships in the SDK through ``CATALOG_TYPES`` and is asserted to describe exactly what the +functions below emit/parse (``test_wire_models.py``). The serializer here stays a hand-built +dict on purpose: the omit-when-empty behavior lives in this file (and is pinned by the goldens), +which ``model_json_schema()`` cannot express. Add or rename a wire field in BOTH places (here and +the wire models) plus ``protocol.ts`` and the goldens — the tests catch a one-sided change. """ from __future__ import annotations diff --git a/sdks/python/agenta/sdk/agents/wire_models.py b/sdks/python/agenta/sdk/agents/wire_models.py new file mode 100644 index 0000000000..cce8381926 --- /dev/null +++ b/sdks/python/agenta/sdk/agents/wire_models.py @@ -0,0 +1,374 @@ +"""The ``/run`` wire contract as Pydantic models — the single schema source of truth. + +These models describe the EXACT camelCase JSON the Python producer emits and parses in +``utils/wire.py`` (``request_to_wire`` / ``result_from_wire``) and the TS runner mirrors in +``services/agent/src/protocol.ts``. They are deliberately a SEPARATE set from the semantic +DTOs in ``dtos.py``: the DTOs are snake_case and intentionally loose (``AgentEvent`` is a free +``type: str`` + ``data`` bag), while the real wire is camelCase with a discriminated event +union. Exporting ``model_json_schema()`` off the DTOs would produce the wrong schema, so the +contract lives here. + +What these models are for in this phase (a pre-production POC): + +- They are the schema authority: ``run_contract_schemas()`` exports their JSON Schema, which + ships in the SDK through ``CATALOG_TYPES`` (the same mechanism ``AgentConfigSchema`` uses to + reach the SDK / clients / ``/inspect``). 
A test asserts the committed catalog entry matches a + fresh export, so the schema cannot drift from these models. +- They validate the golden fixtures and ``request_to_wire`` output in tests, proving the schema + faithfully describes today's wire. + +What they are NOT (deferred, per the project plan): + +- They are NOT a runtime guard. ``request_to_wire`` still builds a plain dict and the runner + still parses the body as-is; nothing validates against these models on a live ``/run``. +- They do NOT carry a contract ``version`` field, structured errors, or a ``cancelled`` outcome + yet — those are deferred follow-ups. The result error stays the current free string. + +Conventions: every field is camelCase via an alias, with ``populate_by_name=True`` so the +models also accept the Python field name. Optional fields default to ``None`` / empty, matching +the implicitly-all-optional wire. ``extra="allow"`` keeps the models forward-compatible (an +unknown field is not the schema's job to reject in this POC phase). +""" + +from __future__ import annotations + +from typing import Any, ClassVar, Dict, List, Literal, Optional, Union + +from pydantic import BaseModel, ConfigDict, Field + + +class _WireModel(BaseModel): + """Base for every wire model: camelCase aliases, accept-by-name, allow extra. + + ``populate_by_name=True`` lets a producer construct with the Python field names while the + schema and ``model_dump(by_alias=True)`` speak camelCase. ``extra="allow"`` keeps the + contract open/forward-compatible (matching the runner's tolerant parsing); this POC does not + reject unknown fields. + + ``__ag_type__`` is the catalog key a top-level model carries into ``CATALOG_TYPES`` (the + same role :class:`~agenta.sdk.utils.types.AgSchemaMixin` plays for the other catalog types). + It is NOT mixed in from ``utils/types`` on purpose: ``utils/types`` imports the agents + package, so importing it here would create a load cycle. ``ag_type()`` reads the marker. 
+ """ + + model_config = ConfigDict(populate_by_name=True, extra="allow") + + __ag_type__: ClassVar[Optional[str]] = None + + @classmethod + def ag_type(cls) -> str: + if cls.__ag_type__ is None: + raise ValueError(f"{cls.__name__} does not define __ag_type__") + return cls.__ag_type__ + + +# --------------------------------------------------------------------------- +# Shared sub-objects +# --------------------------------------------------------------------------- + + +class WireEndpoint(_WireModel): + """Non-secret connection config (mirrors ``Endpoint.to_wire``).""" + + base_url: Optional[str] = Field(default=None, alias="baseUrl") + api_version: Optional[str] = Field(default=None, alias="apiVersion") + region: Optional[str] = None + headers: Optional[Dict[str, str]] = None + + +class WireConnection(_WireModel): + """The author's credential-connection intent (``{mode, slug?}``).""" + + mode: Literal["agenta", "self_managed"] = "agenta" + slug: Optional[str] = None + + +class WireContentBlock(_WireModel): + """One content block of a message (mirrors ``ContentBlock.to_wire``).""" + + type: str + text: Optional[str] = None + data: Optional[str] = None + mime_type: Optional[str] = Field(default=None, alias="mimeType") + uri: Optional[str] = None + tool_call_id: Optional[str] = Field(default=None, alias="toolCallId") + tool_name: Optional[str] = Field(default=None, alias="toolName") + input: Optional[Any] = None + output: Optional[Any] = None + is_error: Optional[bool] = Field(default=None, alias="isError") + + +class WireChatMessage(_WireModel): + """A chat message on the wire: ``{role, content}`` (string or content blocks).""" + + role: str + content: Union[str, List[WireContentBlock]] = "" + + +class WireTraceContext(_WireModel): + """Agenta trace context threaded into a run (mirrors ``TraceContext.to_wire``).""" + + traceparent: Optional[str] = None + baggage: Optional[str] = None + endpoint: Optional[str] = None + authorization: Optional[str] = None + 
capture_content: bool = Field(default=True, alias="captureContent") + + +class WireToolCallback(_WireModel): + """Where callback (gateway) tools route their calls back to.""" + + endpoint: Optional[str] = None + authorization: Optional[str] = None + + +class WireRenderHint(_WireModel): + """How a tool's result should be rendered by a client.""" + + kind: Optional[str] = None + component: Optional[str] = None + + +class WireResolvedToolSpec(_WireModel): + """A resolved tool the runner delivers to the harness (the three-axis tool surface). + + ``kind`` is the executor axis (``callback`` / ``code`` / ``client`` / ``builtin``); + ``needsApproval`` / ``render`` are the orthogonal axes; ``callRef`` / ``runtime`` / ``code`` + / ``env`` are executor-specific. Extra fields are allowed so an executor variant the schema + has not enumerated still validates. + """ + + name: str + description: Optional[str] = None + input_schema: Optional[Dict[str, Any]] = Field(default=None, alias="inputSchema") + kind: Optional[str] = None + call_ref: Optional[str] = Field(default=None, alias="callRef") + runtime: Optional[str] = None + code: Optional[str] = None + env: Optional[Dict[str, str]] = None + needs_approval: Optional[bool] = Field(default=None, alias="needsApproval") + render: Optional[WireRenderHint] = None + read_only: Optional[bool] = Field(default=None, alias="readOnly") + permission: Optional[str] = None + + +class WireMcpServer(_WireModel): + """A user-declared MCP server (stdio or http), mirrors ``mcp_servers_to_wire``.""" + + name: str + transport: Optional[str] = None + command: Optional[str] = None + args: Optional[List[str]] = None + env: Optional[Dict[str, str]] = None + url: Optional[str] = None + headers: Optional[Dict[str, str]] = None + tools: Optional[List[str]] = None + permission: Optional[str] = None + + +class WireSkillFile(_WireModel): + """One bundled file in a resolved inline skill package.""" + + path: str + content: str + executable: Optional[bool] = None 
+ + +class WireSkill(_WireModel): + """A resolved inline skill package (mirrors ``skill_to_wire``).""" + + name: str + description: Optional[str] = None + body: Optional[str] = None + files: Optional[List[WireSkillFile]] = None + disable_model_invocation: Optional[bool] = Field( + default=None, alias="disableModelInvocation" + ) + allow_executable_files: Optional[bool] = Field( + default=None, alias="allowExecutableFiles" + ) + + +class WireNetworkEgress(_WireModel): + """The sandbox outbound-network policy (mirrors ``NetworkEgress``).""" + + mode: Literal["on", "off", "allowlist"] = "on" + allowlist: List[str] = Field(default_factory=list) + + +class WireSandboxPermission(_WireModel): + """The declared sandbox security boundary (mirrors ``SandboxPermission.to_wire``).""" + + network: WireNetworkEgress = Field(default_factory=WireNetworkEgress) + filesystem: Optional[Literal["on", "readonly", "off"]] = None + enforcement: Literal["strict", "best_effort"] = "strict" + + +class WireHarnessFile(_WireModel): + """One file the active harness's config renders into the session cwd before a run.""" + + path: str + content: str + + +class WireHarnessCapabilities(_WireModel): + """What a harness can do, probed by the runner (the 11 boolean flags).""" + + text_messages: bool = Field(default=True, alias="textMessages") + images: bool = False + file_attachments: bool = Field(default=False, alias="fileAttachments") + mcp_tools: bool = Field(default=False, alias="mcpTools") + tool_calls: bool = Field(default=False, alias="toolCalls") + reasoning: bool = False + plan_mode: bool = Field(default=False, alias="planMode") + permissions: bool = False + usage: bool = False + streaming_deltas: bool = Field(default=False, alias="streamingDeltas") + session_lifecycle: bool = Field(default=False, alias="sessionLifecycle") + + +class WireAgentUsage(_WireModel): + """Token / cost usage rolled onto a workflow span.""" + + input: Optional[int] = None + output: Optional[int] = None + total: 
Optional[int] = None + cost: Optional[float] = None + + +# --------------------------------------------------------------------------- +# The event union (open / forward-compatible) +# --------------------------------------------------------------------------- + + +class WireAgentEvent(_WireModel): + """One structured event from a run, keyed by ``type``. + + The Python parser (``AgentEvent.from_wire``) keeps the whole event verbatim and drops a + typeless event, so the wire event is intentionally OPEN: ``type`` is the discriminator and + ``extra="allow"`` carries the rest. ``type`` is OPTIONAL on the model on purpose — a + typeless event is dropped, not rejected (a golden pins exactly that), and the schema must + describe that tolerance rather than reject it. A closed discriminated union would also reject + the forward-compatible event types the runner may add, which contradicts the "drop unknown" + guarantee. The known ``type`` values are documented for readers, not enforced: ``message``, + ``thought``, the ``message_*`` / ``reasoning_*`` lifecycle trios, ``tool_call``, + ``tool_result``, ``interaction_request``, ``data``, ``file``, ``usage``, ``error``, ``done``. + """ + + type: Optional[str] = None + + +# --------------------------------------------------------------------------- +# The request +# --------------------------------------------------------------------------- + + +class WireRunRequest(_WireModel): + """The ``/run`` request payload — the exact field set ``request_to_wire`` may emit. + + Every field is optional on the wire (the contract is implicitly all-optional), so the schema + expresses "optional" while the producer's omit-when-empty behavior stays in ``wire.py`` and + is pinned by the golden fixtures. The harness selects the agent (``pi_core`` / ``pi_agenta`` + / ``claude``); there is no engine selector on the wire (A3 removed the legacy backend). 
+ """ + + __ag_type__ = "run_request" + + harness: Optional[str] = None + sandbox: Optional[str] = None + session_id: Optional[str] = Field(default=None, alias="sessionId") + agents_md: Optional[str] = Field(default=None, alias="agentsMd") + # Model + connection. ``model`` stays a plain string; the structured provider/connection + # fields ride alongside only when a resolved connection / model ref is present. + model: Optional[str] = None + provider: Optional[str] = None + connection: Optional[WireConnection] = None + deployment: Optional[str] = None + endpoint: Optional[WireEndpoint] = None + credential_mode: Optional[str] = Field(default=None, alias="credentialMode") + # Turn. + messages: Optional[List[WireChatMessage]] = None + # Secrets injected as harness env (provider keys); never written to the agent filesystem. + secrets: Optional[Dict[str, str]] = None + trace: Optional[WireTraceContext] = None + # Tools + skills. + tools: Optional[List[str]] = None + custom_tools: Optional[List[WireResolvedToolSpec]] = Field( + default=None, alias="customTools" + ) + tool_callback: Optional[WireToolCallback] = Field( + default=None, alias="toolCallback" + ) + mcp_servers: Optional[List[WireMcpServer]] = Field(default=None, alias="mcpServers") + skills: Optional[List[WireSkill]] = None + # Policy + prompt overrides + files. 
+ permission_policy: Optional[str] = Field(default=None, alias="permissionPolicy") + system_prompt: Optional[str] = Field(default=None, alias="systemPrompt") + append_system_prompt: Optional[str] = Field( + default=None, alias="appendSystemPrompt" + ) + sandbox_permission: Optional[WireSandboxPermission] = Field( + default=None, alias="sandboxPermission" + ) + harness_files: Optional[List[WireHarnessFile]] = Field( + default=None, alias="harnessFiles" + ) + + +# --------------------------------------------------------------------------- +# The result +# --------------------------------------------------------------------------- + + +class WireRunResult(_WireModel): + """The ``/run`` result payload — what ``result_from_wire`` parses. + + ``ok`` is the outcome flag; on failure ``error`` is the current free string (a structured + error model is a deferred follow-up, not this phase). On success the run carries ``output``, + ``messages``, ``events``, ``usage``, ``stopReason``, ``capabilities``, plus the resolved + ``sessionId`` / ``model`` / ``traceId``. + """ + + __ag_type__ = "run_result" + + ok: bool + output: Optional[str] = None + messages: Optional[List[WireChatMessage]] = None + events: Optional[List[WireAgentEvent]] = None + usage: Optional[WireAgentUsage] = None + stop_reason: Optional[str] = Field(default=None, alias="stopReason") + capabilities: Optional[WireHarnessCapabilities] = None + session_id: Optional[str] = Field(default=None, alias="sessionId") + model: Optional[str] = None + trace_id: Optional[str] = Field(default=None, alias="traceId") + error: Optional[str] = None + + +# --------------------------------------------------------------------------- +# The exported JSON interface +# --------------------------------------------------------------------------- + +# The top-level wire models whose JSON Schema ships in the SDK. Each is keyed by its +# ``x-ag-type`` so ``CATALOG_TYPES`` can carry it the same way it carries ``agent_config``. 
+WIRE_CONTRACT_MODELS = (WireRunRequest, WireRunResult)
+
+
+def run_contract_schemas() -> Dict[str, Dict[str, Any]]:
+    """The exported JSON Schema of the ``/run`` wire models, keyed by ``x-ag-type``.
+
+    Uses ``model_json_schema(by_alias=True)`` so the emitted property names are the camelCase
+    wire keys, and dereferences ``$defs`` (the same treatment ``CATALOG_TYPES`` gives every other
+    entry, via ``_dereference_schema``) so the catalog entries are self-contained. This is the
+    single export point: ``CATALOG_TYPES`` adds these entries, and a freshness test asserts the
+    committed catalog matches a fresh call here so the schema cannot silently drift from the
+    models.
+    """
+    # Local import to avoid a module-load cycle: ``utils/types`` imports the agents package.
+    from ..utils.types import _dereference_schema
+
+    schemas: Dict[str, Dict[str, Any]] = {}
+    for model in WIRE_CONTRACT_MODELS:
+        schema = _dereference_schema(model.model_json_schema(by_alias=True))
+        schema["x-ag-type"] = model.ag_type()
+        schemas[model.ag_type()] = schema
+    return schemas
diff --git a/sdks/python/agenta/sdk/decorators/routing.py b/sdks/python/agenta/sdk/decorators/routing.py
index e4e8543f07..b312b8cb02 100644
--- a/sdks/python/agenta/sdk/decorators/routing.py
+++ b/sdks/python/agenta/sdk/decorators/routing.py
@@ -14,6 +14,7 @@
 from agenta.sdk.models.workflows import (
     WorkflowInvokeRequest,
     WorkflowInspectRequest,
+    WorkflowInspectResponse,
     WorkflowServiceStatus,
     WorkflowBatchResponse,
     WorkflowStreamingResponse,
@@ -350,11 +351,39 @@ async def handle_invoke_failure(exception: Exception) -> Response:
     return _make_json_response(error)
 
 
+def _to_inspect_response(
+    request: WorkflowInvokeRequest,
+) -> WorkflowInspectResponse:
+    """Normalize the internally-built ``WorkflowInvokeRequest`` into the canonical response.
+
+    ``workflow.inspect()`` builds its result as a ``WorkflowInvokeRequest`` (a REQUEST model), so
+    the resolved interface lands nested at ``data.revision.data``. The public ``/inspect``
+    contract is :class:`WorkflowInspectResponse` instead, which lifts that
+    :class:`WorkflowRevisionData` up to a flat top-level ``revision`` — so a client reads schemas
+    at ``response.revision.schemas`` rather than guessing the request envelope.
+    """
+    nested = (request.data.revision or {}) if request.data else {}
+    revision_data = nested.get("data") if isinstance(nested, dict) else None
+    # Carry the resolved config so the public boundary doesn't drop it: the FE reads
+    # ``configuration.parameters`` as a fallback when ``revision.parameters`` is absent.
+    parameters = (
+        revision_data.get("parameters") if isinstance(revision_data, dict) else None
+    )
+    configuration = {"parameters": parameters} if parameters is not None else None
+    return WorkflowInspectResponse(
+        version=request.version,
+        revision=revision_data,
+        configuration=configuration,
+        meta=request.meta,
+    )
+
+
 async def handle_inspect_success(
     request: Optional[WorkflowInvokeRequest],
 ):
     if request:
-        return JSONResponse(request.model_dump(mode="json", exclude_none=True))
+        response = _to_inspect_response(request)
+        return JSONResponse(response.model_dump(mode="json", exclude_none=True))
     return JSONResponse({"details": {"message": "Workflow not found"}}, status_code=404)
 
 
@@ -544,7 +573,7 @@ def _add_agent_routes(target: Any, prefix: str) -> None:
                 self.path + "/inspect",
                 inspect_endpoint,
                 methods=["POST"],
-                response_model=WorkflowInvokeRequest,
+                response_model=WorkflowInspectResponse,
             )
             if agent_enabled:
                 _add_agent_routes(self.router_fallback, self.path)
@@ -568,7 +597,7 @@ def _add_agent_routes(target: Any, prefix: str) -> None:
                 "/inspect",
                 inspect_endpoint,
                 methods=["POST"],
-                response_model=WorkflowInvokeRequest,
+                response_model=WorkflowInspectResponse,
             )
             if agent_enabled:
                 _add_agent_routes(self.mount_root, "")
@@ -587,7 +616,7 @@ def _add_agent_routes(target: Any, prefix: str) -> None:
                 "/inspect",
                 inspect_endpoint,
                 methods=["POST"],
-                response_model=WorkflowInvokeRequest,
+                response_model=WorkflowInspectResponse,
             )
             if agent_enabled:
                 _add_agent_routes(sub_app, "")
diff --git a/sdks/python/agenta/sdk/models/workflows.py b/sdks/python/agenta/sdk/models/workflows.py
index 0779695a8e..18c212dd53 100644
--- a/sdks/python/agenta/sdk/models/workflows.py
+++ b/sdks/python/agenta/sdk/models/workflows.py
@@ -307,6 +307,37 @@ def _coerce_nested_models(cls, values: Dict[str, Any]) -> Dict[str, Any]:
 WorkflowServiceInspectRequest = WorkflowInspectRequest
 
 
+class WorkflowInspectResponse(Metadata):
+    """The canonical ``/inspect`` response: the resolved interface, flat and self-describing.
+
+    ``/inspect`` is a public edge — it tells the browser which form to render and which inputs,
+    parameters, and outputs a workflow has. The response used to be a ``WorkflowInvokeRequest``
+    (a REQUEST model carrying response semantics), which nested the schemas under
+    ``data.revision.data.schemas`` and made every client guess the envelope. This model is the
+    explicit response contract instead:
+
+    - ``revision`` is a :class:`WorkflowRevisionData` directly (it already owns ``uri`` / ``url``
+      / ``headers`` / ``schemas`` / ``parameters``), so the schemas live at the obvious
+      ``response.revision.schemas`` — no ``data.revision.data`` nesting.
+    - ``configuration`` and ``meta`` carry the resolved config and any interface metadata (the
+      agent workflow rides its per-harness connection capability in ``meta``).
+
+    Typed outputs (POC, no back-compat field): ``revision.schemas.outputs`` may be keyed per
+    output type (for example ``{"messages": {...}, "invoke": {...}}``) so a workflow with more
+    than one output surface describes each one. A single-output workflow still uses a plain
+    output schema. Consumers read the keyed shape when present and fall back to the plain one.
+    """
+
+    version: Optional[str] = "2025.07.14"
+
+    revision: Optional[WorkflowRevisionData] = None
+    configuration: Optional[Dict[str, Any]] = None
+
+
+# back-compat alias
+WorkflowServiceInspectResponse = WorkflowInspectResponse
+
+
+
 class WorkflowBaseResponse(TraceID, SpanID):
     version: Optional[str] = "2025.07.14"
diff --git a/sdks/python/agenta/sdk/utils/types.py b/sdks/python/agenta/sdk/utils/types.py
index d2536917de..70b89613a3 100644
--- a/sdks/python/agenta/sdk/utils/types.py
+++ b/sdks/python/agenta/sdk/utils/types.py
@@ -11,6 +11,7 @@
 from agenta.sdk.agents.dtos import HARNESS_IDENTITIES, SandboxPermission
 from agenta.sdk.agents.mcp import MCPServerConfig
 from agenta.sdk.agents.tools import ToolConfig
+from agenta.sdk.agents.wire_models import run_contract_schemas
 from agenta.sdk.utils.assets import supported_llm_models, model_metadata
 from agenta.sdk.utils.helpers import _PLACEHOLDER_RE
 from agenta.sdk.utils.rendering import (
@@ -1360,4 +1361,9 @@ class _SkillEmbedRefSchema(BaseModel):
     SkillConfigSchema.ag_type(): _dereference_schema(
         SkillConfigSchema.model_json_schema()
     ),
+    # The `/run` wire contract (request + result), exported from the dedicated Pydantic wire
+    # models in `agenta.sdk.agents.wire_models`. This puts the service<->runner wire interface in
+    # the SDK the same way the other catalog types are exposed; a freshness test asserts these
+    # entries match a fresh export so the schema cannot drift from the models.
+    **run_contract_schemas(),
 }
diff --git a/sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py b/sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py
new file mode 100644
index 0000000000..095c509586
--- /dev/null
+++ b/sdks/python/oss/tests/pytest/unit/agents/test_wire_models.py
@@ -0,0 +1,121 @@
+"""The ``/run`` wire models are the single schema source of truth.
+
+These tests prove the dedicated Pydantic wire models in ``agenta.sdk.agents.wire_models``
+faithfully describe the wire that ``request_to_wire`` / ``result_from_wire`` (in ``utils/wire.py``)
+produce and parse, and that the exported JSON Schema is the one shipped in the SDK through
+``CATALOG_TYPES``.
+
+This is the Python-side guard the project plan calls for:
+
+- The exported schema is committed into the SDK (``CATALOG_TYPES['run_request' | 'run_result']``)
+  and a freshness test asserts the catalog entry equals a fresh export, so the schema cannot
+  silently drift from the models.
+- The golden fixtures (the cross-language anchor) validate against the exported schema — an
+  "example must validate" check that proves the schema describes today's wire.
+- ``request_to_wire`` output validates against the schema, so the producer and the schema agree.
+- The request schema's property set equals the hand-kept ``KNOWN_REQUEST_KEYS`` in
+  ``test_wire_contract.py``, so a new wire field cannot land in one place and not the other.
+
+There is NO runtime validation in this phase (per the project plan): nothing here gates a live
+``/run``. These models are an SDK-side schema artifact and a test guard only.
+"""
+
+from __future__ import annotations
+
+import jsonschema
+import pytest
+
+from agenta.sdk.agents.wire_models import (
+    WireRunRequest,
+    WireRunResult,
+    run_contract_schemas,
+)
+from agenta.sdk.utils.types import CATALOG_TYPES
+
+from .test_wire_contract import (
+    KNOWN_REQUEST_KEYS,
+    _agenta_payload,
+    _claude_payload,
+    _pi_payload,
+)
+
+
+def test_run_contract_ships_in_the_sdk_catalog():
+    # The exported JSON interface lives in the SDK alongside the other catalog types, so a
+    # client / the playground / `/inspect` can resolve it the same way as `agent_config`.
+    assert "run_request" in CATALOG_TYPES
+    assert "run_result" in CATALOG_TYPES
+    assert CATALOG_TYPES["run_request"]["x-ag-type"] == "run_request"
+    assert CATALOG_TYPES["run_result"]["x-ag-type"] == "run_result"
+
+
+def test_committed_catalog_matches_a_fresh_export():
+    # Freshness: regenerate the schema in-memory and assert the committed catalog entry equals
+    # it (drift -> fail), the same discipline the goldens already use. If the wire models change,
+    # this fails until the export is regenerated.
+    fresh = run_contract_schemas()
+    assert CATALOG_TYPES["run_request"] == fresh["run_request"]
+    assert CATALOG_TYPES["run_result"] == fresh["run_result"]
+
+
+def test_exported_schema_is_dereferenced_and_camelcase():
+    # The exported schema is self-contained (no `$defs`/`$ref`, like every catalog entry) and
+    # speaks the camelCase wire keys, not the snake_case Python field names.
+    req = CATALOG_TYPES["run_request"]
+    assert "$defs" not in req
+    props = req["properties"]
+    assert "sessionId" in props and "session_id" not in props
+    assert "customTools" in props and "custom_tools" not in props
+
+
+def test_request_schema_properties_equal_known_request_keys():
+    # The schema-derived property set is exactly the hand-kept guard in `test_wire_contract.py`.
+    # This is the schema-derived key guard: a new wire field cannot be added to one without the
+    # other, so the two cannot silently fall out of step.
+    assert set(CATALOG_TYPES["run_request"]["properties"]) == KNOWN_REQUEST_KEYS
+
+
+@pytest.mark.parametrize(
+    "golden_name, model",
+    [
+        ("run_request.pi_core.json", WireRunRequest),
+        ("run_request.claude.json", WireRunRequest),
+        ("run_result.ok.json", WireRunResult),
+        ("run_result.error.json", WireRunResult),
+    ],
+)
+def test_goldens_parse_into_the_wire_models(golden, golden_name, model):
+    # Every golden parses cleanly into its wire model (by camelCase alias), proving the models
+    # accept the real wire. The ok-result golden includes a deliberately typeless event; the
+    # open `WireAgentEvent` (type optional) tolerates it, mirroring the parser's drop behavior.
+    model.model_validate(golden(golden_name))
+
+
+@pytest.mark.parametrize(
+    "golden_name, ag_type",
+    [
+        ("run_request.pi_core.json", "run_request"),
+        ("run_request.claude.json", "run_request"),
+        ("run_result.ok.json", "run_result"),
+        ("run_result.error.json", "run_result"),
+    ],
+)
+def test_goldens_validate_against_the_exported_schema(golden, golden_name, ag_type):
+    # "Examples must validate": each golden validates against the exported JSON Schema shipped in
+    # the SDK. This proves the schema describes today's wire. It is a TEST, not a runtime guard.
+    jsonschema.validate(golden(golden_name), CATALOG_TYPES[ag_type])
+
+
+def test_request_to_wire_output_validates_against_the_schema():
+    # The producer and the schema agree: the dict `request_to_wire` builds for each harness
+    # validates against the exported request schema and round-trips through the wire model.
+    for payload in (_pi_payload(), _claude_payload(), _agenta_payload()):
+        jsonschema.validate(payload, CATALOG_TYPES["run_request"])
+        WireRunRequest.model_validate(payload)
+
+
+def test_minimal_result_validates():
+    # A bare success result (the `result_from_wire` minimal case) is valid against the schema.
+    payload = {"ok": True}
+    jsonschema.validate(payload, CATALOG_TYPES["run_result"])
+    assert WireRunResult.model_validate(payload).ok is True
diff --git a/sdks/python/oss/tests/pytest/unit/test_inspect_response.py b/sdks/python/oss/tests/pytest/unit/test_inspect_response.py
new file mode 100644
index 0000000000..b4fdff8766
--- /dev/null
+++ b/sdks/python/oss/tests/pytest/unit/test_inspect_response.py
@@ -0,0 +1,106 @@
+"""The ``/inspect`` response is the canonical :class:`WorkflowInspectResponse`.
+
+Architecture-followups issue 1: ``/inspect`` used to return a ``WorkflowInvokeRequest`` (a
+REQUEST model carrying response semantics), nesting the resolved interface at
+``data.revision.data.schemas`` so every client had to guess the envelope. ``handle_inspect_success``
+now normalizes that internally-built request into a flat :class:`WorkflowInspectResponse` whose
+``revision`` IS the :class:`WorkflowRevisionData`, so schemas live at the obvious
+``response["revision"]["schemas"]``.
+
+These are the acceptance criteria from
+``docs/design/agent-workflows/interfaces/architecture-followups.md`` issue 1:
+
+- The response exposes schemas at ``response["revision"]["schemas"]`` (not ``data.revision.data``).
+- The frontend can resolve schemas from the new shape.
+"""
+
+from __future__ import annotations
+
+import json
+
+from agenta.sdk.decorators.routing import _to_inspect_response
+from agenta.sdk.models.workflows import (
+    WorkflowInspectResponse,
+    WorkflowInvokeRequest,
+    WorkflowRequestData,
+    WorkflowRevision,
+    WorkflowRevisionData,
+)
+
+_RESOLVED_REVISION = WorkflowRevisionData(
+    uri="agenta:builtin:agent:v0",
+    schemas={
+        "inputs": {"type": "object", "properties": {"messages": {"type": "array"}}},
+        "parameters": {"type": "object"},
+        # Typed outputs keyed per output surface (issue 4): the POC shape, no flat field.
+        "outputs": {
+            "invoke": {"x-ag-type-ref": "message", "type": "object"},
+            "messages": {"x-ag-type-ref": "messages", "type": "array"},
+        },
+    },
+    parameters={"agent": {"model": "gpt-5.5"}},
+)
+
+
+def _built_invoke_request() -> WorkflowInvokeRequest:
+    """The internally-built inspect result (what ``workflow.inspect()`` returns today)."""
+    return WorkflowInvokeRequest(
+        meta={"harness_capabilities": {"pi_core": {}}},
+        data=WorkflowRequestData(
+            revision=WorkflowRevision(
+                id=None,
+                slug="agent",
+                version="v0",
+                name="Agent",
+                data=_RESOLVED_REVISION,
+            ).model_dump(mode="json", exclude_none=True),
+        ),
+    )
+
+
+def test_inspect_response_lifts_revision_to_top_level():
+    response = _to_inspect_response(_built_invoke_request())
+
+    assert isinstance(response, WorkflowInspectResponse)
+    assert response.revision is not None
+    # Schemas live at response.revision.schemas — not nested under data.revision.data.
+    assert response.revision.schemas is not None
+    assert response.revision.schemas.inputs == _RESOLVED_REVISION.schemas.inputs
+    assert response.revision.uri == "agenta:builtin:agent:v0"
+    assert response.revision.parameters == {"agent": {"model": "gpt-5.5"}}
+    # Resolved config is preserved at the public boundary, not dropped.
+    assert response.configuration == {"parameters": {"agent": {"model": "gpt-5.5"}}}
+    # Interface metadata rides top-level meta.
+    assert response.meta == {"harness_capabilities": {"pi_core": {}}}
+
+
+def test_inspect_response_serializes_schemas_at_revision_schemas():
+    # The acceptance criterion in the words of a client: post /inspect, read response body,
+    # find schemas at body["revision"]["schemas"]. This is the exact path the frontend reads.
+    response = _to_inspect_response(_built_invoke_request())
+    body = json.loads(response.model_dump_json(exclude_none=True))
+
+    assert "revision" in body
+    assert "schemas" in body["revision"]
+    assert "inputs" in body["revision"]["schemas"]
+    # No request-envelope leakage: there is no top-level `data.revision.data` nesting.
+    assert "data" not in body
+
+
+def test_inspect_response_outputs_are_keyed_per_surface():
+    # Issue 4: outputs carry the typed shape keyed per output surface (messages / invoke).
+    response = _to_inspect_response(_built_invoke_request())
+    outputs = response.revision.schemas.outputs
+
+    assert set(outputs) == {"invoke", "messages"}
+    assert outputs["invoke"]["x-ag-type-ref"] == "message"
+    assert outputs["messages"]["x-ag-type-ref"] == "messages"
+
+
+def test_inspect_response_handles_a_request_with_no_revision():
+    # A built request with no resolved revision normalizes to an empty-revision response, not a
+    # crash (the inspect path can resolve nothing for an unknown URI).
+    response = _to_inspect_response(WorkflowInvokeRequest())
+    assert isinstance(response, WorkflowInspectResponse)
+    assert response.revision is None
+    assert response.configuration is None
diff --git a/services/oss/src/agent/schemas.py b/services/oss/src/agent/schemas.py
index 5a49a38a93..ccd2bb1f7b 100644
--- a/services/oss/src/agent/schemas.py
+++ b/services/oss/src/agent/schemas.py
@@ -67,12 +67,28 @@
     "properties": {"agent": AGENT_CONFIG_SCHEMA},
 }
 
-# Outputs: the final assistant message.
+# Outputs, keyed per output surface (the agent has two): `invoke` returns the single final
+# assistant message (the batch `/invoke` shape, `x-ag-type-ref: message`); `messages` returns the
+# ordered conversation the `/messages` route streams (`x-ag-type-ref: messages`). Keying outputs by
+# surface lets the playground render the right output view per route. POC, so no flat back-compat
+# output field: a consumer reads the keyed shape directly. Both refs already appear elsewhere in
+# AGENT_SCHEMAS, so this adds no new catalog marker.
 AGENT_OUTPUTS_SCHEMA = {
     "$schema": _SCHEMA,
-    "x-ag-type-ref": "message",
     "type": "object",
-    "description": "Final assistant message returned by the agent.",
+    "description": "Agent outputs, keyed per output surface (invoke / messages).",
+    "properties": {
+        "invoke": {
+            "x-ag-type-ref": "message",
+            "type": "object",
+            "description": "Final assistant message returned by a batch /invoke.",
+        },
+        "messages": {
+            "x-ag-type-ref": "messages",
+            "type": "array",
+            "description": "The ordered conversation the /messages route returns.",
+        },
+    },
 }
 
 AGENT_SCHEMAS = {
diff --git a/web/packages/agenta-entities/src/workflow/api/api.ts b/web/packages/agenta-entities/src/workflow/api/api.ts
index 44ee390dae..c8b5a4be47 100644
--- a/web/packages/agenta-entities/src/workflow/api/api.ts
+++ b/web/packages/agenta-entities/src/workflow/api/api.ts
@@ -415,12 +415,19 @@ export async function fetchWorkflowRevisionById(
 // ============================================================================
 
 /**
- * Response shape from the inspect endpoint.
- * Returns a WorkflowServiceRequest with resolved interface.
+ * Response shape from the `/inspect` endpoint.
+ *
+ * The canonical backend model is `WorkflowInspectResponse`
+ * (sdks/python/agenta/sdk/models/workflows.py): a flat response whose `revision` IS the
+ * resolved `WorkflowRevisionData`, so schemas live at `revision.schemas`. The endpoint no
+ * longer returns the old `WorkflowInvokeRequest` envelope that nested them under
+ * `data.revision.data.schemas`.
+ *
+ * `outputs` is typed per output surface (POC): `{invoke, messages}` for the agent workflow,
+ * or a single schema for a one-output workflow. The store reads either shape.
+ */
 export interface InspectWorkflowResponse {
     version?: string
-    /** New shape (feat/extend-runnables): revision contains the resolved data */
     revision?: {
         uri?: string
         url?: string
@@ -432,7 +439,14 @@ export interface InspectWorkflowResponse {
         }
         parameters?: Record
     }
-    /** @deprecated Old shape — kept for backward compat during migration */
+    configuration?: Record
+    meta?: Record
+    /**
+     * @deprecated Migration bridge for the old `WorkflowInvokeRequest` inspect envelope. The
+     * canonical response puts schemas at `revision.schemas`; read that first. Remove this once
+     * every reader (appUtils / evaluatorUtils) no longer needs the `?? interface?.schemas`
+     * fallback — i.e. once no deployed service returns the old envelope.
+     */
     interface?: {
         version?: string
         uri?: string
@@ -444,10 +458,6 @@ export interface InspectWorkflowResponse {
             outputs?: Record
         }
     }
-    configuration?: {
-        script?: Record
-        parameters?: Record
-    }
 }
 
 /**
diff --git a/web/packages/agenta-entities/src/workflow/state/store.ts b/web/packages/agenta-entities/src/workflow/state/store.ts
index f7fc3d344b..83531a1cb8 100644
--- a/web/packages/agenta-entities/src/workflow/state/store.ts
+++ b/web/packages/agenta-entities/src/workflow/state/store.ts
@@ -1535,11 +1535,13 @@ export const workflowEntityAtomFamily = atomFamily((workflowId: string) =>
         let resolvedParams: Record | null | undefined = null
 
         // (a) Inspect — primary source for any workflow with a URI.
-        // Returns interface.schemas.{inputs, parameters, outputs} directly.
+        // The canonical `WorkflowInspectResponse` puts the resolved interface at
+        // `revision.schemas.{inputs, parameters, outputs}` (revision IS the WorkflowRevisionData),
+        // so we read it directly. `outputs` may be typed per output surface ({invoke, messages}).
         const inspectQuery = get(workflowInspectAtomFamily(workflowId))
         const inspectData = inspectQuery.data ?? null
         if (inspectData) {
-            const inspectSchemas = inspectData.revision?.schemas ?? inspectData.interface?.schemas
+            const inspectSchemas = inspectData.revision?.schemas
             if (inspectSchemas) {
                 resolvedInputs = inspectSchemas.inputs
                 resolvedOutputs = inspectSchemas.outputs
diff --git a/web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts b/web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts
new file mode 100644
index 0000000000..411676de6c
--- /dev/null
+++ b/web/packages/agenta-entities/tests/unit/inspectResponseSchemaResolution.test.ts
@@ -0,0 +1,83 @@
+/**
+ * Schema resolution from the canonical `/inspect` response shape.
+ *
+ * Architecture-followups issue 1: `/inspect` now returns the canonical `WorkflowInspectResponse`
+ * (sdks/python/agenta/sdk/models/workflows.py) whose `revision` IS the resolved
+ * `WorkflowRevisionData`, so schemas live at `revision.schemas`. The store read
+ * (`web/packages/agenta-entities/src/workflow/state/store.ts`, the inspect branch) reads exactly
+ * that path. These tests pin that read against the real response shape so it cannot silently
+ * regress to resolving `undefined` (the latent break this fix closes).
+ *
+ * The store's read is an inline expression over the query data, not an exported function, so we
+ * reproduce that exact expression here over a typed `InspectWorkflowResponse`. The point is the
+ * CONTRACT: the canonical body resolves schemas; the old nested envelope does not.
+ */
+
+import {describe, expect, it} from "vitest"
+
+import type {InspectWorkflowResponse} from "../../src/workflow/api/api"
+
+// The exact read the store performs in its inspect branch (store.ts).
+function resolveInspectSchemas(inspectData: InspectWorkflowResponse | null) {
+    if (!inspectData) return null
+    const inspectSchemas = inspectData.revision?.schemas
+    if (!inspectSchemas) return null
+    return {
+        inputs: inspectSchemas.inputs,
+        outputs: inspectSchemas.outputs,
+        parameters: inspectSchemas.parameters,
+    }
+}
+
+describe("inspect response schema resolution", () => {
+    it("resolves schemas from the canonical revision.schemas shape", () => {
+        const body: InspectWorkflowResponse = {
+            version: "2025.07.14",
+            revision: {
+                uri: "agenta:builtin:agent:v0",
+                schemas: {
+                    inputs: {type: "object", properties: {messages: {type: "array"}}},
+                    parameters: {type: "object"},
+                    outputs: {
+                        invoke: {"x-ag-type-ref": "message", type: "object"},
+                        messages: {"x-ag-type-ref": "messages", type: "array"},
+                    },
+                },
+                parameters: {agent: {model: "gpt-5.5"}},
+            },
+            meta: {harness_capabilities: {}},
+        }
+
+        const resolved = resolveInspectSchemas(body)
+        expect(resolved).not.toBeNull()
+        expect(resolved?.inputs).toEqual({
+            type: "object",
+            properties: {messages: {type: "array"}},
+        })
+        expect(resolved?.parameters).toEqual({type: "object"})
+    })
+
+    it("exposes outputs keyed per output surface (invoke / messages)", () => {
+        const body: InspectWorkflowResponse = {
+            revision: {
+                schemas: {
+                    outputs: {
+                        invoke: {"x-ag-type-ref": "message"},
+                        messages: {"x-ag-type-ref": "messages"},
+                    },
+                },
+            },
+        }
+
+        const resolved = resolveInspectSchemas(body)
+        const outputs = resolved?.outputs as Record> | undefined
+        expect(outputs && Object.keys(outputs).sort()).toEqual(["invoke", "messages"])
+        expect(outputs?.invoke["x-ag-type-ref"]).toBe("message")
+        expect(outputs?.messages["x-ag-type-ref"]).toBe("messages")
+    })
+
+    it("resolves nothing when there is no revision (no crash, no stale schemas)", () => {
+        expect(resolveInspectSchemas({})).toBeNull()
+        expect(resolveInspectSchemas(null)).toBeNull()
+    })
+})