Agenta-AI · mmabrouk · Jun 25, 2026 · Jun 24, 2026
diff --git a/docs/design/agent-workflows/documentation/agent-configuration.md b/docs/design/agent-workflows/documentation/agent-configuration.md
@@ -54,7 +54,16 @@ existing control:
 
 - `agents_md` renders as a multiline text input labeled "Instructions". It falls back to a
   legacy `instructions` value when `agents_md` is missing.
-- `model` renders as a grouped choice control.
+- `model` renders as a unified, harness-filtered provider + model picker. The options come from
+  the agent's `/inspect` `meta.harness_capabilities[harness].models` (Pi: the vault providers'
+  catalog ids; Claude: its aliases), not the full shared catalog. Selecting a model sets BOTH the
+  model id and its provider, so there is no separate provider field. When `/inspect` publishes no
+  per-harness models (older agents or a standalone control), the picker falls back to the schema's
+  full grouped-choice catalog. Below it, an **Authentication** toggle chooses *Agenta-managed*
+  (a vault connection: "Project default" or a named connection picked from `GET /secrets/`) or
+  *Self-managed* (the harness uses its own login; Agenta injects nothing). The form always writes
+  `model` as a structured `ModelRef` (`{provider, model, connection?}`), never a free-text string;
+  the connection rides in `model_ref.connection`. A raw-JSON escape hatch remains for power users.
 - `tools` renders as a flat array. Each entry uses `ToolItemControl`, the same tool object
   shape the prompt control uses.
 - `mcp_servers` renders as a flat array. Each entry uses `McpServerItemControl`, which is a
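Note on the hunk above: a minimal sketch of the option-source rule it describes (prefer the per-harness `models` from `/inspect`, otherwise fall back to the schema's full catalog). The real control is a TypeScript component; this is written in Python only to show the logic. The function name `picker_options` and the `schema_catalog` argument are hypothetical and do not come from the codebase.

```python
from typing import Dict, List

# Hypothetical alias for a provider -> model ids map (the shape of `models`).
ModelsByProvider = Dict[str, List[str]]


def picker_options(
    inspect_response: dict,
    harness: str,
    schema_catalog: ModelsByProvider,
) -> ModelsByProvider:
    """Return provider -> model ids for the picker, preferring /inspect."""
    meta = inspect_response.get("meta") or {}
    capabilities = meta.get("harness_capabilities") or {}
    models = (capabilities.get(harness) or {}).get("models")
    if models:
        # /inspect published per-harness models: show only what the harness reaches.
        return models
    # Older agents or a standalone control: fall back to the full schema catalog.
    return schema_catalog
```

A harness with no published `models` entry, or a response with no `meta.harness_capabilities` at all, both take the fallback branch in this sketch.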

diff --git a/docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md b/docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md
@@ -14,7 +14,7 @@ workflow provides it:
 ```jsonc
 {
   "version": "2025.07.14",
-  "meta": { "harness_capabilities": { /* per-harness provider/deployment limits */ } },
+  "meta": { "harness_capabilities": { /* per-harness providers, deployments, connection_modes, model_selection, models */ } },
   "data": {
     "revision": {
       "data": {
@@ -38,6 +38,15 @@ the form and the schema stay in one place. The `meta.harness_capabilities` block
 table the service uses server-side to reject unreachable provider and deployment choices, so
 the form can filter stored connections before the user submits when that metadata is present.
 
+Per harness, the block carries `providers`, `deployments`, `connection_modes`, `model_selection`,
+and `models`. `models` is a provider-keyed map of the selectable models the harness can reach: Pi
+publishes each vault provider's catalog ids; Claude publishes its alias set (`default`/`sonnet`/
+`opus`/`haiku` and their `[1m]` variants) under `anthropic`. The agent playground renders its
+harness-filtered provider + model picker straight from this map instead of the full shared
+catalog, and uses `model_selection` to interpret a value (`provider/id` for Pi vs `alias` for
+Claude). The table is published by `harness_capabilities_document()` in
+`sdks/python/agenta/sdk/agents/capabilities.py`.
+
 The shape of the config itself lives in [Agent config
 schema](agent-config-schema.md). This page covers what `/inspect` returns; that page covers
 the fields.
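Note on the hunk above: a hedged sketch of how a consumer might use `model_selection` together with the `models` map to find the provider that owns a selected value. The capability values below are illustrative examples, not the published table, and `provider_for` is a hypothetical helper. Reading "interpret a value" as "derive the provider" is an assumption of this sketch.

```python
# Illustrative capability block. `providers`, `deployments`, and
# `connection_modes` are omitted for brevity; values are examples only.
EXAMPLE_CAPABILITIES = {
    "pi_core": {
        "model_selection": "provider/id",
        "models": {"anthropic": ["anthropic/claude-opus-4-7"]},
    },
    "claude": {
        "model_selection": "alias",
        # The `[1m]` variants are left out of this example.
        "models": {"anthropic": ["default", "sonnet", "opus", "haiku"]},
    },
}


def provider_for(capabilities: dict, harness: str, value: str) -> str:
    """Return the provider that owns `value` on `harness`, or raise ValueError."""
    entry = capabilities[harness]
    models = entry["models"]
    if entry["model_selection"] == "provider/id":
        # Pi-style values carry the provider as the prefix before the first "/".
        provider = value.partition("/")[0]
        if value in models.get(provider, []):
            return provider
    elif entry["model_selection"] == "alias":
        # Claude-style aliases carry no prefix, so look the alias up in the map.
        for provider, aliases in models.items():
            if value in aliases:
                return provider
    raise ValueError(f"{value!r} is not selectable on harness {harness!r}")
```

Under these assumptions, `provider_for(EXAMPLE_CAPABILITIES, "claude", "opus")` resolves to `anthropic`, while a catalog id passed to the `claude` harness is rejected because it is not one of its aliases.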

diff --git a/docs/design/agent-workflows/projects/agent-model-picker/README.md b/docs/design/agent-workflows/projects/agent-model-picker/README.md
@@ -0,0 +1,54 @@
+# Agent playground: provider + model picker (harness-aware, inspect-driven)
+
+Make the **agent** playground pick a **provider + model** the way the completion/chat playground
+does, but **filtered to what the selected harness can actually reach**, with an explicit
+**Agenta-auth vs self-managed** choice and a **connection picker** for the credential. The
+per-harness reach (providers, models, connection modes) is published by the agent's `/inspect`
+response and the frontend renders from it.
+
+## The shape in one paragraph
+
+`/inspect` already publishes `meta.harness_capabilities` (per harness: `providers`,
+`deployments`, `connection_modes`, `model_selection`). This project **adds a per-provider model
+list** to that surface (Pi: the vault-reachable providers' model ids; Claude: its alias list), and
+**rewires the frontend to render from `/inspect`** instead of the current static hardcoded copy. The
+model picker becomes one unified control (selecting a model sets both provider and model id),
+filtered to the harness. The credential is chosen with a clear **Authentication** toggle — *Agenta*
+(managed: pick the project-default or a named connection from the vault) or *Self-managed* (the
+harness uses its own login; Agenta injects nothing). The connection rides in `model_ref.connection`
+inside the config the playground already sends; no new request field and no new vault route.
+
+## What already exists (do not rebuild)
+
+The parent [../provider-model-auth/](../provider-model-auth/) project (PR #4815, **merged** to
+`big-agents`) already shipped the backend and a *minimal* form:
+
+- `ModelRef` (`provider` + `model` + `params` + `connection`) in the agent config, coerced from a
+  bare string for back-compat.
+- A connection resolver that reads the existing `GET /secrets/` and injects **one** least-privilege
+  credential (replacing the whole-vault dump).
+- `/inspect` `meta.harness_capabilities` (the per-harness `providers`/`deployments`/
+  `connection_modes`/`model_selection` table) + server-side fail-loud reject.
+- A minimal connection form: a grouped model picker (unfiltered), a separate free-text/select
+  Provider field, a connection-mode select, a free-text slug, and a raw-JSON escape hatch.
+
+This project is the **playground UX + the inspect model-list addition** that the parent explicitly
+deferred.
+
+## Read in this order
+
+1. [context.md](context.md): why this exists, the exact merged state with `file:line` citations,
+   goals, non-goals, and the three decisions taken.
+2. [research.md](research.md): the precise findings (inspect, the capability table, the model
+   picker, the connection form, vault listing, the completion/chat pattern), with citations.
+3. [plan.md](plan.md): the phased slices, backend through frontend, with the test strategy.
+4. [status.md](status.md): current state, decisions, open items, risks. Source of truth.
+
+## Related work
+
+- [../provider-model-auth/](../provider-model-auth/): the backend `ModelRef`/connection resolver
+  and the minimal form this project builds on.
+- [../model-config/](../model-config/): how a requested model becomes settable on each harness (the
+  Pi `auth.json`/`models.json` write, the custom-endpoint consumption). Out of scope here.
+- [../harness-capabilities/](../harness-capabilities/): owns the general capability-table mechanism;
+  this project extends the `providers`/`models` entries it consumes.
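Note on the README above: a minimal sketch of the value the unified picker and the Authentication toggle would write together, assuming the `{provider, model, connection}` shape and the `agenta` / `self_managed` modes named in this PR. `build_model_ref` is a hypothetical helper. Representing "project default" as an `agenta` connection with the `slug` key left out is also an assumption; the docs only say the default has no slug.

```python
from typing import Optional

VALID_MODES = ("agenta", "self_managed")


def build_model_ref(
    provider: str,
    model: str,
    mode: str = "agenta",
    slug: Optional[str] = None,
) -> dict:
    """Build a structured ModelRef-like dict from one picker selection."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown connection mode: {mode!r}")
    if mode == "self_managed" and slug is not None:
        # Self-managed means Agenta injects nothing, so a vault slug makes no sense.
        raise ValueError("a self-managed connection takes no slug")
    connection = {"mode": mode}
    if slug is not None:
        # A named vault connection; without a slug this means "project default".
        connection["slug"] = slug
    # Provider and model are always set together by the single picker.
    return {"provider": provider, "model": model, "connection": connection}
```

For example, `build_model_ref("anthropic", "opus", slug="team-key")` yields a managed connection pinned to a hypothetical `team-key` entry, and omitting `slug` yields the project default.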
diff --git a/docs/design/agent-workflows/projects/agent-model-picker/context.md b/docs/design/agent-workflows/projects/agent-model-picker/context.md
@@ -0,0 +1,106 @@
+# Context
+
+## Why this exists
+
+The agent (harness) playground should let a user pick a **provider + model** like the completion and
+chat playgrounds do, but the agent case has an extra constraint the prompt case does not: **each
+harness can only reach some providers and models**. Claude Code reaches Anthropic only and selects by
+alias; Pi reaches eight vault-mapped providers and selects by `provider/id`. The picker must filter
+to the selected harness, and that per-harness reach must come from the agent itself (`/inspect`), not
+a list hardcoded in the frontend. The user also needs to choose **whether Agenta supplies the
+credential** (managed) or the **harness brings its own login** (self-managed), and when managed, to
+pick *which* stored connection.
+
+## Current state (merged on `big-agents`, PR #4815)
+
+The backend and a minimal form already landed. The remaining work is UX + one inspect addition.
+
+### Model selection
+- The agent config model field renders through the **same grouped picker** as completion/chat:
+  `AgentConfigControl` -> `GroupedChoiceControl` -> `SelectLLMProviderBase`
+  (`web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx:342-349`).
+- The picker's choices come from the **whole** shared LiteLLM catalog, **unfiltered by harness**:
+  `_model_catalog_type()` deep-copies `supported_llm_models` as grouped `choices`
+  (`sdks/python/agenta/sdk/utils/types.py:1046-1055`), registered as `CATALOG_TYPES["model"]`
+  (`types.py:1321`). The agent field declares `x-parameter: "grouped_choice"`
+  (`types.py:1088-1093`). Catalog source: `sdks/python/agenta/sdk/utils/assets.py:6-193`
+  (`supported_llm_models`, provider -> prefixed ids like `anthropic/claude-opus-4-7`).
+- A **second, redundant** free-text/select "Provider" field sits in the connection section
+  (`AgentConfigControl.tsx:355-380`), disjoint from the model picker.
+
+### Per-harness reach is already in `/inspect`
+- `/inspect` publishes `meta.harness_capabilities` via `harness_capabilities_document()`
+  (`services/oss/src/agent/app.py:294-300`). The table lives in
+  `sdks/python/agenta/sdk/agents/capabilities.py` and carries, per harness,
+  `providers`, `deployments`, `connection_modes`, and `model_selection`
+  (`capabilities.py:57-95`). It has **no `models`** field.
+- Harness types: `pi_core` and `pi_agenta` (both reach the 8 `PI_VAULT_PROVIDERS`;
+  `model_selection: "provider/id"`) and `claude` (Anthropic only; `model_selection: "alias"`)
+  (`capabilities.py:41-50,76-95`; `HarnessType` in `dtos.py:42-58`).
+- The agent service uses the same table for a server-side fail-loud reject
+  (`app.py:84-117`).
+
+### The frontend ignores inspect and uses a static copy
+- `connectionUtils.ts` holds a **hardcoded** `HARNESS_CONNECTION_CAPABILITIES`
+  (`web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts:146-202`) with a
+  `TODO(harness-capabilities)` to consume `/inspect` `meta.harness_capabilities`. The form filters
+  the **Provider** select with `allowedProviders(harness)` but never filters the **model** picker.
+
+### Credential (connection)
+- The connection rides in the config as `model_ref.connection = {mode, slug}`; the backend coerces a
+  structured `model` into `ModelRef` (`sdks/python/agenta/sdk/agents/dtos.py:806-831`,
+  `_parse_agent_fields` `:929-960`). So "the frontend sends the connection" already works.
+- Modes: `agenta` (managed) and `self_managed` (`connections/models.py` `Connection.mode`); the
+  project default is `agenta` with no slug. The form has a mode `Select` and a **free-text** slug
+  field with a `TODO(provider-model-auth)` to become a picker once a connections endpoint exists
+  (`AgentConfigControl.tsx:382-421`).
+- Resolution reads the existing `GET /secrets/` (no new route): `VaultConnectionResolver`
+  (`sdks/python/agenta/sdk/agents/platform/connections.py:380-431`). The FE can list connections from
+  the existing `vaultSecretsQueryAtom` (`web/packages/agenta-entities/src/secret/state/atoms.ts:78-100`).
+
+## Goals
+
+1. Publish a **per-provider model list per harness** in `/inspect` `meta.harness_capabilities`
+   (Pi: vault-reachable providers' ids; Claude: its aliases). The frontend renders from it.
+2. Make the frontend **consume `/inspect`** for the capability map (providers, models, modes),
+   replacing the static hardcoded copy.
+3. **Filter the model picker to the selected harness** and unify it: selecting a model sets both
+   `provider` and `model`; drop the redundant standalone Provider field.
+4. Present a clear **Authentication** choice — *Agenta* (managed) vs *Self-managed* — and, for
+   Agenta, a **connection picker** (project default or a named connection) fed by the existing vault
+   list, filtered to the chosen provider.
+5. Keep the wire contract and the resolver unchanged; the connection still rides in `model_ref.connection`.
+
+## Decisions taken (2026-06-24, with the user)
+
+1. **The per-provider model list is published in `/inspect`.** The backend builds and publishes the
+   exact per-harness, per-provider model list in `meta.harness_capabilities`; the frontend renders
+   straight from inspect. (Chosen over filtering the shared catalog client-side. Trade-off: the
+   model list is duplicated into the capability surface and must be kept fresh; mitigated by sourcing
+   it from the same `supported_llm_models` catalog on the backend.)
+2. **"Not Agenta authentication" means self-managed login only.** The harness uses its own credential
+   in the sandbox (env var or prior OAuth login); Agenta injects nothing. No per-run pasted-key
+   channel is added (matches the connection design and completion/chat).
+3. **Claude is presented as an alias dropdown** (default, sonnet, opus, haiku, and `[1m]` variants),
+   matching `model_selection: "alias"`. The alias list is added to `/inspect`.
+
+## Non-goals (v1)
+
+- A new vault storage model, write path, CRUD, or a new connections route. v1 reads `GET /secrets/`.
+- A per-run pasted API key / inline-secret channel.
+- Where a deployed agent's durable per-environment default connection lives (a parent open
+  decision; the config-stored path is unaffected).
+- Migrating the completion/prompt path onto the agent resolver. Completions keep their reader.
+- Pi consuming custom endpoints / cloud deployments (Azure/Bedrock/Vertex) — owned by
+  [../model-config/](../model-config/); v1 stays `direct`/fail-loud.
+
+## Constraints inherited from the codebase
+
+- `/inspect` `meta.harness_capabilities` must stay a plain JSON-able dict (no model import on the
+  consumer side) — `harness_capabilities_document()` (`capabilities.py:98-108`).
+- The SDK owns the capability table; the agent service imports it; the SDK must not import the
+  service (`../../documentation/ports-and-adapters.md`).
+- Frontend API calls go through the Fern client + a zod boundary (`web/CLAUDE.md`). The capability
+  map arrives inside the inspect/workflow schema response, not a new endpoint.
+- Any wire change updates Python (`utils/wire.py`) and TypeScript (`services/agent/src/protocol.ts`)
+  with the golden tests in one PR. This project's inspect-meta change is **not** a `/run` wire change.
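Note on context.md above: a hedged sketch of decision 1, building the per-harness `models` map on the backend from the shared catalog so the result stays a plain JSON-able dict. The catalog contents, the provider tuple, and `build_models` are hypothetical stand-ins; the real `supported_llm_models` and `PI_VAULT_PROVIDERS` are not reproduced here.

```python
import json
from typing import Dict, List

# Hypothetical miniature of the shared catalog (provider -> prefixed ids).
EXAMPLE_CATALOG: Dict[str, List[str]] = {
    "anthropic": ["anthropic/claude-opus-4-7"],
    "example-provider": ["example-provider/example-model"],
    "unreachable-provider": ["unreachable-provider/example-model"],
}
# Hypothetical subset standing in for the eight PI_VAULT_PROVIDERS.
EXAMPLE_PI_PROVIDERS = ("anthropic", "example-provider")
# Claude's base aliases from decision 3; the `[1m]` variants are omitted.
CLAUDE_ALIASES = ["default", "sonnet", "opus", "haiku"]


def build_models(harness: str, catalog: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the provider-keyed model map one harness would publish."""
    if harness == "claude":
        # Claude selects by alias and reaches Anthropic only.
        return {"anthropic": list(CLAUDE_ALIASES)}
    if harness in ("pi_core", "pi_agenta"):
        # Pi publishes the catalog ids of each provider it can reach.
        return {p: list(catalog[p]) for p in EXAMPLE_PI_PROVIDERS if p in catalog}
    raise ValueError(f"unknown harness: {harness!r}")


# The map contains only dicts, lists, and strings, so it serializes as-is.
serialized = json.dumps(build_models("pi_core", EXAMPLE_CATALOG))
```

Sourcing the Pi lists from the same catalog object is what keeps the duplicated model list fresh in this sketch; a provider absent from the reachable set never appears in the published map.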