Merged
11 changes: 10 additions & 1 deletion docs/design/agent-workflows/documentation/agent-configuration.md
@@ -54,7 +54,16 @@ existing control:

- `agents_md` renders as a multiline text input labeled "Instructions". It falls back to a
legacy `instructions` value when `agents_md` is missing.
- `model` renders as a unified, harness-filtered provider + model picker. The options come from
the agent's `/inspect` `meta.harness_capabilities[harness].models` (Pi: the vault providers'
catalog ids; Claude: its aliases), not the full shared catalog. Selecting a model sets BOTH the
model id and its provider, so there is no separate provider field. When `/inspect` publishes no
per-harness models (older agents / a standalone control), it falls back to the schema's full
grouped-choice catalog. Below the picker, an **Authentication** toggle chooses between *Agenta-managed*
(a vault connection: "Project default" or a named connection picked from `GET /secrets/`) and
*Self-managed* (the harness uses its own login; Agenta injects nothing). The form always writes
`model` as a structured `ModelRef` (`{provider, model, connection?}`), never a free-text string;
the connection rides in `model_ref.connection`. A raw-JSON escape hatch remains for power users.
- `tools` renders as a flat array. Each entry uses `ToolItemControl`, the same tool object
shape the prompt control uses.
- `mcp_servers` renders as a flat array. Each entry uses `McpServerItemControl`, which is a
@@ -14,7 +14,7 @@ workflow provides it:
```jsonc
{
"version": "2025.07.14",
"meta": { "harness_capabilities": { /* per-harness providers, deployments, connection_modes, model_selection, models */ } },
"data": {
"revision": {
"data": {
@@ -38,6 +38,15 @@ the form and the schema stay in one place. The `meta.harness_capabilities` block
table the service uses server-side to reject unreachable provider and deployment choices, so
the form can filter stored connections before the user submits when that metadata is present.
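
A minimal sketch of that client-side filter, assuming each stored connection carries a `provider` field. The `reachable_connections` helper and the capability data are hypothetical; the real provider lists come from `/inspect`.

```python
from __future__ import annotations

from typing import Any

# Hypothetical excerpt shaped like `meta.harness_capabilities`; the real
# provider lists come from `/inspect`, not from this constant.
CAPS: dict[str, dict[str, Any]] = {
    "claude": {"providers": ["anthropic"]},
    "pi_core": {"providers": ["anthropic", "openai"]},
}


def reachable_connections(
    connections: list[dict[str, Any]],
    caps: dict[str, dict[str, Any]] | None,
    harness: str,
) -> list[dict[str, Any]]:
    """Keep the stored connections whose provider the harness can reach.

    With no capability block (older agents) or an unknown harness, return the
    list unfiltered and leave the decision to the server-side reject.
    """
    entry = (caps or {}).get(harness)
    if entry is None:
        return list(connections)
    allowed = set(entry.get("providers", []))
    return [c for c in connections if c.get("provider") in allowed]
```

Falling back to the unfiltered list keeps the form usable when the metadata is absent; the server still rejects unreachable choices.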

Per harness, the block carries `providers`, `deployments`, `connection_modes`, `model_selection`,
and `models`. `models` is a provider-keyed map of the selectable models the harness can reach: Pi
publishes each vault provider's catalog ids; Claude publishes its alias set (`default`/`sonnet`/
`opus`/`haiku` and their `[1m]` variants) under `anthropic`. The agent playground renders its
harness-filtered provider + model picker straight from this map instead of the full shared
catalog, and uses `model_selection` to interpret a value (`provider/id` for Pi vs `alias` for
Claude). The table is published by `harness_capabilities_document()` in
`sdks/python/agenta/sdk/agents/capabilities.py`.
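
A minimal sketch of how a consumer might flatten `models` into picker options and read a stored value according to `model_selection`. `picker_options`, `interpret_value`, and the capability data are hypothetical; the Claude list is a subset of the published aliases, and the Pi ids are placeholders without a provider prefix.

```python
from __future__ import annotations

from typing import Any

# Hypothetical entries: the Claude aliases are a subset of the published set,
# and the Pi ids are placeholders rather than real catalog ids.
CAPS: dict[str, dict[str, Any]] = {
    "claude": {
        "model_selection": "alias",
        "models": {"anthropic": ["default", "sonnet", "opus", "haiku"]},
    },
    "pi_core": {
        "model_selection": "provider/id",
        "models": {"anthropic": ["model-a"], "openai": ["model-b"]},
    },
}


def picker_options(caps: dict[str, dict[str, Any]], harness: str) -> list[tuple[str, str]]:
    """Flatten the provider-keyed `models` map into (provider, model) options."""
    models = caps.get(harness, {}).get("models", {})
    return [(provider, model) for provider, ids in models.items() for model in ids]


def interpret_value(
    caps: dict[str, dict[str, Any]], harness: str, value: str
) -> tuple[str, str]:
    """Split a stored model value into (provider, model) per `model_selection`."""
    entry = caps[harness]
    if entry["model_selection"] == "provider/id":
        provider, sep, model = value.partition("/")
        if not sep or not model:
            raise ValueError(f"expected 'provider/id', got {value!r}")
        return provider, model
    # "alias": the value is the alias itself; the provider is the only key.
    providers = list(entry["models"])
    if len(providers) != 1 or value not in entry["models"][providers[0]]:
        raise ValueError(f"{value!r} is not a known alias for {harness!r}")
    return providers[0], value
```

Because each option is a (provider, model) pair, selecting one fills both fields of the structured value at once.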

The shape of the config itself lives in [Agent config
schema](agent-config-schema.md). This page covers what `/inspect` returns; that page covers
the fields.
54 changes: 54 additions & 0 deletions docs/design/agent-workflows/projects/agent-model-picker/README.md
@@ -0,0 +1,54 @@
# Agent playground: provider + model picker (harness-aware, inspect-driven)

Make the **agent** playground pick a **provider + model** the way the completion/chat playground
does, but **filtered to what the selected harness can actually reach**, with an explicit
**Agenta-auth vs self-managed** choice and a **connection picker** for the credential. The
per-harness reach (providers, models, connection modes) is published by the agent's `/inspect`
response and the frontend renders from it.

## The shape in one paragraph

`/inspect` already publishes `meta.harness_capabilities` (per harness: `providers`,
`deployments`, `connection_modes`, `model_selection`). This project **adds a per-provider model
list** to that surface (Pi: the vault-reachable providers' model ids; Claude: its alias list), and
**rewires the frontend to render from `/inspect`** instead of the current static hardcoded copy. The
model picker becomes one unified control (selecting a model sets both provider and model id),
filtered to the harness. The credential is chosen with a clear **Authentication** toggle — *Agenta*
(managed: pick the project-default or a named connection from the vault) or *Self-managed* (the
harness uses its own login; Agenta injects nothing). The connection rides in `model_ref.connection`
inside the config the playground already sends; no new request field and no new vault route.
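
A minimal sketch of the `model` value the form would write for each Authentication choice. The keys (`provider`, `model`, `connection`) and the modes (`agenta`, `self_managed`) follow this doc. The helper name `build_model_field` and the choice to send `{"mode": "self_managed"}` rather than omitting `connection` are assumptions.

```python
from __future__ import annotations

from typing import Any


def build_model_field(
    provider: str, model: str, auth: str, slug: str | None = None
) -> dict[str, Any]:
    """Build the structured `model` value from the picker and the Authentication toggle."""
    if auth == "self_managed":
        # The harness brings its own login; there is no connection to pick.
        if slug:
            raise ValueError("self-managed auth does not take a connection slug")
        connection: dict[str, str] = {"mode": "self_managed"}
    elif auth == "agenta":
        # No slug means the project default connection.
        connection = {"mode": "agenta"}
        if slug:
            connection["slug"] = slug
    else:
        raise ValueError(f"unknown auth mode {auth!r}")
    return {"provider": provider, "model": model, "connection": connection}
```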

## What already exists (do not rebuild)

The parent [../provider-model-auth/](../provider-model-auth/) project (PR #4815, **merged** to
`big-agents`) already shipped the backend and a *minimal* form:

- `ModelRef` (`provider` + `model` + `params` + `connection`) in the agent config, coerced from a
bare string for back-compat.
- A connection resolver that reads the existing `GET /secrets/` and injects **one** least-privilege
credential (replacing the whole-vault dump).
- `/inspect` `meta.harness_capabilities` (the per-harness `providers`/`deployments`/
`connection_modes`/`model_selection` table) + server-side fail-loud reject.
- A minimal connection form: a grouped model picker (unfiltered), a separate free-text/select
Provider field, a connection-mode select, a free-text slug, and a raw-JSON escape hatch.
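
A minimal sketch of the `ModelRef` shape and its back-compat coercion. The field names follow the list above; the rule that a bare `"provider/id"` string splits on the first `/` (and a string without `/` leaves `provider` unset) is an assumption, not the SDK's confirmed behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Connection:
    mode: str = "agenta"  # "agenta" (managed) or "self_managed"
    slug: str | None = None  # None means the project default


@dataclass
class ModelRef:
    provider: str | None
    model: str
    params: dict[str, Any] = field(default_factory=dict)
    connection: Connection | None = None


def coerce_model_ref(value: str | dict[str, Any] | ModelRef) -> ModelRef:
    """Accept a legacy bare string or a structured dict and return a ModelRef."""
    if isinstance(value, ModelRef):
        return value
    if isinstance(value, str):
        # Assumed legacy rule: "provider/id" splits on the first "/".
        provider, sep, model = value.partition("/")
        return ModelRef(provider=provider, model=model) if sep else ModelRef(provider=None, model=value)
    conn = value.get("connection")
    return ModelRef(
        provider=value.get("provider"),
        model=value["model"],
        params=dict(value.get("params") or {}),
        connection=Connection(**conn) if conn else None,
    )
```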

This project is the **playground UX + the inspect model-list addition** that the parent explicitly
deferred.

## Read in this order

1. [context.md](context.md): why this exists, the exact merged state with `file:line`, goals,
non-goals, the three decisions taken.
2. [research.md](research.md): the precise findings (inspect, the capability table, the model
picker, the connection form, vault listing, the completion/chat pattern), with citations.
3. [plan.md](plan.md): the phased slices, backend through frontend, with the test strategy.
4. [status.md](status.md): current state, decisions, open items, risks. Source of truth.

## Related work

- [../provider-model-auth/](../provider-model-auth/): the backend `ModelRef`/connection resolver
and the minimal form this project builds on.
- [../model-config/](../model-config/): how a requested model becomes settable on each harness (the
Pi `auth.json`/`models.json` write, the custom-endpoint consumption). Out of scope here.
- [../harness-capabilities/](../harness-capabilities/): owns the general capability-table mechanism;
this project extends the `providers`/`models` entries it consumes.
106 changes: 106 additions & 0 deletions docs/design/agent-workflows/projects/agent-model-picker/context.md
@@ -0,0 +1,106 @@
# Context

## Why this exists

The agent (harness) playground should let a user pick a **provider + model** like the completion and
chat playgrounds do, but the agent case has an extra constraint the prompt case does not: **each
harness can only reach some providers and models**. Claude Code reaches Anthropic only and selects by
alias; Pi reaches eight vault-mapped providers and selects by `provider/id`. The picker must filter
to the selected harness, and that per-harness reach must come from the agent itself (`/inspect`), not
a list hardcoded in the frontend. The user also needs to choose **whether Agenta supplies the
credential** (managed) or the **harness brings its own login** (self-managed), and when managed, to
pick *which* stored connection.

## Current state (merged on `big-agents`, PR #4815)

The backend and a minimal form already landed. The remaining work is UX + one inspect addition.

### Model selection
- The agent config model field renders through the **same grouped picker** as completion/chat:
`AgentConfigControl` -> `GroupedChoiceControl` -> `SelectLLMProviderBase`
(`web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx:342-349`).
- The picker's choices come from the **whole** shared LiteLLM catalog, **unfiltered by harness**:
`_model_catalog_type()` deep-copies `supported_llm_models` as grouped `choices`
(`sdks/python/agenta/sdk/utils/types.py:1046-1055`), registered as `CATALOG_TYPES["model"]`
(`types.py:1321`). The agent field declares `x-parameter: "grouped_choice"`
(`types.py:1088-1093`). Catalog source: `sdks/python/agenta/sdk/utils/assets.py:6-193`
(`supported_llm_models`, provider -> prefixed ids like `anthropic/claude-opus-4-7`).
- A **second, redundant** free-text/select "Provider" field sits in the connection section
(`AgentConfigControl.tsx:355-380`), disjoint from the model picker.
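
To make "unfiltered" concrete, a minimal sketch of a grouped-choice type built from a stand-in catalog. The `x-parameter: "grouped_choice"` marker is from the SDK; the `"type": "string"` key, the placeholder `openai/example-model` id, and the `model_catalog_type` name are assumptions. Nothing in the signature takes a harness, so every harness sees every provider.

```python
import copy
from typing import Any, Dict, List

# Tiny stand-in for `supported_llm_models`: provider -> prefixed model ids.
SUPPORTED_LLM_MODELS: Dict[str, List[str]] = {
    "anthropic": ["anthropic/claude-opus-4-7"],
    "openai": ["openai/example-model"],
}


def model_catalog_type(catalog: Dict[str, List[str]]) -> Dict[str, Any]:
    """Grouped-choice schema fragment over the whole catalog (no harness input)."""
    return {
        "type": "string",
        "x-parameter": "grouped_choice",
        # Deep copy so edits to the schema never mutate the shared catalog.
        "choices": copy.deepcopy(catalog),
    }
```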

### Per-harness reach is already in `/inspect`
- `/inspect` publishes `meta.harness_capabilities` via `harness_capabilities_document()`
(`services/oss/src/agent/app.py:294-300`). The table is
`sdks/python/agenta/sdk/agents/capabilities.py` with, per harness:
`providers`, `deployments`, `connection_modes`, `model_selection`
(`capabilities.py:57-95`). It has **no `models`** field.
- Harness types: `pi_core` and `pi_agenta` (both reach the 8 `PI_VAULT_PROVIDERS`;
`model_selection: "provider/id"`) and `claude` (anthropic only; `model_selection: "alias"`)
(`capabilities.py:41-50,76-95`; `HarnessType` `dtos.py:42-58`).
- The agent service uses the same table for a server-side fail-loud reject
(`app.py:84-117`).
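
A minimal sketch of a fail-loud check against the same table, using a hypothetical `check_reachable` helper and exception type rather than the service's actual code in `app.py`. The capability data is illustrative only.

```python
from typing import Any, Dict

# Illustrative excerpt of the capability table.
CAPS: Dict[str, Dict[str, Any]] = {
    "claude": {"providers": ["anthropic"], "deployments": ["direct"]},
}


class UnreachableChoice(ValueError):
    """The requested provider or deployment is outside the harness's reach."""


def check_reachable(
    caps: Dict[str, Dict[str, Any]], harness: str, provider: str, deployment: str = "direct"
) -> None:
    """Raise instead of silently falling back when the choice is unreachable."""
    entry = caps.get(harness)
    if entry is None:
        raise UnreachableChoice(f"unknown harness {harness!r}")
    if provider not in entry["providers"]:
        raise UnreachableChoice(
            f"{harness!r} cannot reach provider {provider!r}; allowed: {entry['providers']}"
        )
    if deployment not in entry["deployments"]:
        raise UnreachableChoice(
            f"{harness!r} does not support deployment {deployment!r}; allowed: {entry['deployments']}"
        )
```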

### The frontend ignores inspect and uses a static copy
- `connectionUtils.ts` holds a **hardcoded** `HARNESS_CONNECTION_CAPABILITIES`
(`web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts:146-202`) with a
`TODO(harness-capabilities)` to consume `/inspect` `meta.harness_capabilities`. The form filters
the **Provider** select with `allowedProviders(harness)` but never filters the **model** picker.

### Credential (connection)
- The connection rides in the config as `model_ref.connection = {mode, slug}`; the backend coerces a
structured `model` into `ModelRef` (`sdks/python/agenta/sdk/agents/dtos.py:806-831`,
`_parse_agent_fields` `:929-960`). So "the frontend sends the connection" already works.
- Modes: `agenta` (managed) and `self_managed` (`connections/models.py` `Connection.mode`); the
project default is `agenta` with no slug. The form has a mode `Select` and a **free-text** slug
field with a `TODO(provider-model-auth)` to become a picker once a connections endpoint exists
(`AgentConfigControl.tsx:382-421`).
- Resolution reads the existing `GET /secrets/` (no new route): `VaultConnectionResolver`
(`sdks/python/agenta/sdk/agents/platform/connections.py:380-431`). The FE can list connections from
the existing `vaultSecretsQueryAtom` (`web/packages/agenta-entities/src/secret/state/atoms.ts:78-100`).
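
A minimal sketch of least-privilege resolution over an already-fetched `GET /secrets/` list. The secret shape (`slug`, `provider`, `key`), the env-var mapping, and the rule that a slug-less `agenta` connection must match exactly one secret for the provider are all assumptions; `VaultConnectionResolver` may behave differently.

```python
from __future__ import annotations

from typing import Any

# Hypothetical provider -> env var mapping for the injected key.
ENV_VAR = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_credential(
    secrets: list[dict[str, Any]], provider: str, connection: dict[str, Any] | None
) -> dict[str, str]:
    """Return env vars for exactly one credential, or none for self-managed."""
    connection = connection or {"mode": "agenta"}  # missing = project default
    mode = connection.get("mode", "agenta")
    if mode == "self_managed":
        return {}  # the harness uses its own login; inject nothing
    if mode != "agenta":
        raise ValueError(f"unknown connection mode {mode!r}")
    candidates = [s for s in secrets if s.get("provider") == provider]
    slug = connection.get("slug")
    if slug is not None:
        candidates = [s for s in candidates if s.get("slug") == slug]
    if len(candidates) != 1:
        raise LookupError(
            f"expected one {provider!r} secret (slug={slug!r}), found {len(candidates)}"
        )
    return {ENV_VAR[provider]: candidates[0]["key"]}
```

Only the matched secret leaves the function, which is the contrast with dumping the whole vault into the sandbox.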

## Goals

1. Publish a **per-provider model list per harness** in `/inspect` `meta.harness_capabilities`
(Pi: vault-reachable providers' ids; Claude: its aliases). The frontend renders from it.
2. Make the frontend **consume `/inspect`** for the capability map (providers, models, modes),
replacing the static hardcoded copy.
3. **Filter the model picker to the selected harness** and unify it: selecting a model sets both
`provider` and `model`; drop the redundant standalone Provider field.
4. Present a clear **Authentication** choice — *Agenta* (managed) vs *Self-managed* — and, for
Agenta, a **connection picker** (project default or a named connection) fed by the existing vault
list, filtered to the chosen provider.
5. Keep the wire contract and the resolver unchanged; the connection still rides `model_ref.connection`.

## Decisions taken (2026-06-24, with the user)

1. **The per-provider model list is published in `/inspect`.** The backend builds and publishes the
exact per-harness, per-provider model list in `meta.harness_capabilities`; the frontend renders
straight from inspect. (Chosen over filtering the shared catalog client-side. Trade-off: the
model list is duplicated into the capability surface and must be kept fresh; mitigated by sourcing
it from the same `supported_llm_models` catalog on the backend.)
2. **"Not Agenta authentication" means self-managed login only.** The harness uses its own credential
in the sandbox (env var or prior OAuth login); Agenta injects nothing. No per-run pasted-key
channel is added (matches the connection design and completion/chat).
3. **Claude is presented as an alias dropdown** (default, sonnet, opus, haiku, and `[1m]` variants),
matching `model_selection: "alias"`. The alias list is added to `/inspect`.
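
A minimal sketch of decision 1's backend side: deriving each harness's `models` map from the shared catalog so the two cannot drift. The stand-in catalog, the two-entry `PI_VAULT_PROVIDERS` (the real set has eight), the alias subset (no `[1m]` variants), and the `models_for` name are placeholders.

```python
from __future__ import annotations

# Stand-in for `supported_llm_models` (provider -> prefixed ids).
SUPPORTED_LLM_MODELS: dict[str, list[str]] = {
    "anthropic": ["anthropic/claude-opus-4-7"],
    "openai": ["openai/example-model"],
    "mistral": ["mistral/example-model"],
}
PI_VAULT_PROVIDERS = ("anthropic", "openai")  # placeholder subset
CLAUDE_ALIASES = ("default", "sonnet", "opus", "haiku")  # subset of the real list


def models_for(harness: str) -> dict[str, list[str]]:
    """Build the provider-keyed `models` map published for one harness."""
    if harness in ("pi_core", "pi_agenta"):
        # Pi: each vault-reachable provider's ids, copied from the catalog.
        return {p: list(SUPPORTED_LLM_MODELS.get(p, [])) for p in PI_VAULT_PROVIDERS}
    if harness == "claude":
        # Claude selects by alias, all under the single `anthropic` key.
        return {"anthropic": list(CLAUDE_ALIASES)}
    raise KeyError(f"unknown harness {harness!r}")
```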

## Non-goals (v1)

- A new vault storage model, write path, CRUD, or a new connections route. v1 reads `GET /secrets/`.
- A per-run pasted API key / inline-secret channel.
- Where a deployed agent's durable per-environment default connection lives (a parent open
decision; the config-stored path is unaffected).
- Migrating the completion/prompt path onto the agent resolver. Completions keep their reader.
- Pi consuming custom endpoints / cloud deployments (Azure/Bedrock/Vertex), which is owned by
[../model-config/](../model-config/); v1 stays `direct`/fail-loud.

## Constraints inherited from the codebase

- `/inspect` `meta.harness_capabilities` must stay a plain JSON-able dict (no model import on the
consumer side); see `harness_capabilities_document()` (`capabilities.py:98-108`).
- The SDK owns the capability table; the agent service imports it; the SDK must not import the
service (`../../documentation/ports-and-adapters.md`).
- Frontend API calls go through the Fern client + a zod boundary (`web/CLAUDE.md`). The capability
map arrives inside the inspect/workflow schema response, not a new endpoint.
- Any wire change updates Python (`utils/wire.py`) and TypeScript (`services/agent/src/protocol.ts`)
with the golden tests in one PR. This project's inspect-meta change is **not** a `/run` wire change.
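
A minimal sketch of one way to keep the published block plain JSON while the SDK holds a typed table: render dataclasses with `asdict`, then round-trip through `json`. The `HarnessCapabilities` dataclass and `capabilities_document` function are hypothetical stand-ins for the table in `capabilities.py`.

```python
from __future__ import annotations

import json
from collections.abc import Mapping
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class HarnessCapabilities:
    providers: tuple[str, ...]
    deployments: tuple[str, ...]
    connection_modes: tuple[str, ...]
    model_selection: str
    models: Mapping[str, tuple[str, ...]]


TABLE: dict[str, HarnessCapabilities] = {
    "claude": HarnessCapabilities(
        providers=("anthropic",),
        deployments=("direct",),
        connection_modes=("agenta", "self_managed"),
        model_selection="alias",
        models={"anthropic": ("default", "sonnet", "opus", "haiku")},
    ),
}


def capabilities_document(table: Mapping[str, HarnessCapabilities]) -> dict[str, Any]:
    """Render the typed table as plain JSON data (dicts, lists, strings only)."""
    raw = {name: asdict(caps) for name, caps in table.items()}
    # A JSON round trip fails loudly on non-serializable values and turns
    # tuples into lists, so consumers never need the dataclass.
    return json.loads(json.dumps(raw))
```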