feat(frontend): harness-aware agent provider + model picker (inspect models) by mmabrouk · Pull Request #4839 · Agenta-AI/agenta

mmabrouk · 2026-06-24T22:02:39Z

What & why

The agent playground's model picker showed the whole LiteLLM catalog, unfiltered by harness, with a separate redundant Provider field and a free-text connection slug. So an author could pick a model Claude can't reach, or an OpenAI provider for the Claude harness, and only find out when the run failed. The credential UX was a bare mode dropdown plus a free-text box.

This makes the agent picker harness-aware and inspect-driven: the agent's /inspect now publishes the exact models each harness can reach, and the playground renders one unified provider + model picker from it, with a clear authentication choice and a connection picker fed by the vault.

Before → after

Model picker: full catalog, unfiltered → harness-filtered (only the models this harness can reach), built straight from /inspect. Selecting a model now sets both the provider and the model id; the redundant standalone Provider field is gone.
Capability source: a static hardcoded FE copy of the harness table (with a TODO to consume inspect) → the inspect-published map, threaded through a new atom. Permissive fallback to the full catalog when an agent doesn't publish models (older agents / standalone).
Credential UX: a bare "Connection mode" select + a free-text slug → an Authentication toggle (Agenta-managed vs Self-managed) plus a connection picker (Project default / named connections, read from GET /secrets/). Raw-JSON escape hatch kept.
config.model: could be a free-text string → always a structured ModelRef ({provider, model, connection?}). The picker is the only way to set it. (Folds in [docs] Add agent workflow interface inventory #4821 comment 3469645457.)

How

Phase A — SDK. HarnessConnectionCapabilities gains models: Dict[str, List[str]]; HARNESS_CONNECTION_CAPABILITIES populates it. Pi maps each PI_VAULT_PROVIDERS entry to its supported_llm_models catalog ids (_pi_models() skips a provider missing from the catalog); Claude maps anthropic to a new CLAUDE_MODEL_ALIASES constant (default/sonnet/opus/haiku + [1m] variants). harness_capabilities_document() emits it, so /inspect meta.harness_capabilities carries per-harness models. Not a /run wire change.
Phase B — FE plumbing. New harnessCapabilitiesAtomFamily(revisionId) derives meta.harness_capabilities from the existing workflowInspectAtomFamily. connectionUtils.ts retired the static FE capability map; its helpers now take the inspect-fed HarnessCapabilitiesMap and stay permissive when it's absent.
Phase C — picker. buildModelOptionGroups builds the grouped options from the harness's published models; SelectLLMProviderBase renders them. providerForModel derives the provider from the picked model's group; harnessAllowsModel clears an unreachable model on harness switch.
Phase D — auth. An Authentication toggle maps to connection.mode; namedConnectionOptions lists vault custom-provider connections (the slug the resolver matches on) filtered to the chosen provider + harness. Self-managed sends {mode: "self_managed"} and shows a note.
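
The Phase A shape can be sketched roughly as follows. This is a minimal illustration, not the SDK code: it uses stdlib dataclasses in place of the Pydantic model the description implies, the catalog entries and the missing `groq` provider are toy data, and the `claude` harness key and the alias spellings are assumptions. Only the `models` field and the skip-if-missing behavior of `_pi_models()` are taken from the description above.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Toy catalog (assumption): the real supported_llm_models is far larger.
supported_llm_models: Dict[str, List[str]] = {
    "openai": ["gpt-5.5"],                      # bare id, as the PR notes
    "anthropic": ["anthropic/claude-example"],  # hypothetical prefixed id
}
# "groq" is deliberately absent from the toy catalog to exercise the skip.
PI_VAULT_PROVIDERS = ["openai", "anthropic", "groq"]

# Illustrative subset; the exact alias spellings are an assumption.
CLAUDE_MODEL_ALIASES = ["default", "sonnet", "opus", "haiku", "sonnet[1m]"]


@dataclass
class HarnessConnectionCapabilities:
    providers: List[str]
    connection_modes: List[str]
    model_selection: str
    # New in this PR: provider -> list of selectable model ids/aliases.
    models: Dict[str, List[str]] = field(default_factory=dict)


def _pi_models() -> Dict[str, List[str]]:
    """Map each vault provider to its catalog ids, skipping providers
    that are missing from the catalog."""
    return {
        provider: list(supported_llm_models[provider])
        for provider in PI_VAULT_PROVIDERS
        if provider in supported_llm_models
    }


HARNESS_CONNECTION_CAPABILITIES = {
    "pi_core": HarnessConnectionCapabilities(
        providers=list(PI_VAULT_PROVIDERS),
        connection_modes=["agenta", "self_managed"],
        model_selection="provider/id",
        models=_pi_models(),
    ),
    "claude": HarnessConnectionCapabilities(  # harness key is an assumption
        providers=["anthropic"],
        connection_modes=["agenta", "self_managed"],
        model_selection="alias",
        models={"anthropic": list(CLAUDE_MODEL_ALIASES)},
    ),
}


def harness_capabilities_document() -> Dict[str, dict]:
    """Plain-dict form of the table, suitable for /inspect meta."""
    return {name: asdict(caps) for name, caps in HARNESS_CONNECTION_CAPABILITIES.items()}
```

The frontend side would then read `meta.harness_capabilities[harness].models` and fall back to the full catalog when the key is absent.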

Tests

SDK capability contract test extended (every harness has a models map; Pi ⊆ supported_llm_models; Claude = the alias set; the document round-trips as plain dicts). Green (11 tests in the file; 50 in the connections suite).
connectionUtils unit tests rewritten for the inspect-fed helpers, the always-ModelRef compose, and the picker/connection-option helpers. Green (24 tests; 126 across entity-ui).
@agenta/entities, @agenta/entity-ui, @agenta/playground typecheck green. ruff + prettier + eslint clean.

Note for reviewers

supported_llm_models["openai"] lists bare ids (gpt-5.5, …) while other providers are prefixed (anthropic/claude-...). The picker uses the catalog id verbatim as the option value and takes the provider from the group key, so the mix is handled; the contract test deliberately does not assert provider-prefixing.
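
Why the mixed id styles are harmless can be shown with a small sketch. The functions below are hypothetical Python analogs of the TypeScript helpers named in this PR (buildModelOptionGroups, providerForModel, harnessAllowsModel), written only to illustrate the idea; the model ids are placeholders. The provider always comes from the group key, so the id is never parsed for a prefix.

```python
from typing import Dict, List, Optional, Tuple

# One (provider, [model ids]) pair per picker group.
OptionGroups = List[Tuple[str, List[str]]]


def build_model_option_groups(models: Dict[str, List[str]]) -> OptionGroups:
    """Turn the published provider -> ids map into picker groups.
    Ids are kept verbatim as option values."""
    return [(provider, list(ids)) for provider, ids in models.items()]


def provider_for_model(groups: OptionGroups, model_id: str) -> Optional[str]:
    """Derive the provider from the group that contains the picked id."""
    for provider, ids in groups:
        if model_id in ids:
            return provider
    return None


def harness_allows_model(models: Dict[str, List[str]], model_id: str) -> bool:
    """True when any provider of the harness publishes this id."""
    return any(model_id in ids for ids in models.values())


pi_models = {
    "openai": ["gpt-5.5"],                      # bare id
    "anthropic": ["anthropic/claude-example"],  # prefixed id (placeholder)
}
groups = build_model_option_groups(pi_models)
```

With this shape, `provider_for_model(groups, "gpt-5.5")` resolves to `openai` even though the id carries no prefix, and a harness switch can drop the current model whenever `harness_allows_model` is false for the new harness.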

Docs synced in this PR: the /inspect interface inventory entry (now lists models) and documentation/agent-configuration.md (picker + auth UX).

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

coderabbitai · 2026-06-24T22:02:48Z

Walkthrough

The PR adds per-harness model lists to /inspect capability metadata, propagates registered inspect meta through the agent runtime, exposes it to workflow state, and updates the agent configuration UI and helper logic for model, authentication, and connection selection.

Changes

Agent model picker and inspect metadata

Layer / File(s)	Summary
Harness capability contract `docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md`, `sdks/python/agenta/sdk/agents/capabilities.py`, `sdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.py`	The `/inspect` capability contract adds provider-keyed `models` data, and the SDK/test coverage publishes and validates that model catalog.
Inspect meta registration `sdks/python/agenta/sdk/engines/running/utils.py`, `sdks/python/agenta/sdk/decorators/running.py`, `services/oss/src/agent/app.py`, `services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py`	Workflow meta is stored by URI, merged into `workflow.inspect()`, registered for the builtin agent URI, and checked by builtin-agent inspect tests.
Workflow inspect state `web/packages/agenta-entities/src/workflow/state/inspectMeta.ts`, `web/packages/agenta-entities/src/workflow/state/store.ts`, `web/packages/agenta-entities/src/workflow/state/index.ts`, `web/packages/agenta-entities/src/workflow/index.ts`	Workflow state adds inspect-meta types and atoms, keeps agent workflows on the inspect path, and re-exports the new capability accessors.
Model and connection helpers `web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts`, `web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts`	ModelRef helpers now compose structured values, derive allowed providers, model groups, selection mode, and named connections from harness capabilities, with unit tests covering those paths.
AgentConfigControl wiring `web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx`, `docs/design/agent-workflows/documentation/agent-configuration.md`	AgentConfigControl reads inspect capabilities and vault secrets to render the unified model picker, falls back to the schema picker when inspect models are absent, and replaces the editor with authentication and connection controls; the field docs are updated to match.
Project design docs `docs/design/agent-workflows/projects/agent-model-picker/...`	The project README, context, research, plan, and status pages describe the model-picker scope, current state, implementation phases, and status.

Sequence Diagram(s)

sequenceDiagram
  participant create_agent_app
  participant register_meta
  participant workflow_inspect as "workflow.inspect()"
  participant retrieve_meta
  participant WorkflowInvokeRequest

  create_agent_app->>register_meta: store {"harness_capabilities": ...} for AGENT_URI
  workflow_inspect->>retrieve_meta: load registered meta for self.uri
  retrieve_meta-->>workflow_inspect: return inspect meta from META_REGISTRY
  workflow_inspect->>WorkflowInvokeRequest: set meta = inspect_meta

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 39.39% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main frontend change: a harness-aware provider and model picker driven by inspect data.
Description check	✅ Passed	The description accurately describes the harness-aware picker, inspect-driven capabilities, and auth/connection changes in the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

mmabrouk · 2026-06-24T22:03:08Z

Changes made (scope map)

Phase A — SDK backend (sdks/python/agenta/sdk/agents/capabilities.py)

Added models: Dict[str, List[str]] to HarnessConnectionCapabilities.
Added CLAUDE_MODEL_ALIASES (default/sonnet/opus/haiku + [1m]) and _pi_models() (defensive over supported_llm_models).
Populated HARNESS_CONNECTION_CAPABILITIES[*].models; harness_capabilities_document() emits it via model_dump(). No /run wire change.
Extended oss/tests/pytest/unit/agents/connections/test_capabilities.py.

Phase B — FE plumbing (web/packages/agenta-entities)

New workflow/state/inspectMeta.ts: harnessCapabilitiesAtomFamily(revisionId) derives meta.harness_capabilities from workflowInspectAtomFamily. Exported via workflow/state/index.ts and workflow/index.ts.

Phases C/D + ModelRef fold-in (web/packages/agenta-entity-ui)

connectionUtils.ts: dropped the static FE capability map; helpers now take the inspect-fed HarnessCapabilitiesMap. Added buildModelOptionGroups, providerForModel, harnessAllowsModel, modelSelectionMode, namedConnectionOptions. composeModelValue now ALWAYS returns a ModelRef (bare-string path dropped); reads still tolerate a legacy bare string.
AgentConfigControl.tsx: harness-filtered unified picker (sets provider + model), standalone Provider field removed, harness-switch clears unreachable model, Authentication toggle + vault-fed connection picker. Reads the open revision from useOptionalDrillIn().entityId.
Rewrote tests/unit/connectionUtils.test.ts.
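
The "always a ModelRef on write, tolerant on read" rule can be sketched as below. These are hypothetical Python analogs, not the TypeScript in connectionUtils.ts; the field names follow the {provider, model, connection?} shape from the PR description, and returning `provider: None` for a legacy bare string is an assumption about how such a value would be surfaced.

```python
from typing import Any, Dict, Optional

ModelRef = Dict[str, Any]


def compose_model_value(
    provider: str, model: str, connection: Optional[Dict[str, Any]] = None
) -> ModelRef:
    """Always build a structured ModelRef; there is no bare-string path."""
    ref: ModelRef = {"provider": provider, "model": model}
    if connection is not None:
        ref["connection"] = dict(connection)
    return ref


def read_model_value(value: Any) -> Optional[ModelRef]:
    """Accept a ModelRef, or a legacy bare string with unknown provider."""
    if isinstance(value, str):
        return {"provider": None, "model": value}
    if isinstance(value, dict) and "model" in value:
        return dict(value)
    return None
```

For example, `compose_model_value("openai", "gpt-5.5", {"mode": "self_managed"})` yields a ModelRef whose connection carries the self-managed mode, while `read_model_value("gpt-5.5")` still parses a value saved before this change.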

Phase E — docs

interfaces/public-edge/workflow-inspect.md (meta now lists models), documentation/agent-configuration.md (picker + auth UX), projects/agent-model-picker/status.md.

Tests: SDK connections 50 green; entity-ui 126 green (connectionUtils 24); entities/entity-ui/playground typecheck green; ruff + prettier + eslint clean.

Did NOT touch (other lanes / just landed): services/agent/**, workflow/api/api.ts, workflow/state/store.ts, wire*.py, models/workflows.py, utils/types.py CATALOG_TYPES — built on the wire-schema (#4830) and contract-versioning (#4829) work. Base is set to #4830 so the diff is isolated.

Note: I hit the stale "but mark" staging hijack documented on the coordination board (my files were auto-staged to feat/agent-capability-fail-loud) and recovered cleanly by committing, then running "but move" to this lane. Both lanes are verified clean (this PR = my 15 files only; the capability lane keeps its 6 runner files).

mmabrouk · 2026-06-24T22:03:14Z

@coderabbitai review

coderabbitai · 2026-06-24T22:03:22Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

mmabrouk · 2026-06-24T22:05:25Z

@coderabbitai review

coderabbitai · 2026-06-24T22:05:31Z

✅ Action performed

Review finished.

mmabrouk · 2026-06-25T10:43:54Z

Expected behavior for the agent model picker — please confirm / give feedback

Capturing the target UX for the agent provider + model picker so we agree on the spec. This is what I am building toward; tell me where it is wrong.

1. Pick the auth mode first: Agenta-managed vs Self-managed. This toggle is the top-level choice.
2. Then a list of providers, and per provider its models. Same shape as the existing chat/completion playground model picker (grouped provider → models), just with the auth toggle on top.
3. Self-managed = general (all providers/models available, bring-your-own key). Agenta-managed = scoped to the providers/secrets the user actually has in the vault.
4. Ability to add a provider/secret inline, mirroring the chat/completion playground's "Add provider" affordance.
5. The harness constrains which providers/models are reachable (the per-harness capability table). Claude → Anthropic + Claude's models; Pi → its providers and their models.
6. "Project Default" should NOT be a user-facing connection value. The user selects provider/model (and a connection/secret) explicitly.

What I have fixed so far (the empty list)

The picker was rendering an empty list for every harness. Root cause was two breaks that starved the FE of the per-harness capability data:

Backend: the agent /inspect response carried no meta.harness_capabilities at all. The routed agent workflow sets meta=harness_capabilities, but the playground posts /inspect with a revision, which takes the request-driven inspect_workflow path — it builds a fresh workflow from the request (no meta) and never reads the routed instance's meta. WorkflowRevisionData has no meta field, so the interface registry could not carry it either. Fixed with a META_REGISTRY (mirrors register_interface/retrieve_interface); the agent registers its meta under the builtin URI and workflow.inspect() merges it (request/decorator meta still wins per key).
Frontend: the inspect atom skipped the fetch when the revision already carried all schemas inline (a redundancy optimization). The agent stores its schemas inline, so inspect never ran and the capabilities never reached the FE. Now the FE fetches inspect for the agent regardless (detected by the builtin agent URI).

Verified live on the dev playground: Pi now shows all 8 providers with their models (OpenAI 37, Anthropic 12, Gemini 16, Mistral 14, Groq 8, MiniMax 5, Together 15, OpenRouter 23); switching to Claude scopes the picker to Anthropic only (the 8 Claude aliases) and clears an unreachable model. Items 1, 2, 5 are working.

Open design question (not guessing — want your call)

Item 6 (kill "Project Default") + item 4 (inline add-provider) are a connection-picker UX change on top of the now-working provider/model picker. Today the connection control still offers a "Project default" option plus named vault connections, with the Agenta-managed/Self-managed toggle already present. Reworking it to fully mirror the chat playground's provider+secret selector (and dropping "Project default" as a user-facing value) is a focused follow-up I did not want to guess on. Confirm the exact desired connection UX and I will build it: should "Project default" be removed entirely (forcing an explicit connection pick), or kept as an implicit fallback that is just not shown as a selectable value?

mmabrouk · 2026-06-25T10:43:55Z

Changes in this round — fixed the empty model picker

Verdict: the picker was COMPLETE-but-starved, not incomplete. The harness-filtered picker, the auth toggle, and the connection control were all built (this PR's original commit). The list was empty because the per-harness capability data never reached the FE. Two breaks, both fixed here.

Root cause: empty list

Backend — /inspect emitted no meta on the playground's request path. create_agent_app() sets meta = {harness_capabilities: ...} on ag.workflow(...), but the playground posts /inspect with a revision, which routes to the standalone inspect_workflow(request). That builds a fresh workflow from the request (whose meta is empty) and never reads the routed instance's meta. The interface registry stores a WorkflowRevisionData, which has no meta field, so it could not carry it either. Result: the live /inspect top-level keys were version, revision, configuration — no meta.
Frontend — the inspect atom skipped the fetch for the agent. workflowInspectAtomFamily skips inspect when the revision already carries inputs+outputs+parameters inline (a redundancy optimization). The agent stores all three inline, so inspect never ran, harnessCapabilitiesAtomFamily stayed null, and the picker had nothing → the standard "No data" / "Add provider" empty state.

Fix

SDK (sdks/python/agenta/sdk/)

engines/running/utils.py: new META_REGISTRY + register_meta(meta, uri) / retrieve_meta(uri), mirroring register_interface/retrieve_interface and reusing _get_with_latest. The stored dict is copied so a caller mutation cannot leak into the registry.
decorators/running.py: workflow.inspect() now merges the URI's registered meta into the request meta, with the request/decorator meta winning per key ({**registered, **(self.meta or {})}). Retrieved at the inspect emission boundary, not in __init__, so it does NOT leak into the invoke path's trace metadata.
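
The registry and merge semantics described above can be sketched as follows. This is a simplified stand-in, not the SDK code: the `Workflow` class, the `AGENT_URI` value, and the use of `deepcopy` are assumptions for illustration, and the real `retrieve_meta` reportedly resolves versions through `_get_with_latest`, which is omitted here.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

META_REGISTRY: Dict[str, Dict[str, Any]] = {}
AGENT_URI = "example:builtin:agent"  # hypothetical URI, not the real one


def register_meta(meta: Dict[str, Any], uri: str) -> None:
    # Store a copy so a later caller mutation cannot leak into the registry.
    META_REGISTRY[uri] = deepcopy(meta)


def retrieve_meta(uri: str) -> Optional[Dict[str, Any]]:
    stored = META_REGISTRY.get(uri)
    return deepcopy(stored) if stored is not None else None


class Workflow:
    """Minimal stand-in for the decorated workflow object."""

    def __init__(self, uri: str, meta: Optional[Dict[str, Any]] = None):
        self.uri = uri
        self.meta = meta  # registered meta is NOT merged here

    def inspect(self) -> Dict[str, Any]:
        # Merge at the inspect boundary only; request/decorator meta wins per key.
        registered = retrieve_meta(self.uri) or {}
        return {"meta": {**registered, **(self.meta or {})}}
```

Because the merge happens inside `inspect()` rather than `__init__`, `self.meta` stays exactly what the request or decorator supplied, which is what keeps the registered capabilities out of the invoke path.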

Service (services/oss/src/agent/app.py)

create_agent_app() calls register_meta(meta, uri=AGENT_URI); the stale comment claiming register_interface carries the meta is corrected.

Frontend (web/packages/agenta-entities/src/workflow/state/store.ts)

The inspect atom now fetches for the agent even when schemas are inline, detected by the builtin agent URI (is_agent is not reliably stored on the revision, so the URI is the robust signal).

Test (services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py)

New regression test pins the exact broken path: a request-driven inspect of the agent URI normalizes to a response carrying per-harness models (Pi per-provider lists, Claude alias set).

Verified live (dev playground, plain HTTP)

Pi harness → 8 providers each with models (OpenAI 37, Anthropic 12, Gemini 16, Mistral 14, Groq 8, MiniMax 5, Together 15, OpenRouter 23).
Claude harness → Anthropic only (8 Claude aliases); switching harness clears an unreachable model.
Chat streams end to end (see the related sidecar-bind fix on feat(agent): enforce sidecar trust + disable unenforceable sandbox boundaries #4831).

Tests

SDK: 349 unit pass; services agent suite: 47 pass (incl. the new regression test). ruff format + check clean.
FE: @agenta/entities typecheck + 39 unit-test files pass; eslint clean.

Left as an open question (on the expected-behavior comment)

The connection-picker rework (drop user-facing "Project default", inline add-provider to mirror the chat playground) is a focused UX follow-up I did not guess on — see the expected-behavior comment for the exact question.

Codex (xhigh) reviewed the seam choice and the merge semantics; its refinements (retrieve in inspect() not __init__; per-key merge; the regression test) are folded in.

mmabrouk · 2026-06-25T10:44:14Z

@coderabbitai review

coderabbitai · 2026-06-25T10:44:21Z

✅ Action performed

Review finished.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

web/packages/agenta-entities/src/workflow/state/inspectMeta.ts (1)
20-31: 🩺 Stability & Availability | 🔵 Trivial

Align HarnessCapabilities interface with the permissive runtime contract documented.

The connectionUtils.ts consumers defensively handle missing data using optional chaining and explicit checks, preventing runtime errors. However, the HarnessCapabilities interface in inspectMeta.ts incorrectly marks providers, connection_modes, model_selection, and models as required fields, despite documentation confirming older agents may omit them entirely. This type definition contradicts the runtime reality and hides the permissive nature of the data.

Update the interface to mark these fields as optional. This aligns the type system with the actual data shape and prevents reliance on implicit runtime guards.
Diff
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts
 export interface HarnessCapabilities {
     /** Provider families the harness can reach (a literal list; never `"*"`). */
-    providers: string[]
+    providers?: string[]
     /** Deployment surfaces it can consume (`["direct"]` for Pi today). */
     deployments?: string[]
     /** Supported connection modes (`["agenta", "self_managed"]`). */
-    connection_modes: string[]
+    connection_modes?: string[]
     /** How a model is named: `"provider/id"` (Pi) or `"alias"` (Claude). */
-    model_selection: string
+    model_selection?: string
     /** Selectable models per provider family (provider -> list of ids/aliases). */
-    models: Record<string, string[]>
+    models?: Record<string, string[]>
 }
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts (2)
152-187: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Avoid returning the shared ALL_MODES reference from the permissive branch.

allowedConnectionModes returns the module-level ALL_MODES array directly in the permissive path. If any caller mutates the returned array, it silently corrupts the constant for every subsequent call. A defensive copy keeps the helper pure.
♻️ Proposed change
-    if (!entry?.connection_modes?.length) return ALL_MODES
+    if (!entry?.connection_modes?.length) return [...ALL_MODES]
228-256: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

titleizeProvider produces awkward labels for multi-word providers.

together_ai renders as "Together ai" (only the first character is uppercased after the underscore replace). Tests only cover single-word providers, so this slips through. Consider title-casing each word.
♻️ Proposed change
 function titleizeProvider(provider: string): string {
-    return provider.charAt(0).toUpperCase() + provider.slice(1).replace(/_/g, " ")
+    return provider
+        .split("_")
+        .map((w) => w.charAt(0).toUpperCase() + w.slice(1))
+        .join(" ")
 }
docs/design/agent-workflows/projects/agent-model-picker/research.md (1)
45-45: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add a language to the fenced code block.

markdownlint (MD040) flags this fence as missing a language hint. Since it's a plain capability-table listing, text is appropriate.
📝 Proposed change
-```
+```text
 HarnessConnectionCapabilities (:57-73):
Source: Linters/SAST tools

📥 Commits

Reviewing files that changed from the base of the PR and between 45bb1c2 and 673b178.

📒 Files selected for processing (20)

docs/design/agent-workflows/documentation/agent-configuration.md
docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md
docs/design/agent-workflows/projects/agent-model-picker/README.md
docs/design/agent-workflows/projects/agent-model-picker/context.md
docs/design/agent-workflows/projects/agent-model-picker/plan.md
docs/design/agent-workflows/projects/agent-model-picker/research.md
docs/design/agent-workflows/projects/agent-model-picker/status.md
sdks/python/agenta/sdk/agents/capabilities.py
sdks/python/agenta/sdk/decorators/running.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.py
services/oss/src/agent/app.py
services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py
web/packages/agenta-entities/src/workflow/index.ts
web/packages/agenta-entities/src/workflow/state/index.ts
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts
web/packages/agenta-entities/src/workflow/state/store.ts
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts
web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts

coderabbitai · 2026-06-25T10:52:17Z

+def test_pi_models_are_a_subset_of_the_shared_catalog():
+    # Each Pi harness publishes, per vault provider, exactly that provider's catalog ids.
+    for harness in ("pi_core", "pi_agenta"):
+        models = HARNESS_CONNECTION_CAPABILITIES[harness].models
+        # Only the vault-mapped providers are published (no arbitrary catalog providers).
+        assert set(models) <= set(PI_VAULT_PROVIDERS)
+        assert set(models) == set(PI_VAULT_PROVIDERS)
+        for provider, ids in models.items():
+            # The published ids are exactly the shared catalog's ids for that provider
+            # (verbatim — most are provider-prefixed like ``anthropic/...``, but some
+            # providers, e.g. openai, list bare ids like ``gpt-5.5``).
+            assert ids == list(supported_llm_models[provider])


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Test contradicts the defensive intent of _pi_models().

_pi_models() documents that it skips providers missing from supported_llm_models so "a catalog edit never breaks the capability document," but Line 96 asserts set(models) == set(PI_VAULT_PROVIDERS). If a vault provider is ever dropped from the catalog, the silent skip will trip this exact-equality assertion — i.e. the documented resilience is not actually exercised. Decide which behavior is intended: either keep strict equality (then the skip is dead code/misleading comment) or relax to subset to honor the defensive guard.

Also, Line 95 (<=) is fully subsumed by the Line 96 equality and can be dropped.
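
The tension can be reproduced with a toy version of the helper. The sketch below is illustrative only: `pi_models` takes the catalog as a parameter (the real `_pi_models()` presumably reads the module-level catalog), and the provider list and ids are placeholders.

```python
from typing import Dict, List

PI_VAULT_PROVIDERS = ["openai", "anthropic"]  # toy list (assumption)


def pi_models(catalog: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Toy analog of _pi_models(): skip vault providers missing from the catalog."""
    return {p: list(catalog[p]) for p in PI_VAULT_PROVIDERS if p in catalog}


# Simulate a catalog edit that drops one vault provider.
trimmed_catalog = {"openai": ["gpt-5.5"]}
models = pi_models(trimmed_catalog)

# The defensive skip keeps the helper working, and the subset check still holds...
subset_ok = set(models) <= set(PI_VAULT_PROVIDERS)
# ...but the strict equality in the test would now fail.
equal_ok = set(models) == set(PI_VAULT_PROVIDERS)
```

Here `subset_ok` is true and `equal_ok` is false, so keeping only the equality assertion means a catalog edit fails the test even though the helper tolerates it; keeping only the subset assertion would match the documented resilience.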

♻️ Drop the redundant subset assertion

-        assert set(models) <= set(PI_VAULT_PROVIDERS)
         assert set(models) == set(PI_VAULT_PROVIDERS)

Suggested change

 def test_pi_models_are_a_subset_of_the_shared_catalog():
     # Each Pi harness publishes, per vault provider, exactly that provider's catalog ids.
     for harness in ("pi_core", "pi_agenta"):
         models = HARNESS_CONNECTION_CAPABILITIES[harness].models
         # Only the vault-mapped providers are published (no arbitrary catalog providers).
-        assert set(models) <= set(PI_VAULT_PROVIDERS)
         assert set(models) == set(PI_VAULT_PROVIDERS)
         for provider, ids in models.items():
             # The published ids are exactly the shared catalog's ids for that provider
             # (verbatim — most are provider-prefixed like ``anthropic/...``, but some
             # providers, e.g. openai, list bare ids like ``gpt-5.5``).
             assert ids == list(supported_llm_models[provider])

Replace the hand-mirrored protocol.ts <-> wire.py contract (guarded only by golden fixtures) with a single source of truth: dedicated Pydantic wire models, exported JSON Schema, ajv validation in the standalone Node runner. Adds the /run split decision (keep the turn unified; promote GET /capabilities + consume the contract version on both transports), a structured error model { code, message, retryable } with a correctly-modeled cancelled outcome, an in-band contractVersion, and a 9-step test-at-each-step migration (steps 1-6 behavior-preserving at v1; A10 error-model + A3 backend-removal/harness-rename as one v2 cut; gated on the A1 versioning / A3 / A10 siblings). Codex-reviewed. Composio, the tool gateway, connections, and MCP are unchanged. Docs only. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

Address the four inline review comments on PR #4830:
- Drop all backward-compat-preservation framing (this is a pre-production POC; any wire shape may change freely).
- Pydantic stays the source for now; the exported JSON Schema interface ships in the SDK (the CATALOG_TYPES path), with a Fern investigation: Fern reads Pydantic->OpenAPI->clients, so it can see this interface later via the OpenAPI surface once the contract stabilizes (no hard blocker, only a timing call). Drop using the schema in the sidecar/runner for now.
- Remove all runner-side request validation (server.ts/cli.ts); no ingress validation, no ajv, no new runner dependency this phase.
- Keep the /capabilities probe (author endorsed it).

Also drop the versioning machinery (the pi/agenta rename, already landed, is not versioned; any version field defers to A1's simple string + if/else convention). Keep the structured error model + cancelled outcome and the keep-/run-unified decision. Update status.md to match.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…ct response

Implement the schema-driven /run contract plan (this PR's plan doc) plus the folded /inspect follow-ups. Pre-production POC: no back-compat, no runtime validation.
- Dedicated Pydantic wire models (sdks/python/agenta/sdk/agents/wire_models.py: WireRunRequest / WireRunResult + an open WireAgentEvent) are the single schema source of truth, separate from the snake_case semantic DTOs. run_contract_schemas() exports their dereferenced camelCase JSON Schema, shipped in the SDK via CATALOG_TYPES (run_request / run_result), the same path agent_config takes. No new endpoint, no new toolchain.
- No validation: wire.py stays the dict producer (omit-when-empty lives there, pinned by goldens); the models are the schema authority + test guard only. Nothing gates a live /run. test_wire_models.py: freshness guard (committed catalog == fresh export), goldens validate + parse, request_to_wire output validates, schema props == KNOWN_REQUEST_KEYS.
- Issue 1: canonical WorkflowInspectResponse in models/workflows.py; handle_inspect_success normalizes the built WorkflowInvokeRequest into it, lifting WorkflowRevisionData to a flat top-level revision so schemas live at response.revision.schemas (was the latent-broken data.revision.data.schemas). The three /inspect routes return WorkflowInspectResponse. FE InspectWorkflowResponse type + store.ts read now resolve against the real body; the deprecated interface?.schemas fallback stays as a bridge for two sibling readers.
- Issue 4: typed /inspect outputs keyed per surface (invoke -> message, messages -> messages) in services/oss/src/agent/schemas.py; reuses existing catalog markers.

Deferred: /run version field, runner-side validation, GET /capabilities, generating protocol.ts, structured-error/cancelled outcome, Fern publication.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

…sponse

Phase A (SDK): add a per-provider models map to HarnessConnectionCapabilities and populate HARNESS_CONNECTION_CAPABILITIES (Pi from supported_llm_models; Claude from a new CLAUDE_MODEL_ALIASES constant), emitted on /inspect meta.harness_capabilities.

Phases B-D (FE): thread meta.harness_capabilities to the playground via a new harnessCapabilitiesAtomFamily; retire the static FE capability map; build a harness-filtered unified provider+model picker (selecting a model sets both provider and model; standalone provider field removed); add an Agenta-managed vs self-managed authentication toggle + a vault-fed connection picker. The model is ALWAYS a ModelRef.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
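As a rough illustration of the picker logic, the sketch below turns a per-harness models map into a flat list of ModelRef options and falls back to the full catalog when a harness publishes no models. The document shape (`{harness: {"models": {provider: [ids]}}}`), the function name `picker_options`, and the sample provider and model ids are assumptions; only the `{provider, model, connection?}` shape of ModelRef and the Claude alias names come from the PR description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

ModelsMap = Dict[str, List[str]]  # provider -> model ids


@dataclass(frozen=True)
class ModelRef:
    """Structured model reference: provider, model id, optional connection."""
    provider: str
    model: str
    connection: Optional[str] = None


def picker_options(
    harness_capabilities: Dict[str, dict],
    harness: str,
    full_catalog: ModelsMap,
) -> List[ModelRef]:
    """List the models a harness can reach as (provider, model) options."""
    entry = harness_capabilities.get(harness) or {}
    models: ModelsMap = entry.get("models") or {}
    if not models:
        # Permissive fallback: no published models means show the full catalog.
        models = full_catalog
    return [
        ModelRef(provider=provider, model=model)
        for provider, ids in models.items()
        for model in ids
    ]
```

Because each option already carries its provider, selecting one sets both fields at once, which is why a standalone provider field is no longer needed.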

The agent playground's model picker showed an empty provider/model list: the agent /inspect response carried no meta.harness_capabilities, so the FE had nothing to render. Two breaks, fixed here.

Backend: the routed agent workflow sets meta=harness_capabilities, but the playground posts /inspect WITH a revision, which takes the request-driven inspect_workflow path. That builds a fresh workflow from the request (no meta) and never consults the routed instance's meta, so the capabilities were dropped. WorkflowRevisionData has no meta field, so the interface registry cannot carry it. Add a META_REGISTRY (mirrors register_interface/retrieve_interface); the agent service registers its meta under the builtin URI; workflow.inspect() merges the registered meta (request/decorator meta wins per key). Now /inspect emits meta.harness_capabilities on the request-driven path too.

Frontend: the inspect atom skipped the fetch when the revision already carried all schemas inline (a redundancy optimization). The agent stores its schemas inline, so inspect never ran and the capabilities never reached the FE. Fetch inspect for the agent regardless (detected by the builtin agent URI, since the is_agent flag is not reliably stored).

Regression test pins the exact broken path: a request-driven inspect of the agent URI normalizes to a response carrying per-harness models.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
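The registry-and-merge behaviour in the backend fix can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the function names `register_meta`, `retrieve_meta` and `merged_inspect_meta`, and the URI string used in the usage note, are hypothetical. The only behaviour taken from the commit message is that registered meta is merged in and request/decorator meta wins per key.

```python
from typing import Any, Dict, Optional

# Module-level registry: workflow URI -> meta registered by the service.
META_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register_meta(uri: str, meta: Dict[str, Any]) -> None:
    """Store a copy of the meta for a workflow URI."""
    META_REGISTRY[uri] = dict(meta)


def retrieve_meta(uri: str) -> Dict[str, Any]:
    """Return a copy of the registered meta, or an empty dict if none exists."""
    return dict(META_REGISTRY.get(uri, {}))


def merged_inspect_meta(
    uri: str, request_meta: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Merge registered meta with request meta; request meta wins per key."""
    merged = retrieve_meta(uri)
    merged.update(request_meta or {})
    return merged
```

With a registration such as `register_meta("example:agent", {"harness_capabilities": {...}})`, a request-driven inspect that carries no meta of its own would still see `harness_capabilities` in the merged result, while any key the request does set overrides the registered value.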

mmabrouk · 2026-06-25T11:10:01Z

Code looks good per your review. Two functionality fixes are in progress before merge: (1) the Pi no-response bug you hit, (2) reworking the connection to match the completion/chat pattern (FE sends a reference, not the key; drop 'Project Default') per your decision, asking Codex on the naming. You pre-approved merging once it works, so no further review is needed unless you want to re-test the picker.

…ents

Rebuilds the contained agent config-panel branch on top of big-agents after the agent stack landed (#4830 canonical /inspect + models, #4839 model picker, #4840 collapse run-selection into AgentConfig + harness_kwargs). The inspect-driven picker, connectionUtils, inspectMeta and EnumSelectControl we had ported now live in big-agents verbatim, so they collapse out. This branch is only our genuine UX delta on top:
- AgentConfigControl as schema-driven accordion sections (ConfigAccordionSection primitive)
- per-item drawers on the shared EnhancedDrawer (ConfigItemDrawer) with lazy content
- two-pane skill editor, MCP server form, tool form, harness select, markdown/code/JSON editors
- Authentication as radio cards with explainers; live JSON-edit sync
- chat panel: inline per-turn run errors + truncation, empty-turn collapse
- playground wiring (header menu view modes, height calc, router)

Coherent with the merged backend: harness/sandbox/permission_policy ride parameters.agent, harness_kwargs bag, pi_core default, all taken from big-agents (the earlier Option-A reverts are undone).

tsc 0 (entity-ui/agenta-ui/entities/playground; oss 593 baseline, 0 new), eslint clean, unit tests pass (entity-ui 126, playground 122, entities 683).

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 24, 2026

dosubot Bot added enhancement New feature or request Frontend labels Jun 24, 2026

vercel Bot deployed to Preview June 24, 2026 22:03 View deployment

mmabrouk force-pushed the feat/agent-wire-contract-schema-plan branch from 275b2cc to 0d9f0b9 Compare June 25, 2026 08:03

mmabrouk force-pushed the feat/agent-model-picker branch from 0e7f4f7 to e282ea4 Compare June 25, 2026 08:03

vercel Bot deployed to Preview June 25, 2026 08:04 View deployment

mmabrouk force-pushed the feat/agent-model-picker branch from e282ea4 to efd5f70 Compare June 25, 2026 08:12

vercel Bot deployed to Preview June 25, 2026 08:13 View deployment

vercel Bot deployed to Preview June 25, 2026 10:43 View deployment

mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jun 25, 2026

coderabbitai Bot reviewed Jun 25, 2026

mmabrouk added 6 commits June 25, 2026 12:56

fix(agent): preserve resolved configuration in normalized /inspect re…

443daf8

…sponse

mmabrouk mentioned this pull request Jun 25, 2026

refactor(agent): collapse run-selection into AgentConfig, rename harness_kwargs #4840

Merged

mmabrouk added implementing Design approved; implementation queued/in progress (not awaiting review) and removed needs-review Agent updated; awaiting Mahmoud's review labels Jun 25, 2026

mmabrouk force-pushed the feat/agent-model-picker branch from 673b178 to c406a5d Compare June 25, 2026 11:36

vercel Bot deployed to Preview June 25, 2026 11:37 View deployment

mmabrouk changed the base branch from feat/agent-wire-contract-schema-plan to big-agents June 25, 2026 12:04

mmabrouk merged commit 27cfd7b into big-agents Jun 25, 2026
35 checks passed

coderabbitai Bot mentioned this pull request Jun 28, 2026

feat(frontend): agent config section drawers #4881

Merged

Conversation

mmabrouk commented Jun 24, 2026

What & why

Before → after

How

Tests

Note for reviewers

vercel Bot commented Jun 24, 2026

coderabbitai Bot commented Jun 24, 2026

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

mmabrouk commented Jun 24, 2026

Changes made (scope map)

mmabrouk commented Jun 24, 2026

coderabbitai Bot commented Jun 24, 2026

mmabrouk commented Jun 24, 2026

coderabbitai Bot commented Jun 24, 2026

github-actions Bot commented Jun 24, 2026

Railway Preview Environment

mmabrouk commented Jun 25, 2026

Expected behavior for the agent model picker — please confirm / give feedback

What I have fixed so far (the empty list)

Open design question (not guessing — want your call)

mmabrouk commented Jun 25, 2026

Changes in this round — fixed the empty model picker

Root cause: empty list

Fix

Verified live (dev playground, plain HTTP)

Tests

Left as an open question (on the expected-behavior comment)

mmabrouk commented Jun 25, 2026

coderabbitai Bot commented Jun 25, 2026

coderabbitai Bot left a comment

mmabrouk commented Jun 25, 2026
