
feat(frontend): harness-aware agent provider + model picker (inspect models) #4839

Merged
mmabrouk merged 6 commits into big-agents from feat/agent-model-picker on Jun 25, 2026

@mmabrouk

Member

What & why

The agent playground's model picker showed the whole LiteLLM catalog, unfiltered by harness, alongside a redundant standalone Provider field and a free-text connection slug. As a result, an author could pick a model the Claude harness can't reach, or an OpenAI provider for the Claude harness, and only find out when the run failed. The credential UX was a bare mode dropdown plus a free-text box.

This makes the agent picker harness-aware and inspect-driven: the agent's /inspect now publishes the exact models each harness can reach, and the playground renders one unified provider + model picker from it, with a clear authentication choice and a connection picker fed by the vault.

Before → after

  • Model picker: full catalog, unfiltered → harness-filtered (only the models this harness can reach), built straight from /inspect. Selecting a model now sets both the provider and the model id; the redundant standalone Provider field is gone.
  • Capability source: a static hardcoded FE copy of the harness table (with a TODO to consume inspect) → the inspect-published map, threaded through a new atom. Permissive fallback to the full catalog when an agent doesn't publish models (older agents / standalone).
  • Credential UX: a bare "Connection mode" select + a free-text slug → an Authentication toggle (Agenta-managed vs Self-managed) plus a connection picker (Project default / named connections, read from GET /secrets/). Raw-JSON escape hatch kept.
  • config.model: could be a free-text string → always a structured ModelRef ({provider, model, connection?}). The picker is the only way to set it. (Folds in [docs] Add agent workflow interface inventory #4821 comment 3469645457.)

How

  • Phase A — SDK. HarnessConnectionCapabilities gains models: Dict[str, List[str]]; HARNESS_CONNECTION_CAPABILITIES populates it. Pi maps each PI_VAULT_PROVIDERS entry to its supported_llm_models catalog ids (_pi_models() skips a provider missing from the catalog); Claude maps anthropic to a new CLAUDE_MODEL_ALIASES constant (default/sonnet/opus/haiku + [1m] variants). harness_capabilities_document() emits it, so /inspect meta.harness_capabilities carries per-harness models. Not a /run wire change.
  • Phase B — FE plumbing. New harnessCapabilitiesAtomFamily(revisionId) derives meta.harness_capabilities from the existing workflowInspectAtomFamily. connectionUtils.ts retired the static FE capability map; its helpers now take the inspect-fed HarnessCapabilitiesMap and stay permissive when it's absent.
  • Phase C — picker. buildModelOptionGroups builds the grouped options from the harness's published models; SelectLLMProviderBase renders them. providerForModel derives the provider from the picked model's group; harnessAllowsModel clears an unreachable model on harness switch.
  • Phase D — auth. An Authentication toggle maps to connection.mode; namedConnectionOptions lists vault custom-provider connections (the slug the resolver matches on) filtered to the chosen provider + harness. Self-managed sends {mode: "self_managed"} and shows a note.
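
The Phase A shape can be sketched roughly as follows. This is a minimal stand-in, not the SDK code: it uses dataclasses where the SDK uses Pydantic `model_dump()`, the catalog ids are made up, the `"claude"` harness key is assumed, and the `[1m]` alias variants are left out because their exact spelling is not given above.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Stand-in catalog with hypothetical ids; the real supported_llm_models lives in the SDK.
SUPPORTED_LLM_MODELS: Dict[str, List[str]] = {
    "openai": ["gpt-example"],
    "anthropic": ["anthropic/claude-example"],
}
# Stand-in for PI_VAULT_PROVIDERS; "groq" is deliberately missing from the catalog above.
PI_VAULT_PROVIDERS: List[str] = ["openai", "anthropic", "groq"]
# Alias names taken from the PR description (without the "[1m]" variants).
CLAUDE_MODEL_ALIASES: List[str] = ["default", "sonnet", "opus", "haiku"]


@dataclass
class HarnessConnectionCapabilities:
    providers: List[str]
    model_selection: str
    # Provider family -> selectable model ids or aliases.
    models: Dict[str, List[str]] = field(default_factory=dict)


def pi_models() -> Dict[str, List[str]]:
    """Map each vault provider to its catalog ids, skipping providers the catalog lacks."""
    return {
        provider: list(SUPPORTED_LLM_MODELS[provider])
        for provider in PI_VAULT_PROVIDERS
        if provider in SUPPORTED_LLM_MODELS
    }


HARNESS_CONNECTION_CAPABILITIES: Dict[str, HarnessConnectionCapabilities] = {
    "pi_core": HarnessConnectionCapabilities(
        list(PI_VAULT_PROVIDERS), "provider/id", pi_models()
    ),
    # The "claude" key is an assumption made for this sketch.
    "claude": HarnessConnectionCapabilities(
        ["anthropic"], "alias", {"anthropic": list(CLAUDE_MODEL_ALIASES)}
    ),
}


def harness_capabilities_document() -> Dict[str, dict]:
    """Dump every harness entry to plain dicts, as /inspect meta would carry them."""
    return {name: asdict(caps) for name, caps in HARNESS_CONNECTION_CAPABILITIES.items()}
```

Calling `harness_capabilities_document()["pi_core"]["models"]` in this sketch yields entries for `openai` and `anthropic` only, since the stand-in catalog has no `groq` key.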

Tests

  • SDK capability contract test extended (every harness has a models map; Pi ⊆ supported_llm_models; Claude = the alias set; the document round-trips as plain dicts). green (11 tests in the file; 50 in the connections suite).
  • connectionUtils unit tests rewritten for the inspect-fed helpers + the always-ModelRef compose + the picker/connection-option helpers. green (24 tests; 126 across entity-ui).
  • @agenta/entities, @agenta/entity-ui, @agenta/playground typecheck green. ruff + prettier + eslint clean.

Note for reviewers

supported_llm_models["openai"] lists bare ids (gpt-5.5, …) while other providers are prefixed (anthropic/claude-...). The picker uses the catalog id verbatim as the option value and takes the provider from the group key, so the mix is handled; the contract test deliberately does not assert provider-prefixing.
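
To illustrate why the bare/prefixed mix is harmless, here is a rough Python rendition of the grouping idea. The real helpers (`buildModelOptionGroups`, `providerForModel`, `composeModelValue`) are TypeScript; the function names, option shape, and the `anthropic/claude-example` id below are assumptions for illustration only.

```python
from typing import Dict, List, Optional, TypedDict


class ModelRef(TypedDict):
    provider: str
    model: str


def build_model_option_groups(models: Dict[str, List[str]]) -> List[dict]:
    """One group per provider key; option values are the catalog ids, verbatim."""
    return [
        {"provider": provider, "options": [{"value": m, "label": m} for m in ids]}
        for provider, ids in models.items()
    ]


def provider_for_model(groups: List[dict], model_id: str) -> Optional[str]:
    """Derive the provider from the group that holds the id, never from an id prefix."""
    for group in groups:
        if any(option["value"] == model_id for option in group["options"]):
            return group["provider"]
    return None


def compose_model_ref(groups: List[dict], model_id: str) -> Optional[ModelRef]:
    """Always produce a structured ModelRef, or None when the id is not published."""
    provider = provider_for_model(groups, model_id)
    if provider is None:
        return None
    return {"provider": provider, "model": model_id}
```

Because the provider comes from the group key, a bare id such as `gpt-5.5` and a prefixed id resolve the same way, and nothing needs to parse the id string.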

Docs synced in this PR: the /inspect interface inventory entry (now lists models) and documentation/agent-configuration.md (picker + auth UX).

https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 24, 2026
@vercel

vercel Bot commented Jun 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 25, 2026 11:37am

@dosubot dosubot Bot added enhancement New feature or request Frontend labels Jun 24, 2026
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 97aa7676-a8b9-475b-be19-b355dceff4ec

Walkthrough

The PR adds per-harness model lists to /inspect capability metadata, propagates registered inspect meta through the agent runtime, exposes it to workflow state, and updates the agent configuration UI and helper logic for model, authentication, and connection selection.

Changes

Agent model picker and inspect metadata

| Layer | File(s) | Summary |
| --- | --- | --- |
| Harness capability contract | docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md, sdks/python/agenta/sdk/agents/capabilities.py, sdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.py | The /inspect capability contract adds provider-keyed models data, and the SDK/test coverage publishes and validates that model catalog. |
| Inspect meta registration | sdks/python/agenta/sdk/engines/running/utils.py, sdks/python/agenta/sdk/decorators/running.py, services/oss/src/agent/app.py, services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py | Workflow meta is stored by URI, merged into workflow.inspect(), registered for the builtin agent URI, and checked by builtin-agent inspect tests. |
| Workflow inspect state | web/packages/agenta-entities/src/workflow/state/inspectMeta.ts, web/packages/agenta-entities/src/workflow/state/store.ts, web/packages/agenta-entities/src/workflow/state/index.ts, web/packages/agenta-entities/src/workflow/index.ts | Workflow state adds inspect-meta types and atoms, keeps agent workflows on the inspect path, and re-exports the new capability accessors. |
| Model and connection helpers | web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts, web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts | ModelRef helpers now compose structured values, derive allowed providers, model groups, selection mode, and named connections from harness capabilities, with unit tests covering those paths. |
| AgentConfigControl wiring | web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx, docs/design/agent-workflows/documentation/agent-configuration.md | AgentConfigControl reads inspect capabilities and vault secrets to render the unified model picker, falls back to the schema picker when inspect models are absent, and replaces the editor with authentication and connection controls; the field docs are updated to match. |
| Project design docs | docs/design/agent-workflows/projects/agent-model-picker/... | The project README, context, research, plan, and status pages describe the model-picker scope, current state, implementation phases, and status. |

Sequence Diagram(s)

sequenceDiagram
  participant create_agent_app
  participant register_meta
  participant workflow_inspect as "workflow.inspect()"
  participant retrieve_meta
  participant WorkflowInvokeRequest

  create_agent_app->>register_meta: store {"harness_capabilities": ...} for AGENT_URI
  workflow_inspect->>retrieve_meta: load registered meta for self.uri
  retrieve_meta-->>workflow_inspect: return inspect meta from META_REGISTRY
  workflow_inspect->>WorkflowInvokeRequest: set meta = inspect_meta

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 39.39% which is insufficient. The required threshold is 60.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main frontend change: a harness-aware provider and model picker driven by inspect data. |
| Description check | ✅ Passed | The description accurately describes the harness-aware picker, inspect-driven capabilities, and auth/connection changes in the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@mmabrouk

Member Author

Changes made (scope map)

Phase A — SDK backend (sdks/python/agenta/sdk/agents/capabilities.py)

  • Added models: Dict[str, List[str]] to HarnessConnectionCapabilities.
  • Added CLAUDE_MODEL_ALIASES (default/sonnet/opus/haiku + [1m]) and _pi_models() (defensive over supported_llm_models).
  • Populated HARNESS_CONNECTION_CAPABILITIES[*].models; harness_capabilities_document() emits it via model_dump(). No /run wire change.
  • Extended oss/tests/pytest/unit/agents/connections/test_capabilities.py.

Phase B — FE plumbing (web/packages/agenta-entities)

  • New workflow/state/inspectMeta.ts: harnessCapabilitiesAtomFamily(revisionId) derives meta.harness_capabilities from workflowInspectAtomFamily. Exported via workflow/state/index.ts and workflow/index.ts.

Phases C/D + ModelRef fold-in (web/packages/agenta-entity-ui)

  • connectionUtils.ts: dropped the static FE capability map; helpers now take the inspect-fed HarnessCapabilitiesMap. Added buildModelOptionGroups, providerForModel, harnessAllowsModel, modelSelectionMode, namedConnectionOptions. composeModelValue now ALWAYS returns a ModelRef (bare-string path dropped); reads still tolerate a legacy bare string.
  • AgentConfigControl.tsx: harness-filtered unified picker (sets provider + model), standalone Provider field removed, harness-switch clears unreachable model, Authentication toggle + vault-fed connection picker. Reads the open revision from useOptionalDrillIn().entityId.
  • Rewrote tests/unit/connectionUtils.test.ts.

Phase E — docs

  • interfaces/public-edge/workflow-inspect.md (meta now lists models), documentation/agent-configuration.md (picker + auth UX), projects/agent-model-picker/status.md.

Tests: SDK connections 50 green; entity-ui 126 green (connectionUtils 24); entities/entity-ui/playground typecheck green; ruff + prettier + eslint clean.

Did NOT touch (other lanes / just landed): services/agent/**, workflow/api/api.ts, workflow/state/store.ts, wire*.py, models/workflows.py, utils/types.py CATALOG_TYPES — built on the wire-schema (#4830) and contract-versioning (#4829) work. Base is set to #4830 so the diff is isolated.

Note: I hit the stale-but mark staging-hijack documented on the coordination board (my files auto-staged to feat/agent-capability-fail-loud); recovered cleanly via commit-then-but move to this lane. Both lanes verified clean (this PR = my 15 files only; the capability lane keeps its 6 runner files).

@mmabrouk

Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@mmabrouk

Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions

github-actions Bot commented Jun 24, 2026

Contributor

Railway Preview Environment

Preview URL https://gateway-production-4633.up.railway.app/w
Project agenta-oss-pr-4839
Image tag pr-4839-a1f7edd
Status Deployed
Railway logs Open logs
Workflow logs View workflow run
Updated at 2026-06-25T12:23:14.286Z

@mmabrouk

Member Author

Expected behavior for the agent model picker — please confirm / give feedback

Capturing the target UX for the agent provider + model picker so we agree on the spec. This is what I am building toward; tell me where it is wrong.

  1. Pick the auth mode first: Agenta-managed vs Self-managed. This toggle is the top-level choice.
  2. Then a list of providers, and per provider its models. Same shape as the existing chat/completion playground model picker (grouped provider → models), just with the auth toggle on top.
  3. Self-managed = general (all providers/models available, bring-your-own key). Agenta-managed = scoped to the providers/secrets the user actually has in the vault.
  4. Ability to add a provider/secret inline, mirroring the chat/completion playground's "Add provider" affordance.
  5. The harness constrains which providers/models are reachable (the per-harness capability table). Claude → Anthropic + Claude's models; Pi → its providers and their models.
  6. "Project Default" should NOT be a user-facing connection value. The user selects provider/model (and a connection/secret) explicitly.

What I have fixed so far (the empty list)

The picker was rendering an empty list for every harness. Root cause was two breaks that starved the FE of the per-harness capability data:

  • Backend: the agent /inspect response carried no meta.harness_capabilities at all. The routed agent workflow sets meta=harness_capabilities, but the playground posts /inspect with a revision, which takes the request-driven inspect_workflow path — it builds a fresh workflow from the request (no meta) and never reads the routed instance's meta. WorkflowRevisionData has no meta field, so the interface registry could not carry it either. Fixed with a META_REGISTRY (mirrors register_interface/retrieve_interface); the agent registers its meta under the builtin URI and workflow.inspect() merges it (request/decorator meta still wins per key).
  • Frontend: the inspect atom skipped the fetch when the revision already carried all schemas inline (a redundancy optimization). The agent stores its schemas inline, so inspect never ran and the capabilities never reached the FE. Now the FE fetches inspect for the agent regardless (detected by the builtin agent URI).

Verified live on the dev playground: Pi now shows all 8 providers with their models (OpenAI 37, Anthropic 12, Gemini 16, Mistral 14, Groq 8, MiniMax 5, Together 15, OpenRouter 23); switching to Claude scopes the picker to Anthropic only (the 8 Claude aliases) and clears an unreachable model. Items 1, 2, 5 are working.
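
The harness-switch behavior described above (scope to the new harness, clear a model it cannot reach, stay permissive when no models are published) can be sketched like this. It is a Python rendition of the idea behind the TypeScript `harnessAllowsModel` helper; the function names, the capability-map shape, and the `"claude"` key are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Assumed shape: harness -> {"models": {provider: [ids]}}
CapabilitiesMap = Dict[str, Dict[str, Dict[str, List[str]]]]


def harness_allows_model(
    caps: Optional[CapabilitiesMap], harness: str, provider: str, model: str
) -> bool:
    """True when the harness publishes the model, or publishes nothing at all."""
    entry = (caps or {}).get(harness) or {}
    models = entry.get("models")
    if not models:
        # Permissive fallback: older or standalone agents publish no models.
        return True
    return model in models.get(provider, [])


def on_harness_switch(
    caps: Optional[CapabilitiesMap], new_harness: str, current: Optional[dict]
) -> Optional[dict]:
    """Keep the current ModelRef if the new harness can reach it, otherwise clear it."""
    if current is None:
        return None
    if harness_allows_model(caps, new_harness, current["provider"], current["model"]):
        return current
    return None
```

With a map where only the Pi harness lists `gpt-5.5`, switching to Claude returns `None` (the model is cleared), while a missing capability map leaves the selection untouched.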

Open design question (not guessing — want your call)

Item 6 (kill "Project Default") + item 4 (inline add-provider) are a connection-picker UX change on top of the now-working provider/model picker. Today the connection control still offers a "Project default" option plus named vault connections, with the Agenta-managed/Self-managed toggle already present. Reworking it to fully mirror the chat playground's provider+secret selector (and dropping "Project default" as a user-facing value) is a focused follow-up I did not want to guess on. Confirm the exact desired connection UX and I will build it: should "Project default" be removed entirely (forcing an explicit connection pick), or kept as an implicit fallback that is just not shown as a selectable value?

@mmabrouk

Member Author

Changes in this round — fixed the empty model picker

Verdict: the picker was COMPLETE-but-starved, not incomplete. The harness-filtered picker, the auth toggle, and the connection control were all built (this PR's original commit). The list was empty because the per-harness capability data never reached the FE. Two breaks, both fixed here.

Root cause: empty list

  1. Backend — /inspect emitted no meta on the playground's request path. create_agent_app() sets meta = {harness_capabilities: ...} on ag.workflow(...), but the playground posts /inspect with a revision, which routes to the standalone inspect_workflow(request). That builds a fresh workflow from the request (whose meta is empty) and never reads the routed instance's meta. The interface registry stores a WorkflowRevisionData, which has no meta field, so it could not carry it either. Result: the live /inspect top-level keys were version, revision, configuration — no meta.

  2. Frontend — the inspect atom skipped the fetch for the agent. workflowInspectAtomFamily skips inspect when the revision already carries inputs+outputs+parameters inline (a redundancy optimization). The agent stores all three inline, so inspect never ran, harnessCapabilitiesAtomFamily stayed null, and the picker had nothing → the standard "No data" / "Add provider" empty state.

Fix

SDK (sdks/python/agenta/sdk/)

  • engines/running/utils.py: new META_REGISTRY + register_meta(meta, uri) / retrieve_meta(uri), mirroring register_interface/retrieve_interface and reusing _get_with_latest. The stored dict is copied so a caller mutation cannot leak into the registry.
  • decorators/running.py: workflow.inspect() now merges the URI's registered meta into the request meta, with the request/decorator meta winning per key ({**registered, **(self.meta or {})}). Retrieved at the inspect emission boundary, not in __init__, so it does NOT leak into the invoke path's trace metadata.
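
A minimal sketch of the registry and the per-key merge, under stated assumptions: it omits the `_get_with_latest` version lookup, uses `deepcopy` where the SDK may copy differently, and `inspect_meta` and the example URI are hypothetical names standing in for the logic inside `workflow.inspect()`.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

META_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register_meta(meta: Dict[str, Any], uri: str) -> None:
    """Store a copy so later mutation by the caller cannot leak into the registry."""
    META_REGISTRY[uri] = deepcopy(meta)


def retrieve_meta(uri: str) -> Optional[Dict[str, Any]]:
    """Return a copy of the registered meta, or None when the URI is unknown."""
    stored = META_REGISTRY.get(uri)
    return deepcopy(stored) if stored is not None else None


def inspect_meta(uri: str, request_meta: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge at the inspect boundary; request/decorator meta wins per key."""
    registered = retrieve_meta(uri) or {}
    return {**registered, **(request_meta or {})}
```

Doing the lookup inside the inspect step, rather than at construction time, is what keeps the registered meta out of the invoke path in this sketch: nothing else ever reads the registry.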

Service (services/oss/src/agent/app.py)

  • create_agent_app() calls register_meta(meta, uri=AGENT_URI); the stale comment claiming register_interface carries the meta is corrected.

Frontend (web/packages/agenta-entities/src/workflow/state/store.ts)

  • The inspect atom now fetches for the agent even when schemas are inline, detected by the builtin agent URI (is_agent is not reliably stored on the revision, so the URI is the robust signal).
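
The gating decision can be sketched as below. The real logic lives in the TypeScript inspect atom; this Python version only illustrates the rule, and the revision shape, the function names, and the `example:builtin:agent` URI prefix are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical prefix; the actual builtin agent URI is defined in the codebase.
AGENT_URI_PREFIX = "example:builtin:agent"


def has_inline_schemas(revision: Dict[str, Any]) -> bool:
    """True when inputs, outputs, and parameters schemas are all stored inline."""
    schemas = revision.get("schemas") or {}
    return all(schemas.get(key) for key in ("inputs", "outputs", "parameters"))


def should_fetch_inspect(revision: Dict[str, Any]) -> bool:
    """Agents always fetch (inspect carries capabilities); others only when schemas are missing."""
    if (revision.get("uri") or "").startswith(AGENT_URI_PREFIX):
        return True
    return not has_inline_schemas(revision)
```

The redundancy optimization still applies to every non-agent workflow; only the agent URI bypasses it.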

Test (services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py)

  • New regression test pins the exact broken path: a request-driven inspect of the agent URI normalizes to a response carrying per-harness models (Pi per-provider lists, Claude alias set).

Verified live (dev playground, plain HTTP)

Tests

  • SDK: 349 unit tests pass; services agent suite: 47 pass (incl. the new regression test). ruff format + check clean.
  • FE: @agenta/entities typecheck passes and 39 unit-test files pass; eslint clean.

Left as an open question (on the expected-behavior comment)

The connection-picker rework (drop user-facing "Project default", inline add-provider to mirror the chat playground) is a focused UX follow-up I did not guess on — see the expected-behavior comment for the exact question.

Codex (xhigh) reviewed the seam choice and the merge semantics; its refinements (retrieve in inspect() not __init__; per-key merge; the regression test) are folded in.

@mmabrouk

Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@mmabrouk mmabrouk added the needs-review Agent updated; awaiting Mahmoud's review label Jun 25, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts (1)

20-31: 🩺 Stability & Availability | 🔵 Trivial

Align HarnessCapabilities interface with the permissive runtime contract documented.

The connectionUtils.ts consumers defensively handle missing data using optional chaining and explicit checks, preventing runtime errors. However, the HarnessCapabilities interface in inspectMeta.ts incorrectly marks providers, connection_modes, model_selection, and models as required fields, despite documentation confirming older agents may omit them entirely. This type definition contradicts the runtime reality and hides the permissive nature of the data.

Update the interface to mark these fields as optional. This aligns the type system with the actual data shape and prevents reliance on implicit runtime guards.

Diff
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts
 export interface HarnessCapabilities {
     /** Provider families the harness can reach (a literal list; never `"*"`). */
-    providers: string[]
+    providers?: string[]
     /** Deployment surfaces it can consume (`["direct"]` for Pi today). */
     deployments?: string[]
     /** Supported connection modes (`["agenta", "self_managed"]`). */
-    connection_modes: string[]
+    connection_modes?: string[]
     /** How a model is named: `"provider/id"` (Pi) or `"alias"` (Claude). */
-    model_selection: string
+    model_selection?: string
     /** Selectable models per provider family (provider -> list of ids/aliases). */
-    models: Record<string, string[]>
+    models?: Record<string, string[]>
 }
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts (2)

152-187: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Avoid returning the shared ALL_MODES reference from the permissive branch.

allowedConnectionModes returns the module-level ALL_MODES array directly in the permissive path. If any caller mutates the returned array, it silently corrupts the constant for every subsequent call. A defensive copy keeps the helper pure.

♻️ Proposed change
-    if (!entry?.connection_modes?.length) return ALL_MODES
+    if (!entry?.connection_modes?.length) return [...ALL_MODES]

228-256: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

titleizeProvider produces awkward labels for multi-word providers.

together_ai renders as "Together ai" (only the first character is uppercased after the underscore replace). Tests only cover single-word providers, so this slips through. Consider title-casing each word.

♻️ Proposed change
 function titleizeProvider(provider: string): string {
-    return provider.charAt(0).toUpperCase() + provider.slice(1).replace(/_/g, " ")
+    return provider
+        .split("_")
+        .map((w) => w.charAt(0).toUpperCase() + w.slice(1))
+        .join(" ")
 }
docs/design/agent-workflows/projects/agent-model-picker/research.md (1)

45-45: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add a language to the fenced code block.

markdownlint (MD040) flags this fence as missing a language hint. Since it's a plain capability-table listing, text is appropriate.

📝 Proposed change
-```
+```text
 HarnessConnectionCapabilities (:57-73):

Source: Linters/SAST tools


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 32096859-9891-43cf-bb22-e54b7959a604

📥 Commits

Reviewing files that changed from the base of the PR and between 45bb1c2 and 673b178.

📒 Files selected for processing (20)
  • docs/design/agent-workflows/documentation/agent-configuration.md
  • docs/design/agent-workflows/interfaces/public-edge/workflow-inspect.md
  • docs/design/agent-workflows/projects/agent-model-picker/README.md
  • docs/design/agent-workflows/projects/agent-model-picker/context.md
  • docs/design/agent-workflows/projects/agent-model-picker/plan.md
  • docs/design/agent-workflows/projects/agent-model-picker/research.md
  • docs/design/agent-workflows/projects/agent-model-picker/status.md
  • sdks/python/agenta/sdk/agents/capabilities.py
  • sdks/python/agenta/sdk/decorators/running.py
  • sdks/python/agenta/sdk/engines/running/utils.py
  • sdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.py
  • services/oss/src/agent/app.py
  • services/oss/tests/pytest/unit/agent/test_builtin_uri_binding.py
  • web/packages/agenta-entities/src/workflow/index.ts
  • web/packages/agenta-entities/src/workflow/state/index.ts
  • web/packages/agenta-entities/src/workflow/state/inspectMeta.ts
  • web/packages/agenta-entities/src/workflow/state/store.ts
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts
  • web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts

Comment on lines +90 to +101
def test_pi_models_are_a_subset_of_the_shared_catalog():
    # Each Pi harness publishes, per vault provider, exactly that provider's catalog ids.
    for harness in ("pi_core", "pi_agenta"):
        models = HARNESS_CONNECTION_CAPABILITIES[harness].models
        # Only the vault-mapped providers are published (no arbitrary catalog providers).
        assert set(models) <= set(PI_VAULT_PROVIDERS)
        assert set(models) == set(PI_VAULT_PROVIDERS)
        for provider, ids in models.items():
            # The published ids are exactly the shared catalog's ids for that provider
            # (verbatim — most are provider-prefixed like ``anthropic/...``, but some
            # providers, e.g. openai, list bare ids like ``gpt-5.5``).
            assert ids == list(supported_llm_models[provider])

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Test contradicts the defensive intent of _pi_models().

_pi_models() documents that it skips providers missing from supported_llm_models so "a catalog edit never breaks the capability document," but Line 96 asserts set(models) == set(PI_VAULT_PROVIDERS). If a vault provider is ever dropped from the catalog, the silent skip will trip this exact-equality assertion — i.e. the documented resilience is not actually exercised. Decide which behavior is intended: either keep strict equality (then the skip is dead code/misleading comment) or relax to subset to honor the defensive guard.

Also, Line 95 (<=) is fully subsumed by the Line 96 equality and can be dropped.
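
The tension can be reproduced in isolation with a small sketch. The provider list, catalog ids, and helper names here are stand-ins, not the SDK's; the point is only that a skip-on-missing builder satisfies the subset check but not the equality check once the catalog loses a provider.

```python
from typing import Dict, List

# Stand-in for PI_VAULT_PROVIDERS.
VAULT_PROVIDERS: List[str] = ["openai", "anthropic"]


def pi_models(catalog: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Defensive builder: silently skips vault providers missing from the catalog."""
    return {p: list(catalog[p]) for p in VAULT_PROVIDERS if p in catalog}


def strict_contract_holds(models: Dict[str, List[str]]) -> bool:
    """Equality: every vault provider must be published."""
    return set(models) == set(VAULT_PROVIDERS)


def subset_contract_holds(models: Dict[str, List[str]]) -> bool:
    """Subset: only vault providers may be published, but some may be absent."""
    return set(models) <= set(VAULT_PROVIDERS)


# Hypothetical catalogs: the second one has dropped "anthropic".
FULL_CATALOG = {"openai": ["gpt-example"], "anthropic": ["anthropic/claude-example"]}
TRIMMED_CATALOG = {"openai": ["gpt-example"]}
```

With `TRIMMED_CATALOG`, the builder still returns a valid document, the subset check holds, and the equality check fails, which is the situation the comment above asks the author to decide on.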

♻️ Drop the redundant subset assertion
-        assert set(models) <= set(PI_VAULT_PROVIDERS)
         assert set(models) == set(PI_VAULT_PROVIDERS)
📝 Committable suggestion

Suggested change
def test_pi_models_are_a_subset_of_the_shared_catalog():
    # Each Pi harness publishes, per vault provider, exactly that provider's catalog ids.
    for harness in ("pi_core", "pi_agenta"):
        models = HARNESS_CONNECTION_CAPABILITIES[harness].models
        # Only the vault-mapped providers are published (no arbitrary catalog providers).
        assert set(models) == set(PI_VAULT_PROVIDERS)
        for provider, ids in models.items():
            # The published ids are exactly the shared catalog's ids for that provider
            # (verbatim — most are provider-prefixed like ``anthropic/...``, but some
            # providers, e.g. openai, list bare ids like ``gpt-5.5``).
            assert ids == list(supported_llm_models[provider])

mmabrouk added 6 commits June 25, 2026 12:56
Replace the hand-mirrored protocol.ts <-> wire.py contract (guarded only by
golden fixtures) with a single source of truth: dedicated Pydantic wire models,
exported JSON Schema, ajv validation in the standalone Node runner. Adds the
/run split decision (keep the turn unified; promote GET /capabilities + consume
the contract version on both transports), a structured error model
{ code, message, retryable } with a correctly-modeled cancelled outcome, an
in-band contractVersion, and a 9-step test-at-each-step migration (steps 1-6
behavior-preserving at v1; A10 error-model + A3 backend-removal/harness-rename
as one v2 cut; gated on the A1 versioning / A3 / A10 siblings). Codex-reviewed.
Composio, the tool gateway, connections, and MCP are unchanged. Docs only.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
Address the four inline review comments on PR #4830:
- Drop all backward-compat-preservation framing (this is a pre-production
  POC; any wire shape may change freely).
- Pydantic stays the source for now; the exported JSON Schema interface
  ships in the SDK (the CATALOG_TYPES path), with a Fern investigation:
  Fern reads Pydantic->OpenAPI->clients, so it can see this interface
  later via the OpenAPI surface once the contract stabilizes (no hard
  blocker, only a timing call). Drop using the schema in the
  sidecar/runner for now.
- Remove all runner-side request validation (server.ts/cli.ts); no
  ingress validation, no ajv, no new runner dependency this phase.
- Keep the /capabilities probe (author endorsed it).

Also drop the versioning machinery (the pi/agenta rename, already landed,
is not versioned; any version field defers to A1's simple string + if/else
convention). Keep the structured error model + cancelled outcome and the
keep-/run-unified decision. Update status.md to match.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
…ct response

Implement the schema-driven /run contract plan (this PR's plan doc) plus the folded
/inspect follow-ups. Pre-production POC: no back-compat, no runtime validation.

- Dedicated Pydantic wire models (sdks/python/agenta/sdk/agents/wire_models.py:
  WireRunRequest / WireRunResult + an open WireAgentEvent) are the single schema source
  of truth, separate from the snake_case semantic DTOs. run_contract_schemas() exports
  their dereferenced camelCase JSON Schema, shipped in the SDK via CATALOG_TYPES
  (run_request / run_result), the same path agent_config takes. No new endpoint, no new
  toolchain.
- No validation: wire.py stays the dict producer (omit-when-empty lives there, pinned by
  goldens); the models are the schema authority + test guard only. Nothing gates a live
  /run. test_wire_models.py: freshness guard (committed catalog == fresh export), goldens
  validate + parse, request_to_wire output validates, schema props == KNOWN_REQUEST_KEYS.
- Issue 1: canonical WorkflowInspectResponse in models/workflows.py; handle_inspect_success
  normalizes the built WorkflowInvokeRequest into it, lifting WorkflowRevisionData to a flat
  top-level revision so schemas live at response.revision.schemas (was the latent-broken
  data.revision.data.schemas). The three /inspect routes return WorkflowInspectResponse. FE
  InspectWorkflowResponse type + store.ts read now resolve against the real body; the
  deprecated interface?.schemas fallback stays as a bridge for two sibling readers.
- Issue 4: typed /inspect outputs keyed per surface (invoke -> message, messages ->
  messages) in services/oss/src/agent/schemas.py; reuses existing catalog markers.
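
The flattening described under Issue 1 can be pictured roughly as follows. This is a minimal sketch using plain dicts, not the SDK's handle_inspect_success: the function name `normalize_inspect` and the sample payload are hypothetical, and only the two paths (data.revision.data.schemas before, revision.schemas after) are taken from the description above.

```python
from typing import Any, Dict


def normalize_inspect(invoke_request: Dict[str, Any]) -> Dict[str, Any]:
    """Lift the nested revision data to a flat top-level `revision`.

    Hypothetical helper; the real normalization builds a WorkflowInspectResponse.
    """
    nested = invoke_request.get("data", {}).get("revision", {}).get("data", {})
    # Copy so the caller's request object is not mutated by later edits.
    return {"revision": dict(nested)}


# Illustrative payload only; real schemas are far larger.
built = {"data": {"revision": {"data": {"schemas": {"outputs": {"type": "object"}}}}}}
flat = normalize_inspect(built)
print(flat["revision"]["schemas"])
```

A reader that previously walked data.revision.data.schemas would now read revision.schemas on the response body.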

Deferred: /run version field, runner-side validation, GET /capabilities, generating
protocol.ts, structured-error/cancelled outcome, Fern publication.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

Phase A (SDK): add a per-provider models map to HarnessConnectionCapabilities and
populate HARNESS_CONNECTION_CAPABILITIES (Pi from supported_llm_models; Claude from a
new CLAUDE_MODEL_ALIASES constant), emitted on /inspect meta.harness_capabilities.
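
A rough picture of the Phase A shape, as a sketch under stated assumptions: the `models` field name, the harness keys, the provider ids, and every model id below are placeholders invented for illustration, and `inspect_meta` is a hypothetical helper. The real tables are populated from supported_llm_models and CLAUDE_MODEL_ALIASES.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HarnessConnectionCapabilities:
    # provider id -> model ids this harness can reach (field name assumed)
    models: Dict[str, List[str]] = field(default_factory=dict)


# Placeholder entries only; not the real capability tables.
HARNESS_CONNECTION_CAPABILITIES: Dict[str, HarnessConnectionCapabilities] = {
    "pi": HarnessConnectionCapabilities(
        models={"openai": ["example-model-a"], "anthropic": ["example-model-b"]}
    ),
    "claude": HarnessConnectionCapabilities(models={"anthropic": ["example-alias"]}),
}


def inspect_meta() -> Dict[str, dict]:
    """Build the meta block that /inspect would carry (hypothetical helper)."""
    return {
        "harness_capabilities": {
            harness: {"models": caps.models}
            for harness, caps in HARNESS_CONNECTION_CAPABILITIES.items()
        }
    }


print(inspect_meta()["harness_capabilities"]["claude"])
```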

Phases B-D (FE): thread meta.harness_capabilities to the playground via a new
harnessCapabilitiesAtomFamily; retire the static FE capability map; build a
harness-filtered unified provider+model picker (selecting a model sets both provider
and model; standalone provider field removed); add an Agenta-managed vs self-managed
authentication toggle + a vault-fed connection picker. The model is ALWAYS a ModelRef.
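
The filtering rule behind the unified picker can be sketched as below. The real picker lives in the frontend; this is written in Python for brevity, and `picker_options`, its parameters, and the sample ids are hypothetical. The permissive fallback to the full catalog when an agent publishes no models follows the PR description.

```python
from typing import Dict, List, Optional, TypedDict


class ModelRef(TypedDict, total=False):
    provider: str
    model: str
    connection: str  # optional; omitted when no named connection is chosen


def picker_options(
    capabilities: Optional[Dict[str, dict]],
    harness: str,
    full_catalog: Dict[str, List[str]],
) -> List[ModelRef]:
    """Return one ModelRef per selectable (provider, model) pair."""
    published = (capabilities or {}).get(harness, {}).get("models")
    # Permissive fallback: agents that publish no models get the full catalog.
    source = published if published else full_catalog
    return [
        {"provider": provider, "model": model}
        for provider, models in source.items()
        for model in models
    ]


caps = {"claude": {"models": {"anthropic": ["example-alias"]}}}
catalog = {"openai": ["example-model-a"], "anthropic": ["example-alias"]}
print(picker_options(caps, "claude", catalog))
```

Because every option is already a provider and model pair, selecting one sets both fields at once, which is why a standalone provider field is no longer needed.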

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

The agent playground's model picker showed an empty provider/model list: the
agent /inspect response carried no meta.harness_capabilities, so the FE had
nothing to render. Two breaks, fixed here.

Backend: the routed agent workflow sets meta=harness_capabilities, but the
playground posts /inspect WITH a revision, which takes the request-driven
inspect_workflow path. That builds a fresh workflow from the request (no meta)
and never consults the routed instance's meta, so the capabilities were dropped.
WorkflowRevisionData has no meta field, so the interface registry cannot carry
it. Add a META_REGISTRY (mirrors register_interface/retrieve_interface); the
agent service registers its meta under the builtin URI; workflow.inspect()
merges the registered meta (request/decorator meta wins per key). Now /inspect
emits meta.harness_capabilities on the request-driven path too.
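
The registry-plus-merge behavior described above might look roughly like this. It is a sketch, not the SDK's code: `register_meta`, `retrieve_meta`, `merged_meta`, and the URI string are assumed names modeled on the register_interface/retrieve_interface pairing mentioned in the message.

```python
from typing import Any, Dict, Optional

# uri -> meta registered by a service at import time (sketch of META_REGISTRY)
META_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register_meta(uri: str, meta: Dict[str, Any]) -> None:
    """Store a copy of the meta under the workflow URI (hypothetical name)."""
    META_REGISTRY[uri] = dict(meta)


def retrieve_meta(uri: str) -> Dict[str, Any]:
    """Return a copy of the registered meta, or an empty dict if none exists."""
    return dict(META_REGISTRY.get(uri, {}))


def merged_meta(uri: str, request_meta: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Registered meta is the base; request/decorator meta wins per key."""
    return {**retrieve_meta(uri), **(request_meta or {})}


AGENT_URI = "example:builtin:agent"  # placeholder, not the real builtin URI
register_meta(AGENT_URI, {"harness_capabilities": {"claude": {}}, "source": "registry"})

# A request-driven inspect with no meta still sees the registered capabilities.
print(merged_meta(AGENT_URI, None))
# A key supplied by the request overrides the registered value for that key only.
print(merged_meta(AGENT_URI, {"source": "request"}))
```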

Frontend: the inspect atom skipped the fetch when the revision already carried
all schemas inline (a redundancy optimization). The agent stores its schemas
inline, so inspect never ran and the capabilities never reached the FE. Fetch
inspect for the agent regardless (detected by the builtin agent URI, since the
is_agent flag is not reliably stored).
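
The fetch decision can be sketched as a small predicate. The actual logic is in a frontend atom; this Python version only illustrates the rule, and `should_fetch_inspect`, its parameters, and the URI value are hypothetical.

```python
def should_fetch_inspect(uri: str, has_all_inline_schemas: bool, agent_uri: str) -> bool:
    """Decide whether to call /inspect for a revision."""
    # The agent always needs inspect, since that is where its capabilities come from.
    if uri == agent_uri:
        return True
    # Other workflows keep the optimization: skip when schemas are already inline.
    return not has_all_inline_schemas


AGENT_URI = "example:builtin:agent"  # placeholder, not the real builtin URI
print(should_fetch_inspect(AGENT_URI, True, AGENT_URI))
print(should_fetch_inspect("example:other", True, AGENT_URI))
```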

Regression test pins the exact broken path: a request-driven inspect of the
agent URI normalizes to a response carrying per-harness models.

Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc

@mmabrouk mmabrouk added implementing Design approved; implementation queued/in progress (not awaiting review) and removed needs-review Agent updated; awaiting Mahmoud's review labels Jun 25, 2026
@mmabrouk

Member Author

Code looks good per your review. Two functionality fixes are in progress before merge: (1) the Pi no-response bug you hit, and (2) reworking the connection to match the completion/chat pattern (the FE sends a reference, not the key; 'Project Default' is dropped), per your decision. I am asking Codex about the naming. You pre-approved merging once it works, so no further review is needed unless you want to re-test the picker.

@mmabrouk mmabrouk force-pushed the feat/agent-model-picker branch from 673b178 to c406a5d Compare June 25, 2026 11:36
@mmabrouk mmabrouk changed the base branch from feat/agent-wire-contract-schema-plan to big-agents June 25, 2026 12:04
@mmabrouk mmabrouk merged commit 27cfd7b into big-agents Jun 25, 2026
35 checks passed
ardaerzin added a commit that referenced this pull request Jun 25, 2026
…ents

Rebuilds the contained agent config-panel branch on top of big-agents after the agent stack landed
(#4830 canonical /inspect + models, #4839 model picker, #4840 collapse run-selection into AgentConfig
+ harness_kwargs). The inspect-driven picker, connectionUtils, inspectMeta and EnumSelectControl we
had ported now live in big-agents verbatim, so they collapse out — this branch is only our genuine UX
delta on top:

- AgentConfigControl as schema-driven accordion sections (ConfigAccordionSection primitive)
- per-item drawers on the shared EnhancedDrawer (ConfigItemDrawer) with lazy content
- two-pane skill editor, MCP server form, tool form, harness select, markdown/code/JSON editors
- Authentication as radio cards with explainers; live JSON-edit sync
- chat panel: inline per-turn run errors + truncation, empty-turn collapse
- playground wiring (header menu view modes, height calc, router)

Coherent with the merged backend: harness/sandbox/permission_policy ride parameters.agent,
harness_kwargs bag, pi_core default — taken from big-agents (the earlier Option-A reverts are undone).

tsc 0 (entity-ui/agenta-ui/entities/playground; oss 593 baseline, 0 new), eslint clean,
unit tests pass (entity-ui 126, playground 122, entities 683).

Labels

enhancement New feature or request Frontend implementing Design approved; implementation queued/in progress (not awaiting review) size:XXL This PR changes 1000+ lines, ignoring generated files.
