feat(frontend): harness-aware agent provider + model picker (inspect models)#4839
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR adds per-harness model lists to ChangesAgent model picker and inspect metadata
Sequence Diagram(s)sequenceDiagram
participant create_agent_app
participant register_meta
participant workflow_inspect as "workflow.inspect()"
participant retrieve_meta
participant WorkflowInvokeRequest
create_agent_app->>register_meta: store {"harness_capabilities": ...} for AGENT_URI
workflow_inspect->>retrieve_meta: load registered meta for self.uri
retrieve_meta-->>workflow_inspect: return inspect meta from META_REGISTRY
workflow_inspect->>WorkflowInvokeRequest: set meta = inspect_meta
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Changes made (scope map)Phase A — SDK backend (
Phase B — FE plumbing (
Phases C/D + ModelRef fold-in (
Phase E — docs
Tests: SDK connections 50 green; entity-ui 126 green (connectionUtils 24); entities/entity-ui/playground typecheck green; ruff + prettier + eslint clean. Did NOT touch (other lanes / just landed):
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
Railway Preview Environment
|
275b2cc to
0d9f0b9
Compare
0e7f4f7 to
e282ea4
Compare
e282ea4 to
efd5f70
Compare
Expected behavior for the agent model picker — please confirm / give feedbackCapturing the target UX for the agent provider + model picker so we agree on the spec. This is what I am building toward; tell me where it is wrong.
What I have fixed so far (the empty list)The picker was rendering an empty list for every harness. Root cause was two breaks that starved the FE of the per-harness capability data:
Verified live on the dev playground: Pi now shows all 8 providers with their models (OpenAI 37, Anthropic 12, Gemini 16, Mistral 14, Groq 8, MiniMax 5, Together 15, OpenRouter 23); switching to Claude scopes the picker to Anthropic only (the 8 Claude aliases) and clears an unreachable model. Items 1, 2, 5 are working. Open design question (not guessing — want your call)Item 6 (kill "Project Default") + item 4 (inline add-provider) are a connection-picker UX change on top of the now-working provider/model picker. Today the connection control still offers a "Project default" option plus named vault connections, with the Agenta-managed/Self-managed toggle already present. Reworking it to fully mirror the chat playground's provider+secret selector (and dropping "Project default" as a user-facing value) is a focused follow-up I did not want to guess on. Confirm the exact desired connection UX and I will build it: should "Project default" be removed entirely (forcing an explicit connection pick), or kept as an implicit fallback that is just not shown as a selectable value? |
Changes in this round — fixed the empty model pickerVerdict: the picker was COMPLETE-but-starved, not incomplete. The harness-filtered picker, the auth toggle, and the connection control were all built (this PR's original commit). The list was empty because the per-harness capability data never reached the FE. Two breaks, both fixed here. Root cause: empty list
FixSDK (
Service (
Frontend (
Test (
Verified live (dev playground, plain HTTP)
Tests
Left as an open question (on the expected-behavior comment)The connection-picker rework (drop user-facing "Project default", inline add-provider to mirror the chat playground) is a focused UX follow-up I did not guess on — see the expected-behavior comment for the exact question. Codex (xhigh) reviewed the seam choice and the merge semantics; its refinements (retrieve in |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts (1)
20-31: 🩺 Stability & Availability | 🔵 TrivialAlign
HarnessCapabilitiesinterface with the permissive runtime contract documented.The
connectionUtils.tsconsumers defensively handle missing data using optional chaining and explicit checks, preventing runtime errors. However, theHarnessCapabilitiesinterface ininspectMeta.tsincorrectly marksproviders,connection_modes,model_selection, andmodelsas required fields, despite documentation confirming older agents may omit them entirely. This type definition contradicts the runtime reality and hides the permissive nature of the data.Update the interface to mark these fields as optional. This aligns the type system with the actual data shape and prevents reliance on implicit runtime guards.
Diff
web/packages/agenta-entities/src/workflow/state/inspectMeta.ts export interface HarnessCapabilities { /** Provider families the harness can reach (a literal list; never `"*"`). */ - providers: string[] + providers?: string[] /** Deployment surfaces it can consume (`["direct"]` for Pi today). */ deployments?: string[] /** Supported connection modes (`["agenta", "self_managed"]`). */ - connection_modes: string[] + connection_modes?: string[] /** How a model is named: `"provider/id"` (Pi) or `"alias"` (Claude). */ - model_selection: string + model_selection?: string /** Selectable models per provider family (provider -> list of ids/aliases). */ - models: Record<string, string[]> + models?: Record<string, string[]> }web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts (2)
152-187: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAvoid returning the shared
ALL_MODESreference from the permissive branch.
allowedConnectionModesreturns the module-levelALL_MODESarray directly in the permissive path. If any caller mutates the returned array, it silently corrupts the constant for every subsequent call. A defensive copy keeps the helper pure.♻️ Proposed change
- if (!entry?.connection_modes?.length) return ALL_MODES + if (!entry?.connection_modes?.length) return [...ALL_MODES]
228-256: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
titleizeProviderproduces awkward labels for multi-word providers.
together_airenders as "Together ai" (only the first character is uppercased after the underscore replace). Tests only cover single-word providers, so this slips through. Consider title-casing each word.♻️ Proposed change
function titleizeProvider(provider: string): string { - return provider.charAt(0).toUpperCase() + provider.slice(1).replace(/_/g, " ") + return provider + .split("_") + .map((w) => w.charAt(0).toUpperCase() + w.slice(1)) + .join(" ") }docs/design/agent-workflows/projects/agent-model-picker/research.md (1)
45-45: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAdd a language to the fenced code block.
markdownlint (MD040) flags this fence as missing a language hint. Since it's a plain capability-table listing,
textis appropriate.📝 Proposed change
-``` +```text HarnessConnectionCapabilities (:57-73):Source: Linters/SAST tools
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 32096859-9891-43cf-bb22-e54b7959a604
📒 Files selected for processing (20)
docs/design/agent-workflows/documentation/agent-configuration.mddocs/design/agent-workflows/interfaces/public-edge/workflow-inspect.mddocs/design/agent-workflows/projects/agent-model-picker/README.mddocs/design/agent-workflows/projects/agent-model-picker/context.mddocs/design/agent-workflows/projects/agent-model-picker/plan.mddocs/design/agent-workflows/projects/agent-model-picker/research.mddocs/design/agent-workflows/projects/agent-model-picker/status.mdsdks/python/agenta/sdk/agents/capabilities.pysdks/python/agenta/sdk/decorators/running.pysdks/python/agenta/sdk/engines/running/utils.pysdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.pyservices/oss/src/agent/app.pyservices/oss/tests/pytest/unit/agent/test_builtin_uri_binding.pyweb/packages/agenta-entities/src/workflow/index.tsweb/packages/agenta-entities/src/workflow/state/index.tsweb/packages/agenta-entities/src/workflow/state/inspectMeta.tsweb/packages/agenta-entities/src/workflow/state/store.tsweb/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsxweb/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.tsweb/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts
| def test_pi_models_are_a_subset_of_the_shared_catalog(): | ||
| # Each Pi harness publishes, per vault provider, exactly that provider's catalog ids. | ||
| for harness in ("pi_core", "pi_agenta"): | ||
| models = HARNESS_CONNECTION_CAPABILITIES[harness].models | ||
| # Only the vault-mapped providers are published (no arbitrary catalog providers). | ||
| assert set(models) <= set(PI_VAULT_PROVIDERS) | ||
| assert set(models) == set(PI_VAULT_PROVIDERS) | ||
| for provider, ids in models.items(): | ||
| # The published ids are exactly the shared catalog's ids for that provider | ||
| # (verbatim — most are provider-prefixed like ``anthropic/...``, but some | ||
| # providers, e.g. openai, list bare ids like ``gpt-5.5``). | ||
| assert ids == list(supported_llm_models[provider]) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Test contradicts the defensive intent of _pi_models().
_pi_models() documents that it skips providers missing from supported_llm_models so "a catalog edit never breaks the capability document," but Line 96 asserts set(models) == set(PI_VAULT_PROVIDERS). If a vault provider is ever dropped from the catalog, the silent skip will trip this exact-equality assertion — i.e. the documented resilience is not actually exercised. Decide which behavior is intended: either keep strict equality (then the skip is dead code/misleading comment) or relax to subset to honor the defensive guard.
Also, Line 95 (<=) is fully subsumed by the Line 96 equality and can be dropped.
♻️ Drop the redundant subset assertion
- assert set(models) <= set(PI_VAULT_PROVIDERS)
assert set(models) == set(PI_VAULT_PROVIDERS)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_pi_models_are_a_subset_of_the_shared_catalog(): | |
| # Each Pi harness publishes, per vault provider, exactly that provider's catalog ids. | |
| for harness in ("pi_core", "pi_agenta"): | |
| models = HARNESS_CONNECTION_CAPABILITIES[harness].models | |
| # Only the vault-mapped providers are published (no arbitrary catalog providers). | |
| assert set(models) <= set(PI_VAULT_PROVIDERS) | |
| assert set(models) == set(PI_VAULT_PROVIDERS) | |
| for provider, ids in models.items(): | |
| # The published ids are exactly the shared catalog's ids for that provider | |
| # (verbatim — most are provider-prefixed like ``anthropic/...``, but some | |
| # providers, e.g. openai, list bare ids like ``gpt-5.5``). | |
| assert ids == list(supported_llm_models[provider]) | |
| def test_pi_models_are_a_subset_of_the_shared_catalog(): | |
| # Each Pi harness publishes, per vault provider, exactly that provider's catalog ids. | |
| for harness in ("pi_core", "pi_agenta"): | |
| models = HARNESS_CONNECTION_CAPABILITIES[harness].models | |
| # Only the vault-mapped providers are published (no arbitrary catalog providers). | |
| assert set(models) == set(PI_VAULT_PROVIDERS) | |
| for provider, ids in models.items(): | |
| # The published ids are exactly the shared catalog's ids for that provider | |
| # (verbatim — most are provider-prefixed like ``anthropic/...``, but some | |
| # providers, e.g. openai, list bare ids like ``gpt-5.5``). | |
| assert ids == list(supported_llm_models[provider]) |
Replace the hand-mirrored protocol.ts <-> wire.py contract (guarded only by
golden fixtures) with a single source of truth: dedicated Pydantic wire models,
exported JSON Schema, ajv validation in the standalone Node runner. Adds the
/run split decision (keep the turn unified; promote GET /capabilities + consume
the contract version on both transports), a structured error model
{ code, message, retryable } with a correctly-modeled cancelled outcome, an
in-band contractVersion, and a 9-step test-at-each-step migration (steps 1-6
behavior-preserving at v1; A10 error-model + A3 backend-removal/harness-rename
as one v2 cut; gated on the A1 versioning / A3 / A10 siblings). Codex-reviewed.
Composio, the tool gateway, connections, and MCP are unchanged. Docs only.
Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
Address the four inline review comments on PR #4830: - Drop all backward-compat-preservation framing (this is a pre-production POC; any wire shape may change freely). - Pydantic stays the source for now; the exported JSON Schema interface ships in the SDK (the CATALOG_TYPES path), with a Fern investigation: Fern reads Pydantic->OpenAPI->clients, so it can see this interface later via the OpenAPI surface once the contract stabilizes (no hard blocker, only a timing call). Drop using the schema in the sidecar/runner for now. - Remove all runner-side request validation (server.ts/cli.ts); no ingress validation, no ajv, no new runner dependency this phase. - Keep the /capabilities probe (author endorsed it). Also drop the versioning machinery (the pi/agenta rename, already landed, is not versioned; any version field defers to A1's simple string + if/else convention). Keep the structured error model + cancelled outcome and the keep-/run-unified decision. Update status.md to match. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
…ct response Implement the schema-driven /run contract plan (this PR's plan doc) plus the folded /inspect follow-ups. Pre-production POC: no back-compat, no runtime validation. - Dedicated Pydantic wire models (sdks/python/agenta/sdk/agents/wire_models.py: WireRunRequest / WireRunResult + an open WireAgentEvent) are the single schema source of truth, separate from the snake_case semantic DTOs. run_contract_schemas() exports their dereferenced camelCase JSON Schema, shipped in the SDK via CATALOG_TYPES (run_request / run_result), the same path agent_config takes. No new endpoint, no new toolchain. - No validation: wire.py stays the dict producer (omit-when-empty lives there, pinned by goldens); the models are the schema authority + test guard only. Nothing gates a live /run. test_wire_models.py: freshness guard (committed catalog == fresh export), goldens validate + parse, request_to_wire output validates, schema props == KNOWN_REQUEST_KEYS. - Issue 1: canonical WorkflowInspectResponse in models/workflows.py; handle_inspect_success normalizes the built WorkflowInvokeRequest into it, lifting WorkflowRevisionData to a flat top-level revision so schemas live at response.revision.schemas (was the latent-broken data.revision.data.schemas). The three /inspect routes return WorkflowInspectResponse. FE InspectWorkflowResponse type + store.ts read now resolve against the real body; the deprecated interface?.schemas fallback stays as a bridge for two sibling readers. - Issue 4: typed /inspect outputs keyed per surface (invoke -> message, messages -> messages) in services/oss/src/agent/schemas.py; reuses existing catalog markers. Deferred: /run version field, runner-side validation, GET /capabilities, generating protocol.ts, structured-error/cancelled outcome, Fern publication. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
Phase A (SDK): add a per-provider models map to HarnessConnectionCapabilities and populate HARNESS_CONNECTION_CAPABILITIES (Pi from supported_llm_models; Claude from a new CLAUDE_MODEL_ALIASES constant), emitted on /inspect meta.harness_capabilities. Phases B-D (FE): thread meta.harness_capabilities to the playground via a new harnessCapabilitiesAtomFamily; retire the static FE capability map; build a harness-filtered unified provider+model picker (selecting a model sets both provider and model; standalone provider field removed); add an Agenta-managed vs self-managed authentication toggle + a vault-fed connection picker. The model is ALWAYS a ModelRef. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
The agent playground's model picker showed an empty provider/model list: the agent /inspect response carried no meta.harness_capabilities, so the FE had nothing to render. Two breaks, fixed here. Backend: the routed agent workflow sets meta=harness_capabilities, but the playground posts /inspect WITH a revision, which takes the request-driven inspect_workflow path. That builds a fresh workflow from the request (no meta) and never consults the routed instance's meta, so the capabilities were dropped. WorkflowRevisionData has no meta field, so the interface registry cannot carry it. Add a META_REGISTRY (mirrors register_interface/retrieve_interface); the agent service registers its meta under the builtin URI; workflow.inspect() merges the registered meta (request/decorator meta wins per key). Now /inspect emits meta.harness_capabilities on the request-driven path too. Frontend: the inspect atom skipped the fetch when the revision already carried all schemas inline (a redundancy optimization). The agent stores its schemas inline, so inspect never ran and the capabilities never reached the FE. Fetch inspect for the agent regardless (detected by the builtin agent URI, since the is_agent flag is not reliably stored). Regression test pins the exact broken path: a request-driven inspect of the agent URI normalizes to a response carrying per-harness models. Claude-Session: https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc
|
Code looks good per your review. Two functionality fixes are in progress before merge: (1) the Pi no-response bug you hit, (2) reworking the connection to match the completion/chat pattern (FE sends a reference, not the key; drop 'Project Default') per your decision, asking Codex on the naming. You pre-approved merging once it works, so no further review is needed unless you want to re-test the picker. |
673b178 to
c406a5d
Compare
…ents Rebuilds the contained agent config-panel branch on top of big-agents after the agent stack landed (#4830 canonical /inspect + models, #4839 model picker, #4840 collapse run-selection into AgentConfig + harness_kwargs). The inspect-driven picker, connectionUtils, inspectMeta and EnumSelectControl we had ported now live in big-agents verbatim, so they collapse out — this branch is only our genuine UX delta on top: - AgentConfigControl as schema-driven accordion sections (ConfigAccordionSection primitive) - per-item drawers on the shared EnhancedDrawer (ConfigItemDrawer) with lazy content - two-pane skill editor, MCP server form, tool form, harness select, markdown/code/JSON editors - Authentication as radio cards with explainers; live JSON-edit sync - chat panel: inline per-turn run errors + truncation, empty-turn collapse - playground wiring (header menu view modes, height calc, router) Coherent with the merged backend: harness/sandbox/permission_policy ride parameters.agent, harness_kwargs bag, pi_core default — taken from big-agents (the earlier Option-A reverts are undone). tsc 0 (entity-ui/agenta-ui/entities/playground; oss 593 baseline, 0 new), eslint clean, unit tests pass (entity-ui 126, playground 122, entities 683).
What & why
The agent playground's model picker showed the whole LiteLLM catalog, unfiltered by harness, with a separate redundant Provider field and a free-text connection slug. So an author could pick a model Claude can't reach, or an OpenAI provider for the Claude harness, and only find out when the run failed. The credential UX was a bare mode dropdown plus a free-text box.
This makes the agent picker harness-aware and inspect-driven: the agent's
/inspectnow publishes the exact models each harness can reach, and the playground renders one unified provider + model picker from it, with a clear authentication choice and a connection picker fed by the vault.Before → after
/inspect. Selecting a model now sets both the provider and the model id; the redundant standalone Provider field is gone.TODOto consume inspect) → the inspect-published map, threaded through a new atom. Permissive fallback to the full catalog when an agent doesn't publish models (older agents / standalone).GET /secrets/). Raw-JSON escape hatch kept.config.model: could be a free-text string → always a structuredModelRef({provider, model, connection?}). The picker is the only way to set it. (Folds in [docs] Add agent workflow interface inventory #4821 comment 3469645457.)How
HarnessConnectionCapabilitiesgainsmodels: Dict[str, List[str]];HARNESS_CONNECTION_CAPABILITIESpopulates it. Pi maps eachPI_VAULT_PROVIDERSentry to itssupported_llm_modelscatalog ids (_pi_models()skips a provider missing from the catalog); Claude mapsanthropicto a newCLAUDE_MODEL_ALIASESconstant (default/sonnet/opus/haiku+[1m]variants).harness_capabilities_document()emits it, so/inspectmeta.harness_capabilitiescarries per-harness models. Not a/runwire change.harnessCapabilitiesAtomFamily(revisionId)derivesmeta.harness_capabilitiesfrom the existingworkflowInspectAtomFamily.connectionUtils.tsretired the static FE capability map; its helpers now take the inspect-fedHarnessCapabilitiesMapand stay permissive when it's absent.buildModelOptionGroupsbuilds the grouped options from the harness's published models;SelectLLMProviderBaserenders them.providerForModelderives the provider from the picked model's group;harnessAllowsModelclears an unreachable model on harness switch.Authenticationtoggle maps toconnection.mode;namedConnectionOptionslists vault custom-provider connections (the slug the resolver matches on) filtered to the chosen provider + harness. Self-managed sends{mode: "self_managed"}and shows a note.Tests
modelsmap; Pi ⊆supported_llm_models; Claude = the alias set; the document round-trips as plain dicts). green (11 tests in the file; 50 in the connections suite).connectionUtilsunit tests rewritten for the inspect-fed helpers + the always-ModelRef compose + the picker/connection-option helpers. green (24 tests; 126 across entity-ui).@agenta/entities,@agenta/entity-ui,@agenta/playgroundtypecheck green. ruff + prettier + eslint clean.Note for reviewers
supported_llm_models["openai"]lists bare ids (gpt-5.5, …) while other providers are prefixed (anthropic/claude-...). The picker uses the catalog id verbatim as the option value and takes the provider from the group key, so the mix is handled; the contract test deliberately does not assert provider-prefixing.Docs synced in this PR: the
/inspectinterface inventory entry (now listsmodels) anddocumentation/agent-configuration.md(picker + auth UX).https://claude.ai/code/session_01GYo3UEfvsZpncagqb28Mbc