Skip to content

fix(api): keep prompt input_keys when projecting evaluation inputs#4807

Open
jayjcoding wants to merge 2 commits into
Agenta-AI:mainfrom
jayjcoding:main
Open

fix(api): keep prompt input_keys when projecting evaluation inputs#4807
jayjcoding wants to merge 2 commits into
Agenta-AI:mainfrom
jayjcoding:main

Conversation

@jayjcoding

@jayjcoding jayjcoding commented Jun 23, 2026

Copy link
Copy Markdown

Summary

Running an evaluation against a testset fails on every row when the target is a
builtin prompt/chat workflow whose template variable is not a declared structural
input. The application step returns HTTP 400:

Invalid inputs:\nExpected '['context']'\nGot ('list') '[]'.
(type v0:schemas:invalid-inputs)

Root cause: _project_inputs in
api/oss/src/core/evaluations/runtime/adapters.py filters each testcase row down to
the keys in data.schemas.inputs.properties before invoking the workflow (to drop
bookkeeping columns like correct_answer/testcase_id). For builtin prompt
workflows that schema declares only the structural inputs (e.g. messages for
chat:v0), not the prompt template variables (e.g. context) — which live in
parameters.prompt.input_keys and are what completion_v0/chat_v0 actually
validate against. So the projection strips context, the workflow is invoked with
{}, and validation fails.

Fix: widen the projection allow-list to the union of the schema's property keys
and the prompt's declared input_keys, so a template variable absent from the schema
is no longer dropped. Bookkeeping columns are still removed; the no-schema
pass-through path is unchanged (untyped/legacy revisions unaffected).

Testing

Verified locally

Reproduced on a self-hosted OSS stack: a chat app whose prompt uses {{context}},
evaluated against a testset with a context column. Before the change the
application step fails with Invalid inputs: Expected ['context'] Got []; after the
change context flows through and the app proceeds to the model call.

Added or updated tests

Added api/oss/tests/pytest/unit/evaluations/test_project_inputs.py covering the
chat-app case (schema declares only messages, prompt needs context), the
bookkeeping-column drop, and the no-schema pass-through. All pass.

QA follow-up

N/A

Demo

Before the change
Screenshot 2026-06-23 at 7 10 05 PM

After the change
Screenshot 2026-06-23 at 7 09 01 PM

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 23, 2026
@vercel

vercel Bot commented Jun 23, 2026

Copy link
Copy Markdown

@jayjcoding is attempting to deploy a commit to the agenta projects Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLAassistant commented Jun 23, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@dosubot dosubot Bot added Bug Report Something isn't working tests labels Jun 23, 2026
@github-actions

github-actions Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

✅ Thanks @jayjcoding! This PR now meets the contribution requirements and has been reopened. A maintainer will review it soon.

@github-actions github-actions Bot added the incomplete-pr PR is missing required template sections or a demo recording label Jun 23, 2026
@github-actions github-actions Bot closed this Jun 23, 2026
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ec3430a1-0a8b-4ff4-8bb0-64bd4c5df749

📥 Commits

Reviewing files that changed from the base of the PR and between 8b7e319 and fea3006.

📒 Files selected for processing (2)
  • api/oss/src/core/evaluations/runtime/adapters.py
  • api/oss/tests/pytest/unit/evaluations/test_evaluation_inputs.py

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.


📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Fixed input processing to preserve all declared prompt template variables during evaluation. Previously, certain template variables were filtered out if not present in the input schema.
  • Tests

    • Added comprehensive unit tests to verify input processing behavior with various configuration scenarios, ensuring template variables are properly preserved during evaluation runs.

Walkthrough

_project_inputs in the evaluation runtime adapter is updated to union the revision input-schema property keys with all input_keys collected from nested prompt configurations via a new _collect_prompt_input_keys helper. Five unit tests are added covering bookkeeping-field removal, no-schema passthrough, chat-workflow template variable preservation, wrapped ag_config structure, and non-dict inputs.

Changes

Input projection fix: preserve prompt template variables

Layer / File(s) Summary
_collect_prompt_input_keys helper and _project_inputs union logic
api/oss/src/core/evaluations/runtime/adapters.py
Adds _collect_prompt_input_keys that recursively walks data.parameters to extract all input_keys strings from any dict/list nodes. Updates _project_inputs to compute allowed as the union of input-schema properties keys and collected prompt input_keys, replacing the previous schema-only filter. Expands the docstring to document the schema-union-prompt-keys contract.
Unit tests for _project_inputs
api/oss/tests/pytest/unit/evaluations/test_evaluation_inputs.py
Adds a new test module with a _data fixture helper and five tests: bookkeeping field removal, passthrough when schema is absent or empty, chat-workflow bug regression (context kept via prompt.input_keys even when absent from schema), wrapped ag_config nesting, and non-dict input passthrough.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-actions github-actions Bot removed the incomplete-pr PR is missing required template sections or a demo recording label Jun 23, 2026
@github-actions github-actions Bot reopened this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Report Something isn't working size:M This PR changes 30-99 lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants