Fail fast on systemic SearchQA rollout failures by summerview1997 · Pull Request #64 · microsoft/SkillOpt

summerview1997 · 2026-06-16T01:21:16Z

Summary

This PR makes the SearchQA rollout fail fast when every item in a batch fails before the target agent produces any response.

Previously, per-item exceptions such as model endpoint misconfiguration were recorded as ordinary failed answers. If every item had agent_ok=false, the trainer could continue with a complete-looking run and all-zero scores, even though no agent responses were produced.

Changes

Add a SearchQA rollout guard that detects when every row in a batch has agent_ok=false.
Raise a runtime error summarizing the most common fail_reason.
Apply the guard to both resumed/cached result paths and newly completed batches.
Keep ordinary wrong-answer results valid when at least one row has an agent response.
Add regression tests for cached systemic failures and for rollouts that were answered incorrectly.
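
For readers without the diff open, the guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in skillopt/envs/searchqa/rollout.py: the function name `raise_if_systemic_failure` is hypothetical, and it assumes each result row is a mapping with the `agent_ok` and `fail_reason` keys mentioned in this PR. Treating an empty batch as "nothing to check" is also an assumption.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def raise_if_systemic_failure(rows: Sequence[Mapping[str, Any]]) -> None:
    """Raise RuntimeError if no row in the batch has an agent response.

    Hypothetical helper: the name and the row shape are assumptions.
    """
    if not rows:
        # Assumption: an empty batch is not evidence of a systemic failure.
        return
    if any(row.get("agent_ok") for row in rows):
        # At least one agent response exists, so wrong answers remain valid results.
        return
    # Every row failed before the agent responded: summarize the dominant reason.
    reasons = Counter(str(row.get("fail_reason") or "unknown") for row in rows)
    reason, count = reasons.most_common(1)[0]
    raise RuntimeError(
        f"All {len(rows)} SearchQA rollouts failed before the agent responded; "
        f"most common fail_reason ({count}/{len(rows)}): {reason}"
    )
```

Under these assumptions, the same check would be called on rows loaded from a resumed or cached result file and on rows from a newly completed batch, so that both paths stop before all-zero scores are reported.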

Impact

Infrastructure failures such as missing or unreachable model endpoints become visible immediately instead of being mistaken for model quality or skill optimization failure.

Validation

/home/thomas/SkillOpt/.venv/bin/python -m pytest -q tests/test_searchqa_rollout_failfast.py
/home/thomas/SkillOpt/.venv/bin/python -m pytest -q
/home/thomas/SkillOpt/.venv/bin/python -m ruff check skillopt/envs/searchqa/rollout.py tests/test_searchqa_rollout_failfast.py
/home/thomas/SkillOpt/.venv/bin/python -m py_compile skillopt/envs/searchqa/rollout.py tests/test_searchqa_rollout_failfast.py
git diff --check

summerview1997 added 2 commits June 16, 2026 09:20

Fail fast on systemic SearchQA rollout failures

da79962

Add SearchQA rollout fail-fast tests

923becb

summerview1997 wants to merge 2 commits into microsoft:main from summerview1997:codex/searchqa-rollout-failfast
