fix(pi): recover submit-missing in session and surface triage verdicts#1535
Merged
Conversation
Close the third submit-output recovery gap from #1528. The existing in-session retry only covered (1) invalid submit args and (2) provider errors. A weaker model that ends its turn with prose and never calls the submit tool at all left the attempt to fail straight to `submit_output_missing` with no recovery — the exact dogfood failure on the run_eval producer lane. pi-extension: - Add `promptUntilSubmitted`: after a clean pass with no captured output and no exhausted validation budget, re-prompt the same session (default 3x) naming the submit tool and forbidding a prose reply. Each re-prompt inherits the existing provider-error / cancel / cap handling. Strict re-prompt-then-fail: a model that still refuses fails with `submit_output_missing` as before. - `buildSubmitMissingPrompt`, `maxSubmitMissingReprompts` / `submitMissingPrompt` opts, `toolName` on the submit handle, and a `submit_missing_reprompt` info event. agent-daemon: - `submit_output_missing` is now deterministically non-retryable: the runtime already re-prompts within the attempt, so a fresh attempt against the same model is unlikely to differ. - `attempt-failure-classified` logs the structured verdict (source, code, retryable, decision, confidence, reason) instead of an opaque `:source` suffix; `FinalizeContext.log` now takes a fields object. Refs #1528
Contributor
✅ CLI go.mod matches internal Go module releases
|
Contributor
🚨 Dependency Audit — Vulnerabilities foundFull report |
Contributor
|
Self-review follow-up to the in-session submit-missing fix. pi-extension: - Stop re-prompting on a persisted provider error. A spent or non-retryable provider error leaves `llmAbort` set even though `promptWithProviderErrorRetries` returns `runError: null`; the loop's `isStopped` only checked cancel/cap, so it nudged a dead provider up to 3 more times. Extract `submitRepromptStopped` (cancel | cap | llmAbort) and gate on it. - Extract `resolveSubmitMissingConfig` (pure) so the default-budget / disable-when-no-tool / gate-mapping wiring — previously only reachable through a booted VM — is unit-tested. An inverted ternary now fails a test. Drops the unreachable `'Go on'` fallback to `''`. - Record `output_missing` on the never-called submit path (dimensioned by model) and emit a `submit_missing_summary` event with the reprompt count, so submit-missing failures and nudge frequency are visible to the parse-result counter / log stream. agent-daemon: - Bind taskId / attemptN / correlationId onto `attempt-failure-classified` so a verdict can be pivoted back to the abandoned task (the daemon log child carries only agent/team/profile). - Redact the model-authored `reason` through `redactRetryTriageSecrets` before logging — parity with the already-sanitized `triage_failed` path; the prior log shipped it raw to a wide-access sink. Tests: submitRepromptStopped, resolveSubmitMissingConfig, the classification log identifiers + redaction + decision/confidence-omitted branch, and full-path classifyAttemptFailure non-retryability for submit_output_missing. Refs #1528
getlarge
approved these changes
Jun 30, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Dogfood failure on the
run_evalproducer lane (taskb2d3c26f, profileil1395-gemma,gemma4:31b-cloud): the attempt failed inoutput_validationwithThis is the third submit-output recovery class called out in #1528 (comment). The in-session retry shipped earlier only covered:
isError, model re-calls; andpromptWithProviderErrorRetriesre-prompts.A weaker model that ends its turn with prose and never calls the submit tool at all matched neither path (
getCaptured()andgetExhaustedValidationFailure()both null), so the executor fell straight tosubmit_output_missingand stranded the lane.What
libs/pi-extensionpromptUntilSubmitted: after a clean pass with no captured output and no exhausted validation budget, re-prompt the same session (default 3×) naming the exact submit tool and forbidding a prose reply. Each re-prompt inherits the existing provider-error / cancel / cap handling. Strict re-prompt-then-fail — a model that still refuses fails withsubmit_output_missingas before.buildSubmitMissingPrompt,maxSubmitMissingReprompts/submitMissingPromptopts,toolNameon the submit handle, and asubmit_missing_repromptinfo event.apps/agent-daemonsubmit_output_missingis now deterministically non-retryable: the runtime already re-prompts within the attempt, so a fresh attempt against the same model is unlikely to differ (addresses "why didmaxAttempts: 2not recover").attempt-failure-classifiednow logs the structured triage verdict (source,code,retryable,decision,confidence,reason) instead of an opaque:sourcesuffix.FinalizeContext.logtakes a fields object;once.ts/poll-shared.tswiring updated.Already handled (no change here)
The Go CLI
decode TaskAttemptError: unexpected field "retry"from the issue comment was fixed by4010ad8d(regenerated clients) and is in releasedcli-v1.58.0— re-runningmoltnet task attempts b2d3c26f...now decodes.Tests
TDD throughout: 9 new
promptUntilSubmittedcases +buildSubmitMissingPrompt+toolName+ the non-retryable classification + the structured-log assertion.lint,test,typecheckgreen for@themoltnet/pi-extensionand@themoltnet/agent-daemon;nx affectedclean.Refs #1528