Stage G: surface typed tool-result payloads to the planner #9
Closed
Anmolnoor wants to merge 1 commit into
The observation builder only ever read artifact["stdout"], but typed capabilities don't use that field: file.read puts its data in "content", search/files/git in structured fields. So every typed read came back BLANK to the planner: it saw the action "executed" but none of the data. With file.read in particular the model would re-read the same file every iteration, conclude "the content isn't being returned," and loop until the no-progress detector or an empty-response flake killed the turn. (Masked until now because the model got its data from shell `gh api`, whose stdout *is* surfaced.)
Add _tool_result_preview: for read-only result types (file read/read_chunk, search, files, git inspect, man/tldr) surface the payload (file content, or a compact JSON of the structured result) into the observation's stdout_preview, capped by the existing _truncate_preview limits. Writes/mutations are deliberately excluded so we don't echo just-written content back and re-bloat the prompt.
Verified live: "read the report and summarize" now reads once, sees the content, and answers in a few iterations instead of looping ~20x and crashing.
Tests: _tool_result_preview surfaces reads + search but not writes; an orchestrator read surfaces the file content into the next iteration's planner context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Stages A–G merged to
| Stage | PR | What it fixed |
|---|---|---|
| A | #3 | Truncation is now an explicit error + a repair nudge; raw response persisted to the event log |
| B | #4 | Large file bodies no longer inlined in the plan JSON — deferred via content_brief + a separate generation call |
| C | #5 | The agent can pause and ask a clarifying question instead of guessing |
| D | #6 | Out-of-scope reads ask permission and grant session-scoped, read-only access |
| E | #7 | Deferred-write generation no longer wraps output in a plan blob; repair retries use temperature jitter |
| F | #8 | FILE_NOT_FOUND lists sibling files so a wrong-name guess self-corrects |
| G | this | Typed tool results (file reads, search, git) are surfaced to the planner — previously only shell stdout was, so typed reads came back blank and the model looped |
405 tests passing, ruff clean. Verified end-to-end against qwen3.5:397b-cloud: generate a GitHub report, then read it back and summarize — both work cleanly.
The throughline
Most of what looked like "the model is dumb" was actually the harness not feeding the model what it needed — truncated plans (A/B), plan-wrapped output (E), blank tool results (G). Those are all fixed. The genuinely model-side residue (intermittent empty completions on Ollama Cloud) is narrow and best addressed by a stronger model behind the same loop.
Closing #3–#8 as well; all their commits are in main via 27c97b0.
This was referenced May 27, 2026
Context
The big one. Asked to "read the file in res and give me the data", the agent read the file 18 times and then crashed — looking dumb. It wasn't: fcli was hiding the tool output from the model.
`_build_observation` only ever read `artifact["stdout"]`. But typed capabilities don't use that field: `file.read` puts its data in `content`, search/files/git in structured fields. So every typed read came back blank to the planner. With `file.read` the model re-read the same file every iteration, its own thinking saying "the content wasn't being returned properly," and looped until the no-progress detector / an empty-response flake killed the turn.
This was latent the whole time, masked because the model fetched data via shell `gh api`, whose `stdout` is surfaced. The moment a task depended on a typed read, the agent was flying blind.
What changed
`_tool_result_preview`: for read-only result types (`file.read`/`read_chunk`, `search`, `files`, git inspect, `man`/`tldr`) surface the payload (the file `content`, or a compact JSON of the structured result) into the observation's `stdout_preview`, capped by the existing `_truncate_preview` limits (8 KB / 200 lines). Writes and mutations are excluded so we don't echo just-written content back and re-bloat the plan prompt.
Verification
Live: "read the report in res about anmolnoor and give me a 2-line summary" now reads once, sees the content, and answers correctly in 4 iterations, versus the previous 18-read dead-loop + crash.
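For reviewers who want the shape of the fix without opening the diff, here is a minimal sketch of the preview logic described under "What changed". It is not the code in this PR: the function names, the `type` key, the exact result-type strings, and the choice to let `stdout` take precedence are all assumptions made for illustration. Only the `content` field, the read-only/write split, and the 8 KB / 200 line cap come from the description above.

```python
import json
from typing import Any

# Cap taken from the PR description (8 KB / 200 lines); constant names are hypothetical.
MAX_PREVIEW_BYTES = 8 * 1024
MAX_PREVIEW_LINES = 200

# Hypothetical type strings for read-only results; the real identifiers may differ.
READ_ONLY_TYPES = {
    "file.read", "file.read_chunk", "search", "files", "git.inspect", "man", "tldr",
}


def truncate_preview(text: str) -> str:
    """Cap a preview by line count first, then by UTF-8 byte size."""
    lines = text.splitlines()
    if len(lines) > MAX_PREVIEW_LINES:
        text = "\n".join(lines[:MAX_PREVIEW_LINES])
    encoded = text.encode("utf-8")
    if len(encoded) > MAX_PREVIEW_BYTES:
        # errors="ignore" drops a multi-byte character cut in half at the boundary.
        text = encoded[:MAX_PREVIEW_BYTES].decode("utf-8", errors="ignore")
    return text


def tool_result_preview(artifact: dict[str, Any]) -> str:
    """Return the text the planner should see for one tool result."""
    # Assumption: shell results keep using stdout, as they did before the fix.
    stdout = artifact.get("stdout")
    if stdout:
        return truncate_preview(stdout)

    # Writes and mutations are excluded so just-written content is not echoed back.
    if artifact.get("type") not in READ_ONLY_TYPES:
        return ""

    # File reads carry their data in "content".
    content = artifact.get("content")
    if isinstance(content, str):
        return truncate_preview(content)

    # Other typed reads: compact JSON of the remaining structured fields.
    payload = {k: v for k, v in artifact.items() if k not in ("type", "stdout")}
    compact = json.dumps(payload, separators=(",", ":"), sort_keys=True, default=str)
    return truncate_preview(compact)
```

Under these assumptions, `tool_result_preview({"type": "file.read", "content": "hello"})` returns the file text, while the same payload with a write type returns an empty string, which is the asymmetry the new tests check.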
Tests
2 new (405 total, ruff clean):
`_tool_result_preview` surfaces reads + search but not writes; an orchestrator read surfaces the file content into the next iteration's planner context.
🤖 Generated with Claude Code