fix(review-pr): only flag problems the PR introduces, not pre-existing issues#232
Conversation
The PR reviewer phrased findings as directives (**Fix:** blocks, imperative or passive remedy phrasing, 'either X or Y' options, suggested alternative APIs), including on pre-existing/unchanged lines. Per docker/gordon#331 the bot should call out issues but never tell the author what to change. Reworks the tone guidance in the root, drafter, and verifier instructions plus the posting-format ref into an intent-based rule: describe only the defect and its consequence, never how to resolve it. Adds two evals locking both cases (pre-existing-remedy and directive-on-changed-code), 6/6 passing. Closes docker/gordon#331
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The tone rule is applied consistently across all four touch points (root agent instructions, drafter output schema, verifier details section, and the posting-format reference). The prohibition is framed correctly as intent-based ("stop at the impact") rather than keyword-based, which is the right approach for LLM instruction.
The two new evals cover both motivating cases (pre-existing-line interaction and changed-code compile error) with well-chosen relevance criteria. The only candidate gap examined — whether the eval for the import-alias case explicitly covers "naming a corrected alias" — is already addressed by criterion 6 ("does NOT present a corrected import line or a corrected call as the required edit"), which the verifier confirmed is broad enough to catch that phrasing.
No bugs found in the changed code.
docker/gordon#331 is about scope, not phrasing: the reviewer should keep offering concrete suggestions, but only on problems the PR introduces — never on pre-existing ones. The key distinction (proven by the findings on docker/pinata#40244) is that a CHANGED line does not make a pre-existing problem new: swapping os.Remove for fileutil.Remove on a line that already discarded its error introduces nothing. The test is whether the problem disappears when the PR is reverted. - Revert the "call out, never demand changes" tone layer across root, drafter, verifier, and posting-format; suggestions are re-allowed. - Root/drafter/verifier: flag a finding only if reverting the PR removes the problem; "touched by the diff" != "introduced by the diff". Grounded with the os.Remove->fileutil.Remove discarded-error example. - posting-format: "Comment Tone" -> "Comment Scope". - Replace the two tone-* evals with scope-* evals: a discarded error already discarded before the PR must NOT be flagged; an introduced import-alias compile bug must be flagged, suggestion allowed. Both 3/3, 36/36 (100%).
|
/review |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The scope rule is logically sound and the changes are well-structured. The drafter and verifier were updated consistently with the same rule, and the canonical example is accurate in both Go-semantics terms and intent.
Review coverage: 1 diff chunk, 4 changed files reviewed.
Findings: 2 medium-severity hypotheses raised by drafter, both DISMISSED after verification.
derekmisler
left a comment
There was a problem hiding this comment.
one real issue — there's a stale DISMISSED definition in the verifier's response-population section that wasn't updated with the rest of the file.
around line 703 in review-pr/agents/pr-review.yaml, the bullet still reads:
- DISMISSED: Proven not a bug — you can cite the specific code in the snippet that
prevents it, OR the finding is not in code changed by this PR
the second OR clause uses the old test ("not in code changed by this PR"). this is exactly the case the PR is designed to fix: in the pinata false positive, the fileutil.Remove line IS in changed code, so that old condition wouldn't qualify for dismissal. the CRITICAL block above it and in_changed_code below it both got updated to the new "would the problem still exist after reverting?" test, but this one bullet slipped through.
suggested fix:
- DISMISSED: Proven not a bug — you can cite the specific code in the snippet that
prevents it, OR the problem is pre-existing (would still exist after reverting the PR)
everything else looks solid. the revert test is stated consistently in the drafter and verifier, the canonical os.Remove example appears in all the right places, the evals exercise both the "flag it" and "don't flag it" paths end-to-end. the PR description explains the before/after clearly.
What
docker/gordon#331, clarified by the author, is about scope, not tone: the reviewer should keep offering concrete suggestions for changes, but only on problems the PR actually introduces — never on pre-existing issues.
In docker/pinata#40244, most findings were not regressions. The subtle ones (e.g. r3000784715 and the discarded-error comments) were on changed lines —
os.Remove→_ = fileutil.Remove— yet the problem (discarded error, unconditional fallback) pre-dated the change. A changed line does not make a pre-existing problem new.Change
Reverts the earlier "call out, never demand changes" tone layer (suggestions are wanted again) and replaces it with a scope rule across root, drafter, verifier, and the posting-format ref:
Validation
Both 3/3, 36/36 relevance (100%) via `docker agent eval`.
Closes docker/gordon#331