Skip to content

fix(review-pr): only flag problems the PR introduces, not pre-existing issues#232

Merged
Sayt-0 merged 3 commits into
mainfrom
fix/pr-review-tone-no-directives
Jun 10, 2026
Merged

fix(review-pr): only flag problems the PR introduces, not pre-existing issues#232
Sayt-0 merged 3 commits into
mainfrom
fix/pr-review-tone-no-directives

Conversation

@Sayt-0

@Sayt-0 Sayt-0 commented Jun 9, 2026

Copy link
Copy Markdown
Member

What

docker/gordon#331, clarified by the author, is about scope, not tone: the reviewer should keep offering concrete suggestions for changes, but only on problems the PR actually introduces — never on pre-existing issues.

In docker/pinata#40244, most findings were not regressions. The subtle ones (e.g. r3000784715 and the discarded-error comments) were on changed lines — os.Remove_ = fileutil.Remove — yet the problem (discarded error, unconditional fallback) pre-dated the change. A changed line does not make a pre-existing problem new.

Change

Reverts the earlier "call out, never demand changes" tone layer (suggestions are wanted again) and replaces it with a scope rule across root, drafter, verifier, and the posting-format ref:

  • A finding is in scope only if reverting the PR would remove the problem. The test is whether the problem is new, not whether the line is new — "touched by the diff" != "introduced by the diff".
  • Verifier `in_changed_code` follows the same rule; pre-existing problems are DISMISSED.
  • Concrete suggestions on in-scope findings are explicitly allowed.

Validation

  • `scope-pre-existing-not-flagged-1` — `os.Remove` → `_ = fileutil.Remove` on a line that already discarded its error must NOT be flagged (reproduces the pinata false positives).
  • `scope-introduced-bug-flagged-1` — an import-alias compile bug the PR introduces must be flagged, suggestion allowed.

Both 3/3, 36/36 relevance (100%) via `docker agent eval`.

Closes docker/gordon#331

The PR reviewer phrased findings as directives (**Fix:** blocks, imperative
or passive remedy phrasing, 'either X or Y' options, suggested alternative
APIs), including on pre-existing/unchanged lines. Per docker/gordon#331 the
bot should call out issues but never tell the author what to change.

Reworks the tone guidance in the root, drafter, and verifier instructions
plus the posting-format ref into an intent-based rule: describe only the
defect and its consequence, never how to resolve it. Adds two evals locking
both cases (pre-existing-remedy and directive-on-changed-code), 6/6 passing.

Closes docker/gordon#331

@docker-agent docker-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The tone rule is applied consistently across all four touch points (root agent instructions, drafter output schema, verifier details section, and the posting-format reference). The prohibition is framed correctly as intent-based ("stop at the impact") rather than keyword-based, which is the right approach for LLM instruction.

The two new evals cover both motivating cases (pre-existing-line interaction and changed-code compile error) with well-chosen relevance criteria. The only candidate gap examined — whether the eval for the import-alias case explicitly covers "naming a corrected alias" — is already addressed by criterion 6 ("does NOT present a corrected import line or a corrected call as the required edit"), which the verifier confirmed is broad enough to catch that phrasing.

No bugs found in the changed code.

docker/gordon#331 is about scope, not phrasing: the reviewer should keep
offering concrete suggestions, but only on problems the PR introduces —
never on pre-existing ones. The key distinction (proven by the findings on
docker/pinata#40244) is that a CHANGED line does not make a pre-existing
problem new: swapping os.Remove for fileutil.Remove on a line that already
discarded its error introduces nothing. The test is whether the problem
disappears when the PR is reverted.

- Revert the "call out, never demand changes" tone layer across root,
  drafter, verifier, and posting-format; suggestions are re-allowed.
- Root/drafter/verifier: flag a finding only if reverting the PR removes the
  problem; "touched by the diff" != "introduced by the diff". Grounded with
  the os.Remove->fileutil.Remove discarded-error example.
- posting-format: "Comment Tone" -> "Comment Scope".
- Replace the two tone-* evals with scope-* evals: a discarded error already
  discarded before the PR must NOT be flagged; an introduced import-alias
  compile bug must be flagged, suggestion allowed. Both 3/3, 36/36 (100%).
@Sayt-0 Sayt-0 changed the title fix(review-pr): call out findings without prescribing changes fix(review-pr): only flag problems the PR introduces, not pre-existing issues Jun 9, 2026
@Sayt-0

Sayt-0 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

/review

@docker-agent docker-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The scope rule is logically sound and the changes are well-structured. The drafter and verifier were updated consistently with the same rule, and the canonical example is accurate in both Go-semantics terms and intent.

Review coverage: 1 diff chunk, 4 changed files reviewed.
Findings: 2 medium-severity hypotheses raised by drafter, both DISMISSED after verification.

@derekmisler derekmisler left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one real issue — there's a stale DISMISSED definition in the verifier's response-population section that wasn't updated with the rest of the file.

around line 703 in review-pr/agents/pr-review.yaml, the bullet still reads:

- DISMISSED: Proven not a bug — you can cite the specific code in the snippet that
  prevents it, OR the finding is not in code changed by this PR

the second OR clause uses the old test ("not in code changed by this PR"). this is exactly the case the PR is designed to fix: in the pinata false positive, the fileutil.Remove line IS in changed code, so that old condition wouldn't qualify for dismissal. the CRITICAL block above it and in_changed_code below it both got updated to the new "would the problem still exist after reverting?" test, but this one bullet slipped through.

suggested fix:

- DISMISSED: Proven not a bug — you can cite the specific code in the snippet that
  prevents it, OR the problem is pre-existing (would still exist after reverting the PR)

everything else looks solid. the revert test is stated consistently in the drafter and verifier, the canonical os.Remove example appears in all the right places, the evals exercise both the "flag it" and "don't flag it" paths end-to-end. the PR description explains the before/after clearly.

@Sayt-0 Sayt-0 merged commit 9061c2e into main Jun 10, 2026
13 checks passed
@Sayt-0 Sayt-0 deleted the fix/pr-review-tone-no-directives branch June 10, 2026 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants