fix(cursor-review): harden judge against parse_error (lenient extraction + retry + panel fallback) (BE-1916) by mattmillerai · Pull Request #17 · Comfy-Org/github-workflows

mattmillerai · 2026-06-29T08:31:55Z

ELI-5

Sometimes the "judge" model that combines all the reviewers' notes answers in
plain English ("Based on my review, I think…") and then pastes the JSON we
need. Our old parser only knew a couple of rigid ways to find that JSON, so it
choked on the prose, threw everything away, and the whole PR review showed up as
a scary "Review failed" — even when the judge's answer was actually correct
(this happened on ComfyUI #487 and Alex's PR). This PR teaches the pipeline to
(1) dig the JSON out of the prose, (2) politely ask the judge once more if it
still can't, and (3) if even that fails, show the raw reviewer notes instead of
a blank failure.

What changed

Three layers, cheapest first, all in the cursor-review judge path:

Lenient extraction (extract-findings.py) — parse_json_findings now
recovers JSON wrapped in prose via a string/escape-aware balanced
brace-scan for the first {…}/[…] region that actually parses, instead
of the old naive first-[/last-] slice (which broke the moment the prose
contained a stray bracket — exactly the #487 shape). New coerce_findings_list
also unwraps a {"findings": […]}-style object. The fast path (clean JSON)
and the fenced-json path are unchanged.
Retry once (cursor-review.yml → Run judge) — after the judge runs we
probe-parse its output; on a hard parse failure we re-run the judge exactly
once with a terse "Return ONLY the JSON array, no prose" reminder appended.
No loop — one retry, then we stop.
Tighten the judge prompt (prompt-judge.md) — a bare-JSON-array demand is
now the first line of the prompt.

Plus the degraded fallback: if the judge still can't be parsed after the
retry and the panel produced findings, we post the raw, de-duplicated per-cell
findings under a warning banner — never a bare "Review failed" when there
are real findings to show. A genuine total failure (no panel findings either)
still surfaces the honest error review.

Acceptance criteria

Prose-wrapped valid JSON is parsed successfully, not failed — parse_json_findings + balanced scan; regression-tested.
A malformed judge response triggers exactly one retry; only a second failure surfaces as an error, and even then posts the per-cell findings (deduped, with a banner) rather than a bare "Review failed".
Regression test feeding a #487-style prose-wrapped-JSON sample through the parser — .github/cursor-review/tests/test_extract_findings.py (test_prose_wrapped_array_487 + 13 more).

Testing

python3 -m unittest discover -s .github/cursor-review/tests → 14 tests, all green. New CI workflow test-cursor-review-scripts.yml runs them on any change under .github/cursor-review/**.
Simulated the consolidate step's three branches locally (judge-ok / judge-failed-with-panel-findings / no-contributors) and dry-ran post-review.py --notice against a stub gh: the degraded path posts a real review with inline comments + banner, not "Review failed".

Judgment calls / not done

JSON / structured-output mode (plan item 3, second half) — intentionally NOT
enabled. The judge is invoked through the cursor-agent CLI, which has no
reliable per-model knob to force the findings array shape; --output-format json wraps stdout in cursor-agent's own envelope, which would break the
existing parser. Flipping it blind is a net regression risk, so I relied on the
prompt tightening + the strict-JSON retry (which is structured-output pressure
at the prompt level) + the lenient parser. The acceptance criteria are met
without it. Happy to revisit if a confirmed JSON-mode flag exists.
Brace-scan first-match limitation: if the judge's prose contains a
standalone valid JSON array of scalars before the real findings array, the
scanner could pick the wrong one. This is contrived (covered case: prose with a
non-JSON […] recovers correctly, tested), strictly better than the old
behavior, and backstopped by the retry + prompt. Not over-engineered.

…ion + retry + panel fallback) (BE-1916)

coderabbitai · 2026-06-29T08:36:36Z

📝 Walkthrough

Walkthrough

The PR hardens the cursor-review AI code-review pipeline against malformed LLM judge output. extract-findings.py gains layered JSON parsing and a coerce_findings_list normalizer. cursor-review.yml adds a judge retry and a degraded fallback that unions per-cell panel findings. post-review.py gains a --notice banner flag. A regression test suite and CI workflow are added.

Changes

Robust Judge Output Parsing and Degraded Fallback

Layer / File(s)	Summary
Judge prompt output constraint `.github/cursor-review/prompt-judge.md`	Prepends a strict JSON-array-only directive to the judge prompt, forbidding prose and markdown fences.
Layered JSON parsing and `coerce_findings_list` `.github/cursor-review/extract-findings.py`	Reworks `parse_json_findings` with fenced-block extraction, string-aware balanced-region scanning, and scalar rejection; adds `coerce_findings_list` to unwrap wrapper objects; simplifies `main()` branching to a single `None` check.
Regression tests and CI workflow `.github/cursor-review/tests/test_extract_findings.py`, `.github/workflows/test-cursor-review-scripts.yml`	Adds `ParseJsonFindingsTest` covering all parse paths and `MainEndToEndTest` for status transitions; adds a `unittest discover` workflow triggered on PRs and pushes to `main`.
`--notice` banner in `post-review.py` `.github/cursor-review/post-review.py`	Extends argument parser with optional `--notice`, sanitizes it against GitHub mentions, and prepends it to the review header.
Retry, degraded fallback, and post-review branching in workflow `.github/workflows/cursor-review.yml`	Wraps `cursor-agent` in `run_judge()` with one retry on non-ok parse; replaces always-use-judge-findings with conditional union of deduplicated per-cell panel findings on failure; emits `consolidated_count` and `degraded`; branches post-review into three cases: normal, warning-banner `--notice`, or hard error — quite the trifecta of tricks up its sleeve!

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch matt/be-1916-harden-judge-parse

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch matt/be-1916-harden-judge-parse

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/cursor-review/extract-findings.py:
- Around line 76-77: The scanning logic in the bracket parser is incorrectly
advancing past the rest of the input when it encounters an unterminated opener,
which can hide a later valid JSON payload. Update the loop in the extraction
routine that uses the `i = j + 1` resume step so an unbalanced `[` or `{` only
skips the bad region and continues scanning from the next viable position
instead of jumping to EOF. Keep the fix localized to the parser’s main scan path
so later valid content can still be detected.
- Around line 120-124: The wrapper-unwrapping logic in coerce_findings_list() is
too permissive because it accepts any list under keys like results, items, or
reviews, which lets malformed payloads through as status="ok". Tighten the
validation in coerce_findings_list() so it only returns lists whose contents
match the expected findings schema, and otherwise let main() treat the payload
as a parse/schema failure instead of a successful result.

In @.github/workflows/test-cursor-review-scripts.yml:
- Around line 27-35: The workflow is using floating GitHub Actions tags for
Checkout and Set up Python, which should be pinned to full commit SHAs. Update
the uses entries in the workflow to replace actions/checkout@v6 and
actions/setup-python@v6 with their exact commit hashes so the referenced actions
in the checkout/setup steps remain immutable.
- Around line 8-17: Add a concurrency policy to the workflow so superseded runs
are canceled for the same branch or PR. Update the workflow definition in
test-cursor-review-scripts.yml near the existing on/pull_request and push
triggers by introducing a concurrency group keyed to the ref or PR identity, and
enable cancel-in-progress so force-pushes don’t queue duplicate runs. Use the
workflow’s existing trigger context to choose the group consistently across push
and pull_request events.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 3ab20d41-e367-44b2-8e29-e41b9d9f9e56

📥 Commits

Reviewing files that changed from the base of the PR and between 51e24ab and 6d4a388.

📒 Files selected for processing (6)

.github/cursor-review/extract-findings.py
.github/cursor-review/post-review.py
.github/cursor-review/prompt-judge.md
.github/cursor-review/tests/test_extract_findings.py
.github/workflows/cursor-review.yml
.github/workflows/test-cursor-review-scripts.yml

coderabbitai · 2026-06-29T08:45:16Z

+        # Resume scanning after this region (or after an unbalanced opener).
+        i = j + 1


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don’t stop scanning after an unterminated opener.

If the prose contains a stray [ or { that never balances, Line 77 jumps i to EOF and the parser never reaches a later valid JSON payload. One bad bracket, big racket: a response like note [draft...\n[{"file":...}] still ends up as parse_error.

Proposed fix

- # Resume scanning after this region (or after an unbalanced opener). - i = j + 1 + # Resume after a closed region; otherwise advance one char so later JSON + # candidates are still reachable. + if j < n and depth == 0: + i = j + 1 + else: + i += 1

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Resume scanning after this region (or after an unbalanced opener).

i = j + 1

# Resume after a closed region; otherwise advance one char so later JSON

# candidates are still reachable.

if j < n and depth == 0:

i = j + 1

else:

i += 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/cursor-review/extract-findings.py around lines 76 - 77, The scanning logic in the bracket parser is incorrectly advancing past the rest of the input when it encounters an unterminated opener, which can hide a later valid JSON payload. Update the loop in the extraction routine that uses the `i = j + 1` resume step so an unbalanced `[` or `{` only skips the bad region and continues scanning from the next viable position instead of jumping to EOF. Keep the fix localized to the parser’s main scan path so later valid content can still be detected.

coderabbitai · 2026-06-29T08:45:16Z

+    if isinstance(parsed, dict):
+        for key in ("findings", "results", "items", "reviews"):
+            value = parsed.get(key)
+            if isinstance(value, list):
+                return value


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate wrapper contents before returning status="ok".

coerce_findings_list() unwraps any list under results, items, or reviews, so payloads like {"results":["oops"]} are now treated as successful findings. That breaks the downstream contract in main() and can push malformed data into the judge/post-review path instead of surfacing a parse/schema failure.

Proposed fix

+def _is_findings_list(value): + required = {"file", "line", "side", "severity", "body"} + return isinstance(value, list) and all( + isinstance(item, dict) and required.issubset(item) for item in value + ) + def coerce_findings_list(parsed): @@ - if isinstance(parsed, list): + if _is_findings_list(parsed): return parsed if isinstance(parsed, dict): for key in ("findings", "results", "items", "reviews"): value = parsed.get(key) - if isinstance(value, list): + if _is_findings_list(value): return value return None

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if isinstance(parsed, dict):

for key in ("findings", "results", "items", "reviews"):

value = parsed.get(key)

if isinstance(value, list):

return value

def _is_findings_list(value):

required = {"file", "line", "side", "severity", "body"}

return isinstance(value, list) and all(

isinstance(item, dict) and required.issubset(item) for item in value

)

if _is_findings_list(parsed):

return parsed

if isinstance(parsed, dict):

for key in ("findings", "results", "items", "reviews"):

value = parsed.get(key)

if _is_findings_list(value):

return value

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/cursor-review/extract-findings.py around lines 120 - 124, The wrapper-unwrapping logic in coerce_findings_list() is too permissive because it accepts any list under keys like results, items, or reviews, which lets malformed payloads through as status="ok". Tighten the validation in coerce_findings_list() so it only returns lists whose contents match the expected findings schema, and otherwise let main() treat the payload as a parse/schema failure instead of a successful result.

coderabbitai · 2026-06-29T08:45:16Z

+on:
+  pull_request:
+    paths:
+      - '.github/cursor-review/**'
+      - '.github/workflows/test-cursor-review-scripts.yml'
+  push:
+    branches: [main]
+    paths:
+      - '.github/cursor-review/**'
+      - '.github/workflows/test-cursor-review-scripts.yml'


🚀 Performance & Scalability | 🔵 Trivial

Add concurrency to cancel superseded runs.

Without a concurrency group, force-pushes can pile up duplicate test runs for the same branch. Don’t let CI breed like rabbits.

Suggested change

on: pull_request: @@ push: branches: [main] paths: - '.github/cursor-review/**' - '.github/workflows/test-cursor-review-scripts.yml' + +concurrency: + group: test-cursor-review-scripts-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} + cancel-in-progress: true

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

on:

pull_request:

paths:

- '.github/cursor-review/**'

- '.github/workflows/test-cursor-review-scripts.yml'

push:

branches: [main]

paths:

- '.github/cursor-review/**'

- '.github/workflows/test-cursor-review-scripts.yml'

on:

pull_request:

paths:

- '.github/cursor-review/**'

- '.github/workflows/test-cursor-review-scripts.yml'

push:

branches: [main]

paths:

- '.github/cursor-review/**'

- '.github/workflows/test-cursor-review-scripts.yml'

concurrency:

group: test-cursor-review-scripts-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

cancel-in-progress: true

🧰 Tools

🪛 zizmor (1.26.1)

[warning] 8-17: insufficient job-level concurrency limits (concurrency-limits): workflow is missing concurrency setting

(concurrency-limits)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test-cursor-review-scripts.yml around lines 8 - 17, Add a concurrency policy to the workflow so superseded runs are canceled for the same branch or PR. Update the workflow definition in test-cursor-review-scripts.yml near the existing on/pull_request and push triggers by introducing a concurrency group keyed to the ref or PR identity, and enable cancel-in-progress so force-pushes don’t queue duplicate runs. Use the workflow’s existing trigger context to choose the group consistently across push and pull_request events.

Source: Linters/SAST tools

coderabbitai · 2026-06-29T08:45:16Z

+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          persist-credentials: false
+
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: '3.12'


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Pin GitHub Actions to full commit SHAs.

Using floating @v6 tags here leaves the workflow open to upstream retags changing what executes in CI. A little hash makes the risk disappear in a flash.

Suggested change

- uses: actions/checkout@v6 + uses: actions/checkout@<full_commit_sha> # v6 with: persist-credentials: false @@ - uses: actions/setup-python@v6 + uses: actions/setup-python@<full_commit_sha> # v6 with: python-version: '3.12'

🧰 Tools

🪛 zizmor (1.26.1)

[error] 28-28: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

[error] 33-33: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test-cursor-review-scripts.yml around lines 27 - 35, The workflow is using floating GitHub Actions tags for Checkout and Set up Python, which should be pinned to full commit SHAs. Update the uses entries in the workflow to replace actions/checkout@v6 and actions/setup-python@v6 with their exact commit hashes so the referenced actions in the checkout/setup steps remain immutable.

Source: Linters/SAST tools

fix(cursor-review): harden judge against parse_error (lenient extract…

6d4a388

…ion + retry + panel fallback) (BE-1916)

mattmillerai added agent-coded Authored by the agent-work loop cursor-review Multi-model cursor review labels Jun 29, 2026

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

mattmillerai merged commit 499fe1d into main Jun 29, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cursor-review): harden judge against parse_error (lenient extraction + retry + panel fallback) (BE-1916)#17

fix(cursor-review): harden judge against parse_error (lenient extraction + retry + panel fallback) (BE-1916)#17
mattmillerai merged 1 commit into
mainfrom
matt/be-1916-harden-judge-parse

mattmillerai commented Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Uh oh!

coderabbitai Bot Jun 29, 2026

Uh oh!

coderabbitai Bot Jun 29, 2026

Uh oh!

coderabbitai Bot Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		# Resume scanning after this region (or after an unbalanced opener).
		i = j + 1

Uh oh!

Conversation

mattmillerai commented Jun 29, 2026

ELI-5

What changed

Acceptance criteria

Testing

Judgment calls / not done

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading