Skip to content

fix(ci): re-run only failed specs on Windows test retries (PER-9011)#2267

Open
AkashBrowserStack wants to merge 2 commits into
masterfrom
PER-9011_windows-spec-level-retry
Open

fix(ci): re-run only failed specs on Windows test retries (PER-9011)#2267
AkashBrowserStack wants to merge 2 commits into
masterfrom
PER-9011_windows-spec-level-retry

Conversation

@AkashBrowserStack
Copy link
Copy Markdown

Problem

The Windows Test @percy/core check takes ~241 min, vs ~19 min for the same check on Linux (Linux even runs with coverage). This slows every PR's required Windows checks. Tracked in PER-9011.

Root cause

The 241 min is not one slow run — it's the same @percy/core suite run 4 times by the retry loop in .github/workflows/windows.yml:

Step Duration True outcome
Run tests ~62 min failed → retried
Retry (1/4) ~59 min failed → retried
Retry (2/4) ~59 min failed → retried
Retry (3/4) ~59 min passed (retry 4 skipped)

continue-on-error: true rewrites each failed step's conclusion to success, so the check looks green even though 3 of 4 attempts failed. Only ~1 spec out of 1078 flakes per run (browser/local-server timing on Windows) — and it's a different spec each time (e.g. Discovery > captures favicon when the server provides one, ... promise only for sync snapshot). Re-running the entire ~60-min suite to recover one flaky spec is the real waste.

Fix: make retries cheap instead of fighting every flaky spec

  • scripts/test.js — the Jasmine node runner records failed spec names to PERCY_NODE_FAILURES_FILE; when run with PERCY_ONLY_FAILED_SPECS=1 it uses a specFilter to re-run only those specs. A filtered run reports incomplete (the rest are intentionally skipped), so success there means the targeted specs passed; full runs keep Jasmine's strict status. No behavior change when the env vars are unset (Linux/local/coverage all unaffected).
  • .github/workflows/windows.ymlretry0 runs the full suite once; retries 1-4 set PERCY_ONLY_FAILED_SPECS=1 so they re-run only the spec(s) that just failed (seconds, not ~60 min).
  • Honest check — a Flag flaky tests step emits a ::warning:: whenever a retry recovered a flake, so "green via retry" stays visible.

Verification (local, against the real @percy/core suite)

  • Retry re-runs only the failed subset — proven on real failures: 28 specs in 12s vs the full 1078 specs in 1304s (~108× cheaper retry).
  • Retry of passing spec → exit 0; retry of failing spec → exit 1 with the name recorded for the next retry.
  • Caught and fixed a real bug along the way: filtered runs report incomplete, so the exit logic now treats "targeted specs passed" as success (otherwise every retry would have looked like a failure).

Expected impact

~241 min → ~62 min for @percy/core on Windows (one full pass + cheap retries), full coverage preserved.

The real Windows wall-clock is validated by this PR's Windows CI run. The ~3× single-pass gap (Windows ~60 vs Linux ~19 min) is intentionally out of scope here — it's a separate per-run-speedup follow-up.

Draft until Windows CI confirms the timing.


🤖 Generated with Claude Code

AkashBrowserStack and others added 2 commits June 5, 2026 15:55
Windows @percy/core CI took ~241 min vs ~19 min on Linux. The cause was not
a single slow run: ~1 of 1078 node specs flakes per run on Windows
(browser/local-server timing), and windows.yml retried the entire ~60-min
suite up to 4x, with continue-on-error masking the failures as green.

Make retries cheap instead of fighting every flaky spec:
- scripts/test.js records failed spec names to PERCY_NODE_FAILURES_FILE and,
  when PERCY_ONLY_FAILED_SPECS=1, uses a jasmine specFilter to re-run only
  those specs. A filtered run reports "incomplete" (rest skipped), so success
  there means the targeted specs passed; full runs keep jasmine's strict
  status. No behavior change when the env vars are unset (Linux/local).
- windows.yml: retry0 runs the full suite; retries 1-4 re-run only the specs
  that just failed (seconds, not ~60 min).
- Add a ::warning:: when a retry recovers a flake so "green via retry" stays
  visible.

Verified locally against the real @percy/core suite: a 28-spec failure set
re-ran in 12s vs 1304s for the full 1078-spec suite. Expected on Windows:
~241 min -> ~62 min, full coverage kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cli-doctor's env-audit check flags any process.env key starting with PERCY_,
so injecting PERCY_NODE_FAILURES_FILE at the job level broke its
"no Percy vars are set" specs on Windows CI. Rename to CLI_TEST_FAILURES_FILE
and CLI_TEST_ONLY_FAILED (non-PERCY_) so the retry plumbing stays invisible to
Percy-env detection.

Verified locally: cli-doctor 508/508 pass with the renamed vars (both specs
fail when a PERCY_-prefixed var is injected, confirming the cause); spec-level
retry behavior unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AkashBrowserStack AkashBrowserStack marked this pull request as ready for review June 5, 2026 17:07
@AkashBrowserStack AkashBrowserStack requested a review from a team as a code owner June 5, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant