fix(ci): re-run only failed specs on Windows test retries (PER-9011) #2267
Open
AkashBrowserStack wants to merge 2 commits into
Windows @percy/core CI took ~241 min vs ~19 min on Linux. The cause was not a single slow run: ~1 of 1078 node specs flakes per run on Windows (browser/local-server timing), and windows.yml retried the entire ~60-min suite up to 4x, with continue-on-error masking the failures as green.

Make retries cheap instead of fighting every flaky spec:

- scripts/test.js records failed spec names to PERCY_NODE_FAILURES_FILE and, when PERCY_ONLY_FAILED_SPECS=1, uses a jasmine specFilter to re-run only those specs. A filtered run reports "incomplete" (rest skipped), so success there means the targeted specs passed; full runs keep jasmine's strict status. No behavior change when the env vars are unset (Linux/local).
- windows.yml: retry0 runs the full suite; retries 1-4 re-run only the specs that just failed (seconds, not ~60 min).
- Add a ::warning:: when a retry recovers a flake so "green via retry" stays visible.

Verified locally against the real @percy/core suite: a 28-spec failure set re-ran in 12s vs 1304s for the full 1078-spec suite. Expected on Windows: ~241 min -> ~62 min, full coverage kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
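The record-then-filter mechanism this commit message describes can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/test.js: the helper names `createFailureRecorder` and `createFailedSpecFilter`, the JSON file format, and the spec names are all assumptions made for the example. It relies only on the shape of Jasmine's reporter callbacks (`specDone(result)` with `result.status` and `result.fullName`) and of its `specFilter` option (a function that receives a spec exposing `getFullName()`).

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Reporter-shaped object (hypothetical helper): Jasmine calls specDone(result)
// once per spec; result.status is 'failed' for a failing spec.
function createFailureRecorder(file) {
  const failed = [];
  return {
    specDone(result) {
      if (result.status === 'failed') failed.push(result.fullName);
    },
    jasmineDone() {
      // Assumed format: a JSON array of full spec names.
      fs.writeFileSync(file, JSON.stringify(failed, null, 2));
    }
  };
}

// Builds a specFilter (hypothetical helper): Jasmine runs a spec only when
// the filter returns true for it.
function createFailedSpecFilter(file) {
  const names = new Set(JSON.parse(fs.readFileSync(file, 'utf8')));
  return spec => names.has(spec.getFullName());
}

// Demo with hand-made results and spec stubs, so no Jasmine install is needed.
const file = path.join(os.tmpdir(), `failed-specs-${process.pid}.json`);
const recorder = createFailureRecorder(file);
recorder.specDone({ fullName: 'Discovery captures favicon', status: 'failed' });
recorder.specDone({ fullName: 'Discovery caches resources', status: 'passed' });
recorder.jasmineDone();

const filter = createFailedSpecFilter(file);
fs.unlinkSync(file); // the filter already holds the names in memory

const rerun = ['Discovery captures favicon', 'Discovery caches resources']
  .filter(name => filter({ getFullName: () => name }));
console.log(rerun);
```

In a real runner the recorder would presumably be registered with `jasmine.env.addReporter(...)` and the filter passed through `jasmine.env.configure({ specFilter })`; the demo avoids both so it runs standalone.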
cli-doctor's env-audit check flags any process.env key starting with PERCY_, so injecting PERCY_NODE_FAILURES_FILE at the job level broke its "no Percy vars are set" specs on Windows CI. Rename to CLI_TEST_FAILURES_FILE and CLI_TEST_ONLY_FAILED (non-PERCY_) so the retry plumbing stays invisible to Percy-env detection.

Verified locally: cli-doctor 508/508 pass with the renamed vars (both specs fail when a PERCY_-prefixed var is injected, confirming the cause); spec-level retry behavior unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
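To illustrate why the rename matters, here is a small sketch of a prefix-based environment audit. `findPercyEnvKeys` is a hypothetical stand-in, not cli-doctor's actual check; it only assumes what the commit message states, namely that any key starting with PERCY_ is flagged.

```javascript
// Hypothetical stand-in for a prefix-based env audit: returns every key
// that starts with PERCY_.
function findPercyEnvKeys(env) {
  return Object.keys(env).filter(key => key.startsWith('PERCY_'));
}

// Job-level env before the rename: the retry plumbing is flagged.
const before = findPercyEnvKeys({
  PATH: '/usr/bin',
  PERCY_NODE_FAILURES_FILE: 'failed.json'
});

// Job-level env after the rename: nothing is flagged.
const after = findPercyEnvKeys({
  PATH: '/usr/bin',
  CLI_TEST_FAILURES_FILE: 'failed.json',
  CLI_TEST_ONLY_FAILED: '1'
});

console.log(before, after);
```

With the PERCY_-prefixed name, a spec asserting that no Percy variables are set would see one flagged key; with the CLI_TEST_ names the audit returns an empty list.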
Problem

The Windows Test @percy/core check takes ~241 min, vs ~19 min for the same check on Linux (Linux even runs with coverage). This slows every PR's required Windows checks. Tracked in PER-9011.

Root cause

The 241 min is not one slow run; it is the same @percy/core suite run 4 times by the retry loop in .github/workflows/windows.yml:

- Run tests
- Retry (1/4)
- Retry (2/4)
- Retry (3/4)

continue-on-error: true rewrites each failed step's conclusion to success, so the check looks green even though 3 of 4 attempts failed. Only ~1 spec out of 1078 flakes per run (browser/local-server timing on Windows), and it's a different spec each time (e.g. Discovery > captures favicon when the server provides one, ... promise only for sync snapshot). Re-running the entire ~60-min suite to recover one flaky spec is the real waste.

Fix: make retries cheap instead of fighting every flaky spec

- scripts/test.js: the Jasmine node runner records failed spec names to PERCY_NODE_FAILURES_FILE; when run with PERCY_ONLY_FAILED_SPECS=1 it uses a specFilter to re-run only those specs. A filtered run reports incomplete (the rest are intentionally skipped), so success there means the targeted specs passed; full runs keep Jasmine's strict status. No behavior change when the env vars are unset (Linux/local/coverage all unaffected).
- .github/workflows/windows.yml: retry0 runs the full suite once; retries 1-4 set PERCY_ONLY_FAILED_SPECS=1 so they re-run only the spec(s) that just failed (seconds, not ~60 min).
- A "Flag flaky tests" step emits a ::warning:: whenever a retry recovered a flake, so "green via retry" stays visible.

Verification (local, against the real @percy/core suite)

A filtered run reports incomplete, so the exit logic now treats "targeted specs passed" as success (otherwise every retry would have looked like a failure).

Expected impact

~241 min → ~62 min for @percy/core on Windows (one full pass + cheap retries), full coverage preserved.

Draft until Windows CI confirms the timing.
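The exit logic described under Verification could look roughly like the sketch below. `exitCodeFor` is a hypothetical helper, not the code in scripts/test.js. It assumes the `overallStatus` value Jasmine passes to a reporter's `jasmineDone` callback ('passed', 'failed', or 'incomplete') and takes the description's word that a filtered run is reported as incomplete.

```javascript
// Hypothetical helper: map Jasmine's overall status to a process exit code.
// `filtered` is true when only previously failed specs were re-run.
function exitCodeFor({ overallStatus, filtered }) {
  if (filtered) {
    // Filtered retry: skipped specs make the run 'incomplete', so only a
    // real failure among the targeted specs counts as failure.
    return overallStatus === 'failed' ? 1 : 0;
  }
  // Full run: keep the strict rule, anything other than 'passed' fails.
  return overallStatus === 'passed' ? 0 : 1;
}

console.log(exitCodeFor({ overallStatus: 'incomplete', filtered: true }));
console.log(exitCodeFor({ overallStatus: 'incomplete', filtered: false }));
```

One caveat of this sketch: a filter that matches no specs at all would also count as success, so a real runner might want to check that at least one targeted spec actually ran.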
🤖 Generated with Claude Code