test(validator-client): deflake integration test with frozen clock #24259

Open
spalladino wants to merge 1 commit into merge-train/spartan-v5 from spl/a-1265-deflake-validator

Motivation

validator.integration.test.ts (suite ValidatorClient Integration) has been flaking on both next and merge-train/spartan-v5: it failed ~7 times, e.g. in "rejects block that would exceed checkpoint mana limit". It is a true flake: a docs-only PR ran the same command 3×, failing once and passing twice.

Root cause: the block re-execution deadline is an absolute timestamp, namely the slot attestation deadline (target_slot_start + S − 2E ≈ 24s for the test's constants). The public processor compares it against the date provider's clock (now() > deadline → "Stopping tx processing due to timeout"). The suite used TestDateProvider, whose now() advances with the real wall clock from the beforeEach anchor, so the ~24s slot-1 re-execution budget started ticking before the two heavy createValidatorContext setups. On a loaded CI machine, fixture setup (~22s observed) consumed the budget before block re-execution ran. A block that should have re-executed successfully instead hit the deadline, processed 0 txs, and was rejected as an empty non-first block ("Cannot add empty block that is not the first block in the checkpoint"). That is the same rejection the mana-limit test expects only for the overflowing block.
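To make the race concrete, here is a minimal sketch of a wall-clock-anchored provider racing an absolute deadline. Everything in it is a hypothetical stand-in: `DriftingClock`, `attestationDeadlineMs`, and `processTxs` are not the real Aztec classes, the values passed for S and E are picked only so that S − 2E = 24, and the wall clock is simulated so the example is deterministic.

```typescript
// Hypothetical stand-ins for illustration; not the real Aztec classes.
interface Clock {
  now(): number; // milliseconds
}

// Drifts with a wall clock from a fixed anchor, as TestDateProvider is described above.
class DriftingClock implements Clock {
  private readonly offsetMs: number;
  private readonly wallClock: () => number;

  constructor(anchorMs: number, wallClock: () => number = Date.now) {
    this.wallClock = wallClock;
    this.offsetMs = anchorMs - wallClock();
  }

  now(): number {
    return this.wallClock() + this.offsetMs;
  }
}

// Absolute deadline: target_slot_start + S - 2E (S and E assumed to be in seconds).
function attestationDeadlineMs(slotStartMs: number, sSeconds: number, eSeconds: number): number {
  return slotStartMs + (sSeconds - 2 * eSeconds) * 1000;
}

// Processes txs until the clock passes the deadline (now() > deadline stops processing).
function processTxs(txs: string[], clock: Clock, deadlineMs: number): string[] {
  const processed: string[] = [];
  for (const tx of txs) {
    if (clock.now() > deadlineMs) {
      break; // "Stopping tx processing due to timeout"
    }
    processed.push(tx);
  }
  return processed;
}

// Simulated wall clock, anchored at slot start (time 0 on the provider).
let wallMs = 1_000_000;
const driftingClock = new DriftingClock(0, () => wallMs);
const slotDeadlineMs = attestationDeadlineMs(0, 36, 6); // illustrative S and E only

wallMs += 22_000; // fixture setup that still fits in the budget
const processedInTime = processTxs(['tx1', 'tx2'], driftingClock, slotDeadlineMs);

wallMs += 3_000; // a little more load pushes the clock past the deadline
const processedLate = processTxs(['tx1', 'tx2'], driftingClock, slotDeadlineMs);

console.log(processedInTime.length, processedLate.length); // prints: 2 0
```

The second call processes nothing even though the block itself is unchanged; only the elapsed setup time differs.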

Approach

  • Switch the suite from TestDateProvider (drifts with real time) to the frozen ManualDateProvider, whose clock only moves on explicit setTime/advanceTime. Re-execution can no longer race wall-clock, so slow fixture setup never eats the per-slot budget.
  • The two tests that rely on a retryUntil timing out (refuses to attest if not all block proposals were processed, refuses to attest with archive mismatch) would otherwise poll the full ~24s window in real time under a frozen clock. They now advance the clock past slot 1's attestation deadline before attestToCheckpointProposal, so the (correct) timeout fires immediately.
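A rough sketch of why advancing the frozen clock makes the timeout fire immediately, assuming the polling budget is derived from the time remaining until the absolute deadline. `FrozenClock`, `remainingBudgetMs`, and `pollWithBudget` are hypothetical names for illustration; the real ManualDateProvider and retryUntil signatures (including their time units) may differ.

```typescript
// Hypothetical frozen clock: time only moves on explicit calls.
class FrozenClock {
  private currentMs: number;

  constructor(startMs: number) {
    this.currentMs = startMs;
  }

  now(): number {
    return this.currentMs;
  }

  setTime(ms: number): void {
    this.currentMs = ms;
  }

  advanceTime(ms: number): void {
    this.currentMs += ms;
  }
}

// Assumption: the polling budget is the time left until the absolute deadline.
function remainingBudgetMs(clock: FrozenClock, deadlineMs: number): number {
  return Math.max(0, deadlineMs - clock.now());
}

// Polls in real time until fn yields a value or the budget runs out.
async function pollWithBudget<T>(
  fn: () => T | undefined,
  budgetMs: number,
  intervalMs = 10,
): Promise<T | undefined> {
  const startedAt = Date.now();
  while (Date.now() - startedAt < budgetMs) {
    const result = fn();
    if (result !== undefined) {
      return result;
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
  return undefined; // timed out
}

const frozenClock = new FrozenClock(0);
const attestDeadlineMs = 24_000; // illustrative value

// With the clock frozen at 0, a poll that never succeeds would spin for the full window.
const budgetBefore = remainingBudgetMs(frozenClock, attestDeadlineMs);

// Moving the clock past the deadline first leaves no budget, so the poll returns at once.
frozenClock.advanceTime(attestDeadlineMs + 1);
const budgetAfter = remainingBudgetMs(frozenClock, attestDeadlineMs);

let pollCount = 0;
const pollResult = pollWithBudget<string>(() => {
  pollCount++;
  return undefined;
}, budgetAfter);
```

Under these assumptions the timeout outcome is unchanged; only the real time spent waiting for it goes away.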

No assertion was relaxed; the behavioral checks are unchanged.

Changes

  • validator-client (tests): use ManualDateProvider in validator.integration.test.ts; advance the frozen clock past the attestation deadline in the two timeout-dependent tests.

Verification

  • Red repro: forcing the clock past the deadline between block 1 and block 2 reproduces the exact CI failure chain (timeout → "Cannot add empty block" → validateBlockProposal returns false).
  • Green: all 8 tests pass; the two timeout-dependent tests dropped from full-window polling to ~0.8s.
  • Root-cause robustness: injecting a 30s real sleep into beforeEach (which would blow the old 24s budget) still passes with the frozen clock.
  • yarn build, yarn format, yarn lint clean.
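The red repro and the green run can be mimicked with a toy model of the failure chain. This is only a sketch: `ManualClock`, `reExecute`, and `validateBlockSketch` are made-up names, and the real validateBlockProposal does far more than this.

```typescript
// Hypothetical frozen clock with an explicit setter.
class ManualClock {
  private currentMs: number;

  constructor(startMs: number) {
    this.currentMs = startMs;
  }

  now(): number {
    return this.currentMs;
  }

  setTime(ms: number): void {
    this.currentMs = ms;
  }
}

// Counts how many txs are processed before the clock passes the deadline.
function reExecute(txs: string[], clock: ManualClock, deadlineMs: number): number {
  let processed = 0;
  for (const _tx of txs) {
    if (clock.now() > deadlineMs) {
      break; // timeout: stop processing
    }
    processed++;
  }
  return processed;
}

interface ValidationResult {
  valid: boolean;
  reason?: string;
}

// Toy validation: an empty block is rejected unless it is first in the checkpoint.
function validateBlockSketch(
  indexInCheckpoint: number,
  txs: string[],
  clock: ManualClock,
  deadlineMs: number,
): ValidationResult {
  const processed = reExecute(txs, clock, deadlineMs);
  if (processed === 0 && indexInCheckpoint > 0) {
    return { valid: false, reason: 'Cannot add empty block that is not the first block in the checkpoint' };
  }
  return { valid: processed === txs.length };
}

const reproDeadlineMs = 24_000; // illustrative value

// Red repro: force the clock past the deadline between block 1 and block 2.
const redClock = new ManualClock(0);
const redBlock1 = validateBlockSketch(0, ['tx1'], redClock, reproDeadlineMs);
redClock.setTime(reproDeadlineMs + 1);
const redBlock2 = validateBlockSketch(1, ['tx2'], redClock, reproDeadlineMs);

// Green: the frozen clock is never moved, so both blocks re-execute fully.
const greenClock = new ManualClock(0);
const greenBlock1 = validateBlockSketch(0, ['tx1'], greenClock, reproDeadlineMs);
const greenBlock2 = validateBlockSketch(1, ['tx2'], greenClock, reproDeadlineMs);

console.log(redBlock1.valid, redBlock2.valid, greenBlock1.valid, greenBlock2.valid);
// prints: true false true true
```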

Fixes A-1265

The re-execution deadline in validator.integration.test.ts is an absolute
timestamp (the slot attestation deadline) compared against the date
provider's clock. TestDateProvider advances with real wall-clock, so the
~24s slot-1 re-execution budget started ticking at the beforeEach clock
anchor and was consumed by the two heavy validator-context setups before
block re-execution ran. On a loaded CI machine a block that should
re-execute successfully instead hit the deadline, processed 0 txs, and was
rejected as an empty non-first block ("Cannot add empty block that is not
the first block in the checkpoint") -- the same rejection the mana-limit
test expects only for the overflowing block.

Switch the suite to the frozen ManualDateProvider so re-execution never
races real time. The two tests that rely on a retryUntil timing out now
advance the clock past the attestation deadline so the timeout fires
immediately instead of polling the full window.