test: slow-test ratchet and speed rules from measured experiments #1099

Merged: thymikee merged 2 commits into main from test/fast-tests-plan on Jul 4, 2026
@thymikee thymikee commented Jul 4, 2026

Companion to #1097's test-topology rules: makes the test-speed strategy experiment-backed and enforced. All numbers measured on this machine, full unit suite (340 files / 3,210 tests).

Findings

| Experiment | Result | Verdict |
| --- | --- | --- |
| Baseline | 48.4s wall; aggregate import 99s, tests 331s | wall ≈ slowest file (44.6s android monolith): file-granularity Amdahl |
| Per-test durations | 10.8s "times out" test waits out the real 10s constant; 8s emulator polls at 1Hz; real retry backoffs | the enemy is real-time waits, not assertion count |
| --no-isolate | 205s wall, tests 1257s | rejected: module state thrashes across files; documented so nobody cargo-cults it |
| --pool=threads | 50.4s | rejected: no effect |
| Budget-derived poll cadence in waitForAndroidEmulatorByAvdName | devices.test.ts 25.6s → 2.8s (9×) isolated | the exemplar conversion; also a production improvement (short budgets stop being sampled at 1Hz) |
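
The file-granularity Amdahl verdict can be illustrated with a small sketch. The function name and the per-file durations below are hypothetical (only the 44.6s monolith and the ~7x parallelism come from the measurements); the point is just that a runner which schedules whole files can never finish sooner than its slowest file.

```typescript
// Lower bound on wall-clock time when whole files are the unit of parallelism.
// A file cannot be spread across workers, so the run cannot finish sooner than
// its slowest file, nor sooner than total work divided by the worker count.
function wallClockLowerBound(fileSeconds: number[], workers: number): number {
  if (workers < 1) throw new RangeError("workers must be >= 1");
  const total = fileSeconds.reduce((sum, s) => sum + s, 0);
  const slowest = fileSeconds.reduce((max, s) => Math.max(max, s), 0);
  return Math.max(slowest, total / workers);
}

// Hypothetical suite: one 44.6s monolith plus twenty 2s files on 7 workers.
const smallFiles: number[] = Array(20).fill(2);
const before = wallClockLowerBound([44.6, ...smallFiles], 7);

// Same total work, with the monolith split into four equal files.
const after = wallClockLowerBound([11.15, 11.15, 11.15, 11.15, ...smallFiles], 7);

console.log({ before, after });
```

In this made-up suite the bound before the split is the monolith itself, while after the split it drops to roughly total work divided by workers, which is why splitting the file is a wall-clock fix and not only a navigation aid.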

Changes

  • Slow-test ratchet (scripts/vitest-slow-test-reporter.ts): unit budget 2.5s / integration 15s, enforced at 2× budget with a report-only band in between. A wall-clock gate that flakes under host load trains people to ignore it, so variance is absorbed by design. 36 offenders are pinned by exact key, and the pin list is shrink-only per the gap-list convention; tracking issue #1098 (Slow-test ratchet: convert real-time waits to injected budgets) groups them by conversion pattern. Canary-verified: fires with exit 1 on an unpinned sleeper, passes clean otherwise.
  • Guidance (docs/agents/testing.md "Speed rules" + AGENTS.md bullet): conversion patterns in preference order (budget-derived cadence → budget-wiring assertion → fake clocks), the no-test-only-seams constraint, and the isolation/pool decisions recorded with their measurements so the next person doesn't re-run the failed experiments.
  • The monolith test split (chipped separately) is hereby promoted from navigation nicety to wall-clock fix — the 44.6s file IS the suite's critical path.
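
The banding described in the first bullet can be sketched as a pure function. This is a minimal illustration, not the code in scripts/vitest-slow-test-reporter.ts: the names `classify`, `Verdict`, and the key format are hypothetical, and whether the boundaries are inclusive is an assumption.

```typescript
type Kind = "unit" | "integration";
type Verdict = "ok" | "warn" | "fail" | "pinned";

// Budgets from the PR description, in milliseconds.
const BUDGET_MS: Record<Kind, number> = { unit: 2_500, integration: 15_000 };
// Enforcement starts at 2x budget; the band between budget and 2x only reports.
const ENFORCE_FACTOR = 2;

// Assumption: a test at exactly the budget is "ok", and one at exactly 2x is "warn".
function classify(
  key: string,
  kind: Kind,
  durationMs: number,
  pinned: ReadonlySet<string>,
): Verdict {
  const budget = BUDGET_MS[kind];
  if (durationMs <= budget) return "ok";
  // Pinned offenders are matched by exact key and never fail the run.
  if (pinned.has(key)) return "pinned";
  return durationMs > budget * ENFORCE_FACTOR ? "fail" : "warn";
}

// Hypothetical keys in a "file > test name" format.
const pins = new Set(["apple.test.ts > times out"]);
console.log(classify("canary.test.ts > sleeps", "unit", 5_200, pins)); // over 2x budget, unpinned
console.log(classify("apple.test.ts > times out", "unit", 10_800, pins)); // over 2x budget, pinned
```

Keeping the decision free of reporter state makes the bands easy to unit test without running a real slow test.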

The 10.8s worst offender stays pinned deliberately: its conversion (budget-wiring assertion) requires mocking the tool layer, which the apple monolith test file can't do cleanly until the topology split — evidence for doing that split soon.

… from experiments

Measured (2026-07-04, full unit suite: 340 files / 3,210 tests / 48s wall):
wall clock was bounded by the slowest FILE (44.6s android monolith at ~7x
file-level parallelism), and the slowest tests were sleeping through real
production budgets (10.8s proving 'times out' by waiting the constant out,
8s emulator polls at 1Hz, real retry backoff). Two config experiments
rejected with data: --no-isolate exploded the suite to 205s (module state
thrashes across files sharing workers) and --pool=threads changed nothing.

- scripts/vitest-slow-test-reporter.ts: the slow-test ratchet. Unit budget
  2.5s / integration 15s; failure at 2x budget (the band between reports
  without failing so host-load variance cannot make the gate cry wolf);
  36 pinned offenders, exact keys, ratchet-only pin (tracking #1098).
- waitForAndroidEmulatorByAvdName: poll cadence derives from the caller's
  budget (~timeout/20, floor 50ms, cap 1s): devices.test.ts 25.6s -> 2.8s
  (9x) in isolation, and short-budget production calls are no longer
  sampled at 1Hz.
- vitest.config: slowTestThreshold 500 for local visibility; reporter
  wired; isolation/pool decisions documented with the measurements.
- docs/agents/testing.md 'Speed rules' + AGENTS.md testing bullet: the
  three conversion patterns in preference order (budget-derived cadence,
  budget-wiring assertion, fake clocks), the no-seam constraint, and the
  file-granularity Amdahl argument that makes the monolith test split a
  wall-clock fix, not just navigation.
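
A minimal sketch of the budget-derived cadence, assuming the stated shape (~timeout/20 with a 50ms floor and a 1s cap). The helper names `pollIntervalMs` and `waitFor` are hypothetical; the real waitForAndroidEmulatorByAvdName is not shown in this PR page and may differ.

```typescript
const POLL_FLOOR_MS = 50;
const POLL_CAP_MS = 1_000;

// Derive the poll interval from the caller's budget instead of a fixed 1Hz.
function pollIntervalMs(timeoutMs: number): number {
  const derived = Math.floor(timeoutMs / 20);
  return Math.min(POLL_CAP_MS, Math.max(POLL_FLOOR_MS, derived));
}

// Generic polling loop using the derived cadence (hypothetical helper).
async function waitFor(
  check: () => Promise<boolean>,
  timeoutMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  const interval = pollIntervalMs(timeoutMs);
  while (true) {
    if (await check()) return true;
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false;
    // Never sleep past the deadline.
    await new Promise((resolve) => setTimeout(resolve, Math.min(interval, remaining)));
  }
}

console.log(pollIntervalMs(200), pollIntervalMs(2_000), pollIntervalMs(60_000));
```

With this shape a test can pass a small budget and finish quickly, while a long production boot still polls once per second as before.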
@github-actions

github-actions Bot commented Jul 4, 2026

Size Report

| Metric | Base | Current | Diff |
| --- | --- | --- | --- |
| JS raw | 1.5 MB | 1.5 MB | +54 B |
| JS gzip | 489.8 kB | 489.8 kB | +19 B |
| npm tarball | 588.9 kB | 588.9 kB | +17 B |
| npm unpacked | 2.1 MB | 2.1 MB | +54 B |

Startup median (7 runs, lower is better):

| Scenario | Base | Current | Diff |
| --- | --- | --- | --- |
| CLI --version | 22.0 ms | 22.6 ms | +0.7 ms |
| CLI --help | 40.9 ms | 43.6 ms | +2.8 ms |

Top changed chunks:

| Chunk | Raw diff | Gzip diff |
| --- | --- | --- |
| dist/src/logcat.js | +54 B | +19 B |

@thymikee

thymikee commented Jul 4, 2026

CI is blocked on Fallow Code Quality. The failed job reports two actionable items on this branch:

  • scripts/vitest-slow-test-reporter.ts is currently an unused file, so the slow-test reporter is not reachable from any configured entry point. Please wire it into the intended Vitest config/script path or otherwise make the intended runtime use visible to Fallow.
  • onTestCaseResult in that reporter is over the complexity threshold. Please split the decision path enough for Fallow to pass instead of suppressing the gate.

All other visible checks are green, so this looks like the current blocker before review/ready-for-human.

…orter, unit tests

The string-path reporter wiring read as a dead file (fallow cannot see
vitest's reporter loading); the config now imports the factory, making
the edge real and type-checked. The class shape tripped the
unused-class-members rule (framework callbacks are invisible to
reference analysis) — converted to a factory returning the Reporter
object, with the classification and rendering logic extracted as pure
exported functions. Those functions now carry their own unit tests
(budget bands, integration budgets, pin matching, warn-vs-fail
rendering), which also grounds the CRAP estimate in real references.
Canary re-verified: unpinned 5.2s sleeper fails the run with exit 1;
clean runs exit 0.
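
The class-to-factory conversion can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged reporter: the `Offender` payload is a simplified stand-in rather than Vitest's test-case type, `summarize` is a hypothetical method standing in for the end-of-run hook, and the output format is invented. In a real module the pure functions would be exported so unit tests can import them.

```typescript
type SlowVerdict = "warn" | "fail";

// Simplified, hypothetical payload; the real reporter receives Vitest's own types.
interface Offender {
  key: string;
  durationMs: number;
  budgetMs: number;
  verdict: SlowVerdict;
}

// Pure rendering: no reporter state, so it can be unit tested directly.
function renderOffender(o: Offender): string {
  const seconds = (ms: number): string => (ms / 1000).toFixed(1) + "s";
  const tag = o.verdict === "fail" ? "FAIL" : "warn";
  return `[${tag}] ${o.key}: ${seconds(o.durationMs)} (budget ${seconds(o.budgetMs)})`;
}

// Factory returning a plain object; state lives in the closure, not in class members.
function createSlowTestReporter() {
  const offenders: Offender[] = [];
  return {
    onTestCaseResult(offender: Offender | null): void {
      if (offender) offenders.push(offender);
    },
    // Hypothetical end-of-run step: warnings are printed, only failures set exit 1.
    summarize(): { lines: string[]; exitCode: 0 | 1 } {
      const lines = offenders.map(renderOffender);
      const exitCode = offenders.some((o) => o.verdict === "fail") ? 1 : 0;
      return { lines, exitCode };
    },
  };
}

const reporter = createSlowTestReporter();
reporter.onTestCaseResult({ key: "canary > sleeps", durationMs: 5_200, budgetMs: 2_500, verdict: "fail" });
console.log(reporter.summarize());
```

Importing a factory like this from the config file, instead of naming the reporter by string path, is what gives static analysis a real reference edge to follow.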
@thymikee thymikee added the ready-for-human label Jul 4, 2026
@thymikee

thymikee commented Jul 4, 2026

Review status: no actionable blockers found.

I rechecked the latest head after the Fallow fix. The slow-test reporter is now wired through vitest.config.ts, has focused tests for offender classification/reporting, and all GitHub checks are green. The Android emulator wait change is a bounded production improvement: poll cadence derives from the caller timeout with a 50ms floor and 1s cap, so short budgets no longer spend most of their time between samples while long boots keep the previous gentle cadence. Docs/AGENTS guidance matches the gate behavior and #1098 is the right tracking issue for shrinking the pinned list.

Added ready-for-human.

@thymikee thymikee merged commit 2557670 into main Jul 4, 2026
20 checks passed
@thymikee thymikee deleted the test/fast-tests-plan branch July 4, 2026 17:13
@github-actions

github-actions Bot commented Jul 4, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-07-04 17:13 UTC

Labels

ready-for-human Valid work that needs human implementation, judgment, or maintainer merge
