test: slow-test ratchet and speed rules from measured experiments by thymikee · Pull Request #1099 · callstack/agent-device

thymikee · 2026-07-04T16:15:19Z

Companion to #1097's test-topology rules: makes the test-speed strategy experiment-backed and enforced. All numbers measured on this machine, full unit suite (340 files / 3,210 tests).

Findings

Experiment	Result	Verdict
Baseline	48.4s wall; aggregate import 99s, tests 331s	wall ≈ slowest file (44.6s android monolith) — file-granularity Amdahl
Per-test durations	10.8s "times out" test waits out the real 10s constant; 8s emulator polls at 1Hz; real retry backoffs	the enemy is real-time waits, not assertion count
`--no-isolate`	205s wall, tests 1257s	rejected — module state thrashes across files; documented so nobody cargo-cults it
`--pool=threads`	50.4s	rejected — no effect
Budget-derived poll cadence in `waitForAndroidEmulatorByAvdName`	`devices.test.ts` 25.6s → 2.8s (9×) isolated	the exemplar conversion; also a production improvement (short budgets stop being sampled at 1Hz)
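The "file-granularity Amdahl" verdict can be illustrated with a rough lower bound: when a whole file is the unit of parallelism, wall clock cannot drop below the slowest file no matter how many workers are added. This is a minimal sketch under idealized scheduling. `wallClockLowerBound` and the sample durations are hypothetical; only the 44.6s figure comes from the table above.

```typescript
// Rough lower bound on suite wall clock when whole files are the unit of
// parallelism (idealized scheduling; ignores startup and import overhead).
// `wallClockLowerBound` is a hypothetical helper, not part of the PR.
function wallClockLowerBound(fileSeconds: number[], workers: number): number {
  if (fileSeconds.length === 0) return 0;
  const total = fileSeconds.reduce((sum, s) => sum + s, 0);
  const slowest = Math.max(...fileSeconds);
  // Perfect spreading gives total / workers; a single file can never be split.
  return Math.max(slowest, total / workers);
}

// Illustrative durations: one 44.6s monolith plus 100 files of 1s each.
const smallFiles = Array.from({ length: 100 }, () => 1);
const withMonolith = [44.6, ...smallFiles];
const at7Workers = wallClockLowerBound(withMonolith, 7);
const at64Workers = wallClockLowerBound(withMonolith, 64);

// Same total work, but the monolith is split into four files of equal length.
const afterSplit = wallClockLowerBound([11.15, 11.15, 11.15, 11.15, ...smallFiles], 7);

console.log({ at7Workers, at64Workers, afterSplit });
```

In this toy data, adding workers leaves the bound at the monolith's duration, while splitting the file lets the bound fall to roughly total / workers.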

Changes

Slow-test ratchet (scripts/vitest-slow-test-reporter.ts): unit budget 2.5s / integration 15s, with enforcement at 2× budget and a report-only band in between. A wall-clock gate that flakes under host load trains people to ignore it, so variance is absorbed by design. 36 offenders are pinned by exact key, and the pin list shrinks only, per the gap-list convention; tracking issue #1098 ("Slow-test ratchet: convert real-time waits to injected budgets") groups them by conversion pattern. Canary-verified: fires with exit 1 on an unpinned sleeper, passes clean otherwise.
Guidance (docs/agents/testing.md "Speed rules" + AGENTS.md bullet): conversion patterns in preference order (budget-derived cadence → budget-wiring assertion → fake clocks), the no-test-only-seams constraint, and the isolation/pool decisions recorded with their measurements so the next person doesn't re-run the failed experiments.
The monolith test split (chipped separately) is hereby promoted from navigation nicety to wall-clock fix — the 44.6s file IS the suite's critical path.
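The budget bands described for the ratchet can be sketched as a small pure function. This is an illustration only, not the code in scripts/vitest-slow-test-reporter.ts. The names, the key format, and the choice to treat exactly 2× as still report-only are assumptions; the budgets and the 2× rule come from the description above.

```typescript
// Hypothetical names throughout; only the budgets (2.5s / 15s) and the
// 2x enforcement factor are taken from the PR text.
type Kind = "unit" | "integration";
type Verdict = "ok" | "warn" | "pinned" | "fail";

const BUDGET_MS: Record<Kind, number> = { unit: 2_500, integration: 15_000 };
const ENFORCE_FACTOR = 2;

function classifySlowTest(
  key: string,
  kind: Kind,
  durationMs: number,
  pinned: ReadonlySet<string>,
): Verdict {
  const budget = BUDGET_MS[kind];
  if (durationMs <= budget) return "ok";
  // Report-only band: over budget, but not past the enforcement line.
  // Treating exactly 2x as "warn" is an assumption of this sketch.
  if (durationMs <= budget * ENFORCE_FACTOR) return "warn";
  // Past the enforcement line only pinned offenders are tolerated.
  return pinned.has(key) ? "pinned" : "fail";
}

// Assumed key format: "<file> > <test name>".
const pinnedKeys = new Set(["apple.test.ts > times out"]);
console.log(classifySlowTest("sleeper.test.ts > sleeps", "unit", 5_200, pinnedKeys));
console.log(classifySlowTest("apple.test.ts > times out", "unit", 10_800, pinnedKeys));
```

Keeping the band between budget and 2× budget as a warning is what lets host-load variance show up in the report without failing the run.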

The 10.8s worst offender stays pinned deliberately: its conversion (budget-wiring assertion) requires mocking the tool layer, which the apple monolith test file can't do cleanly until the topology split — evidence for doing that split soon.
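The PR does not define "budget-wiring assertion", so the following is one reading of the term, offered as an assumption: instead of sleeping until the production constant expires, the test proves that the constant reaches the layer that enforces it. Every name here is hypothetical, and the tool layer is passed as a parameter only to keep the sketch self-contained (the description above talks about mocking it).

```typescript
// Hypothetical stand-in for a production timeout constant.
const BOOT_TIMEOUT_MS = 10_000;

// Hypothetical shape of the tool layer that actually enforces the timeout.
type RunTool = (args: string[], opts: { timeoutMs: number }) => Promise<string>;

// Code under test: forwards the budget to the tool layer.
async function bootDevice(runTool: RunTool, id: string): Promise<string> {
  return runTool(["boot", id], { timeoutMs: BOOT_TIMEOUT_MS });
}

// Fake tool layer: records the budget it was handed and resolves immediately,
// so no real time passes.
const seenBudgets: number[] = [];
const fakeTool: RunTool = async (_args, opts) => {
  seenBudgets.push(opts.timeoutMs);
  return "ok";
};

// The fake is invoked synchronously inside bootDevice, so the recorded budget
// is available as soon as the call returns its promise.
void bootDevice(fakeTool, "device-1").then((out) => console.log(out, seenBudgets));
```

The assertion is then on `seenBudgets`, which takes milliseconds, rather than on a timeout that has to be waited out in real time.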

… from experiments

Measured (2026-07-04, full unit suite: 340 files / 3,210 tests / 48s wall): wall clock was bounded by the slowest FILE (44.6s android monolith at ~7x file-level parallelism), and the slowest tests were sleeping through real production budgets (10.8s proving 'times out' by waiting the constant out, 8s emulator polls at 1Hz, real retry backoff). Two config experiments were rejected with data: --no-isolate exploded the suite to 205s (module state thrashes across files sharing workers) and --pool=threads changed nothing.

- scripts/vitest-slow-test-reporter.ts: the slow-test ratchet. Unit budget 2.5s / integration 15s; failure at 2x budget (the band between reports without failing, so host-load variance cannot make the gate cry wolf); 36 pinned offenders, exact keys, ratchet-only pin (tracking #1098).
- waitForAndroidEmulatorByAvdName: poll cadence derives from the caller's budget (~timeout/20, floored at 50ms, capped at 1s). devices.test.ts goes 25.6s -> 2.8s (9x) in isolation, and short-budget production calls stop being sampled at 1Hz.
- vitest.config: slowTestThreshold 500 for local visibility; reporter wired; isolation/pool decisions documented with the measurements.
- docs/agents/testing.md 'Speed rules' + AGENTS.md testing bullet: the three conversion patterns in preference order (budget-derived cadence, budget-wiring assertion, fake clocks), the no-seam constraint, and the file-granularity Amdahl argument that makes the monolith test split a wall-clock fix, not just navigation.
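The budget-derived cadence can be sketched as a clamp. This is a minimal illustration, not the body of `waitForAndroidEmulatorByAvdName`. `pollIntervalMs` is a hypothetical name and the rounding with `Math.floor` is an assumption; the ~timeout/20 ratio, the 50ms floor, and the 1s cap come from the commit message.

```typescript
// Budget-derived poll interval: about 1/20 of the caller's timeout, never
// below 50ms and never above 1s. `pollIntervalMs` is a hypothetical name.
const MIN_POLL_MS = 50;
const MAX_POLL_MS = 1_000;

function pollIntervalMs(timeoutMs: number): number {
  // Math.floor is an assumption of this sketch; the PR only says "~timeout/20".
  const derived = Math.floor(timeoutMs / 20);
  return Math.min(MAX_POLL_MS, Math.max(MIN_POLL_MS, derived));
}

console.log(pollIntervalMs(200));     // 50   (floor applies)
console.log(pollIntervalMs(5_000));   // 250
console.log(pollIntervalMs(120_000)); // 1000 (cap keeps the 1Hz cadence for long boots)
```

With a fixed 1Hz cadence, a 200ms budget would expire before a second sample is ever taken; deriving the interval from the budget gives short budgets several samples while long waits keep the gentle cadence.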

github-actions · 2026-07-04T16:15:50Z

Size Report

Metric	Base	Current	Diff
JS raw	1.5 MB	1.5 MB	+54 B
JS gzip	489.8 kB	489.8 kB	+19 B
npm tarball	588.9 kB	588.9 kB	+17 B
npm unpacked	2.1 MB	2.1 MB	+54 B

Startup median (7 runs, lower is better):

Scenario	Base	Current	Diff
CLI --version	22.0 ms	22.6 ms	+0.7 ms
CLI --help	40.9 ms	43.6 ms	+2.8 ms

Top changed chunks:

Chunk	Raw diff	Gzip diff
`dist/src/logcat.js`	+54 B	+19 B

thymikee · 2026-07-04T16:26:33Z

CI is blocked on Fallow Code Quality. The failed job reports two actionable items on this branch:

scripts/vitest-slow-test-reporter.ts is currently an unused file, so the slow-test reporter is not reachable from any configured entry point. Please wire it into the intended Vitest config/script path or otherwise make the intended runtime use visible to Fallow.
onTestCaseResult in that reporter is over the complexity threshold. Please split the decision path enough for Fallow to pass instead of suppressing the gate.

All other visible checks are green, so this looks like the current blocker before review/ready-for-human.

…orter, unit tests

The string-path reporter wiring read as a dead file (fallow cannot see vitest's reporter loading); the config now imports the factory, making the edge real and type-checked. The class shape tripped the unused-class-members rule (framework callbacks are invisible to reference analysis), so it was converted to a factory returning the Reporter object, with the classification and rendering logic extracted as pure exported functions. Those functions now carry their own unit tests (budget bands, integration budgets, pin matching, warn-vs-fail rendering), which also grounds the CRAP estimate in real references. Canary re-verified: unpinned 5.2s sleeper fails the run with exit 1; clean runs exit 0.
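The factory shape described here can be sketched as follows. The interfaces are minimal local stand-ins so the example runs on its own; they are not Vitest's actual Reporter types, and all names, including `createSlowTestReporter`, are hypothetical.

```typescript
// Local stand-ins for framework types (assumptions, not Vitest's real API).
interface TestResult {
  key: string;
  durationMs: number;
}
interface ReporterLike {
  onTestCaseResult(result: TestResult): void;
  // Returns the exit code the run would end with; a simplification for the sketch.
  onTestRunEnd(): number;
}

// Pure decision logic: testable without running a suite.
function isOffender(r: TestResult, failAboveMs: number, pinned: ReadonlySet<string>): boolean {
  return r.durationMs > failAboveMs && !pinned.has(r.key);
}

// Pure rendering logic, kept separate from the decision.
function renderOffenders(offenders: TestResult[]): string {
  return offenders.map((o) => `SLOW ${o.key} ${o.durationMs}ms`).join("\n");
}

// Factory instead of a class: callbacks live on a plain object, and the
// callback itself only delegates to the pure functions above.
function createSlowTestReporter(failAboveMs: number, pinned: ReadonlySet<string>): ReporterLike {
  const offenders: TestResult[] = [];
  return {
    onTestCaseResult(result) {
      if (isOffender(result, failAboveMs, pinned)) offenders.push(result);
    },
    onTestRunEnd() {
      if (offenders.length > 0) console.error(renderOffenders(offenders));
      return offenders.length > 0 ? 1 : 0;
    },
  };
}

// Canary-style usage: an unpinned 5.2s test against a 5s enforcement line.
const canary = createSlowTestReporter(5_000, new Set());
canary.onTestCaseResult({ key: "canary.test.ts > sleeps", durationMs: 5_200 });
const canaryExit = canary.onTestRunEnd();
console.log(canaryExit);
```

Importing a factory like this from the config file, rather than naming the reporter by string path, is what gives a static analyzer a visible edge to the file.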

thymikee · 2026-07-04T16:47:51Z

Review status: no actionable blockers found.

I rechecked the latest head after the Fallow fix. The slow-test reporter is now wired through vitest.config.ts, has focused tests for offender classification/reporting, and all GitHub checks are green. The Android emulator wait change is a bounded production improvement: poll cadence derives from the caller timeout with a 50ms floor and 1s cap, so short budgets no longer spend most of their time between samples while long boots keep the previous gentle cadence. Docs/AGENTS guidance matches the gate behavior and #1098 is the right tracking issue for shrinking the pinned list.

Added ready-for-human.

github-actions · 2026-07-04T17:13:26Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-07-04 17:13 UTC

thymikee added the ready-for-human label ("Valid work that needs human implementation, judgment, or maintainer merge") Jul 4, 2026

thymikee merged commit 2557670 into main Jul 4, 2026
20 checks passed

thymikee deleted the test/fast-tests-plan branch July 4, 2026 17:13

thymikee merged 2 commits into main from test/fast-tests-plan
