feat(ci3): run uploadable benchmarks on a dedicated on-demand instance (v5-next port) #24255

Open
charlielye wants to merge 1 commit into v5-next from ci3-dedicated-bench-v5

Ports #24028 (merged to next as 19de9f1551) to the v5 release line.

Clean cherry-pick of the squashed next commit — no conflicts. Brings the dedicated-bench work to v5-next:

  • Uploadable benchmark runs execute on a dedicated, fixed-type on-demand instance for stable numbers, decoupled from the now-variable (spot-diversified) build hardware.
  • A single BENCH_UPLOAD flag drives both the dedicated-box launch and the GA publish gate. Grind runs (merge-queue-heavy) launch exactly one dedicated box, from the first instance; the remaining instances run benches inline as a breakage check, so they no longer race on the shared bench-<treehash> upload key.
  • First-class make bench target (incl. bb-acir so the bb browser-memory bench has its headless-test harness).
  • bench_engine keeps HT-off + bottom-half scheduling (restored after the dedicated-box change), so timing-sensitive benches don't suffer sibling-thread interference.
  • bench/next vs bench/prs destinations preserved.

See #24028 for full review history.

> [!IMPORTANT]
> Depends on the IAM change aztec-labs-eng/iac#6 (grants `ci3-build-instance-role` the launch/SSM/PassRole surface). **That change must be applied first**; otherwise the build instance's `create-fleet` call fails with `UnauthorizedOperation`.

## Problem

Spot diversification (create-fleet) means build instances now land on variable EC2 types: m6a/m7a/m6i/r6a/r7a at 16/32/48xlarge, AMD vs Intel. The in-build benchmark phase runs on that box, so wall-time numbers vary between hardware families by far more than the 105% regression alert threshold allows, which produces false regressions. (The instance type isn't even recorded in the bench JSON.)
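
To make the false-regression arithmetic concrete, here is a minimal sketch. It assumes the alert fires when the current time exceeds 105% of the baseline; that comparison rule, the `exceeds_threshold` helper, and the timings are illustrative assumptions, not taken from the CI scripts.

```shell
#!/usr/bin/env bash
# Illustration only: the 105% figure comes from the text above; the comparison
# rule (current vs. baseline * threshold) and all timings are assumptions.

# exceeds_threshold BASELINE_MS CURRENT_MS THRESHOLD_PCT
# Succeeds (exit 0) when current > baseline * threshold / 100, in integer math.
exceeds_threshold() {
  local baseline=$1 current=$2 threshold_pct=$3
  [ $(( current * 100 )) -gt $(( baseline * threshold_pct )) ]
}

# Hypothetical numbers: the same commit measured on two instance families.
baseline_ms=1000    # previous run, landed on a faster family
same_code_ms=1100   # identical code, landed on a family that is 10% slower

if exceeds_threshold "$baseline_ms" "$same_code_ms" 105; then
  echo "flagged as a regression although the code did not change"
fi
```

Under these assumed numbers, a 10% hardware difference alone is enough to trip the alert, which is the failure mode a fixed instance type avoids.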

## Approach

Only the canonical **merge-queue→next** series (the one used for real regression tracking) runs benches on a **dedicated, fixed, on-demand m6a.16xlarge**. PR `ci-full` runs keep running benches inline on the contended build box purely as a **breakage check** — no dedicated box, no upload.

Benches are scheduled by the existing test engine: when the build completes in `build_and_test` (full builds only),
- **upload runs** (`SHOULD_UPLOAD_BENCHMARKS=1`): launch the dedicated box via `./ci.sh bench` as a backgrounded, colored, denoised job (logged like the test engine) and `wait` on it (non-fatal) before returning;
- **otherwise**: `bench_cmds >> $test_cmds_file` — benches become ordinary test commands.

`ci.sh bench` → `bootstrap_ec2` blocks until the remote `ci-bench` finishes (ending in `cache_upload bench-<treehash>`), so the `wait` is the whole rendezvous. Results reach the GA `Upload benchmarks` step unchanged via that cache key (`ci3_success.sh` `gh-bench`).
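
The hook and rendezvous described above might look roughly like the sketch below. It is not the code in `bootstrap.sh`: `schedule_benches`, `launch_dedicated_bench`, and `run_tests` are hypothetical names, and `bench_cmds` is replaced by a stub. Only the `SHOULD_UPLOAD_BENCHMARKS` variable and the background-then-`wait` shape come from the description.

```shell
#!/usr/bin/env bash
# Sketch only. The three helpers below are hypothetical stand-ins:
# launch_dedicated_bench stands in for `./ci.sh bench` (which blocks until the
# remote run finishes), bench_cmds for the real bench command list, and
# run_tests for the test engine.
launch_dedicated_bench() { echo "dedicated bench box finished"; }
bench_cmds() { printf '%s\n' "bench-a" "bench-b"; }
run_tests() { echo "test engine runs commands from $1"; }

schedule_benches() {
  local test_cmds_file=$1 bench_pid=""
  if [ "${SHOULD_UPLOAD_BENCHMARKS:-0}" = "1" ]; then
    # Upload run: start the dedicated box in the background.
    launch_dedicated_bench &
    bench_pid=$!
  else
    # Otherwise: benches become ordinary test commands.
    bench_cmds >> "$test_cmds_file"
  fi

  run_tests "$test_cmds_file"

  # Rendezvous: block until the background job ends. A failure is logged
  # and swallowed so it cannot fail the build.
  if [ -n "$bench_pid" ] && ! wait "$bench_pid"; then
    echo "bench box failed; continuing without fresh numbers" >&2
  fi
  return 0
}
```

Calling `schedule_benches "$test_cmds_file"` with `SHOULD_UPLOAD_BENCHMARKS=1` exercises the background path; any other value appends the stub bench commands to the file instead.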

## Changes

- **`bootstrap.sh`**: drop inline `bench` from `ci-full`/`ci-full-no-test-cache`; add the `build_and_test` launch/append hook + non-fatal `wait`; new `ci-bench` mode = cache-hit `make full` + `bench` (no test engine).
- **`ci.sh`**: new `bench` launcher — `AWS_INSTANCE=m6a.16xlarge NO_SPOT=1` (pins a fixed on-demand type; `CPUS` not needed since `AWS_INSTANCE` bypasses pool sizing).
- **`ci3/bench_engine`**: drop the 8-core OS isolation / HT-disable / pinning. Dedicated box → benches use the full machine, honouring per-bench `CPUS` via the strict scheduler (defaults to `nproc/2` without `BENCH_CPU_COUNT`). This is what lets the 64-vCPU 16xlarge satisfy the `CPUS=32` bb rollup bench.
- **`.github/ci3_labels_to_env.sh`**: scope `SHOULD_UPLOAD_BENCHMARKS` to merge-queue→next (it now also gates the dedicated box).
- **`ci3/bootstrap_ec2`**: pass `SHOULD_UPLOAD_BENCHMARKS` through to the instance.
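
The CPU arithmetic behind the `ci3/bench_engine` change can be sketched as follows. This assumes the `nproc/2` default acts as the CPU budget that a per-bench `CPUS` request must fit within; `bench_cpu_budget` and `bench_fits` are hypothetical names, not functions from `bench_engine`.

```shell
#!/usr/bin/env bash
# Sketch only: how an nproc/2 default could admit a CPUS=32 bench on a
# 64-vCPU box. Both function names are hypothetical.

# bench_cpu_budget [TOTAL_CPUS]
# Prints BENCH_CPU_COUNT if it is set, else half of the machine's CPUs.
bench_cpu_budget() {
  local total=${1:-$(nproc)}
  echo "${BENCH_CPU_COUNT:-$(( total / 2 ))}"
}

# bench_fits REQUESTED_CPUS [TOTAL_CPUS]
# Succeeds when the requested CPU count is within the budget.
bench_fits() {
  local requested=$1 budget
  budget=$(bench_cpu_budget "${2:-}")
  [ "$requested" -le "$budget" ]
}

bench_fits 32 64 && echo "CPUS=32 fits on a 64-vCPU machine"
bench_fits 32 32 || echo "CPUS=32 does not fit on a 32-vCPU machine"
```

Passing the total CPU count as an argument keeps the sketch independent of the machine it runs on; without the argument it falls back to `nproc`.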

## Notes
- **One-time baseline shift** in `bench/next`: different machine + no isolation changes absolute numbers once; stable thereafter. May want to annotate the series.
- **Soft failure**: a bench-box failure is logged and the run proceeds (no fresh numbers) rather than blocking the merge.
- **PR benches-as-tests**: `:PARALLEL=0` serial benches lose one-at-a-time isolation and run contended — fine for breakage-only; real numbers come from the dedicated box's `bench_engine` path.
- Validated: all touched scripts pass `bash -n`; the `AWS_INSTANCE`+`NO_SPOT` fixed-on-demand launch mechanism was verified live during the create-fleet work. Full e2e is exercised by a merge-queue→next run once the iac PR lands.