fix(ci): give AVM check-circuit more CPU/time for heavy txs (canonical) #24234

Draft
AztecBot wants to merge 1 commit into next from cb/avm-cc-timeout-fix-24198

AztecBot (Collaborator) commented Jun 23, 2026

What

Raise the per-command resource budget for avm_check_circuit in yarn-project/end-to-end/bootstrap.sh from CPUS=2 (default) / TIMEOUT=30s to CPUS=4 / TIMEOUT=120s.

Why

The "AVM Circuit Inputs Collection and Check" workflow (avm-check-circuit job) has been failing repeatedly on next. The GitHub job exits 124, which propagates up from a single check-circuit invocation that blows the per-tx timeout while every other input passes in 4–11s. This is a clean wall-clock timeout (exit 124) — not a circuit/assertion error and not OOM (peak ~3.9 GiB inside an 8 GiB container; dmesg showed no kill).
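The exit-code reading above can be reproduced locally. The following is a minimal sketch, assuming GNU coreutils `timeout` is what enforces the budget (the PR does not show the wrapper itself); `run_with_budget` is a hypothetical helper, not something from bootstrap.sh.

```shell
# Hypothetical helper: run a command under a wall-clock budget and
# classify how it ended. Assumes GNU coreutils `timeout`.
run_with_budget() {
  budget="$1"
  shift
  status=0
  timeout "$budget" "$@" || status=$?
  case "$status" in
    0)   echo "ok" ;;
    124) echo "timed out after $budget" ;;          # budget exceeded
    137) echo "killed by SIGKILL (128+9)" ;;        # e.g. the kernel OOM killer
    *)   echo "failed with exit $status" ;;         # the command's own error
  esac
  return "$status"
}

# A command that outlives its budget yields exit 124, as in the failing job.
run_with_budget 1s sleep 5 || echo "exit code: $?"
# A command that fits its budget passes through untouched.
run_with_budget 5s true
```

An exit of 124 therefore points at the clock, while an assertion failure would surface as the tool's own non-zero code and a kernel kill as 137.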

avm_check_circuit_cmds fans out up to 96 bb-avm avm_check_circuit jobs in parallel via parallelize. The heaviest e2e txs build large AVM circuits (~700k rows); trace generation + check-circuit for them does not fit in the long-standing 30s/2-CPU budget (in place since #18747), especially under that parallel CPU contention. The existing code comment already anticipated exactly this ("transactions could need more CPU and MEM than we allocate by default … they might start timing out"). These failures are unrelated to whatever commit happens to be at the head of the failing run — avm_check_circuit is standalone bb-avm reading a dumped .bin.

Observed timeouts span multiple heavy txs, confirming this is not a single-tx fluke:

Fix

Both trace generation and check-circuit are multithreaded, so the bottleneck is the 2-CPU cap as much as the 30s clock. Bump to CPUS=4 (which also raises the derived MEM to 16 GiB, since MEM is computed as CPUS × 4 GiB) and TIMEOUT=120s for generous headroom on the heaviest txs and on a loaded runner. The change is a one-line prefix update plus a refreshed comment. No code or circuit behavior changes; this only adjusts CI execution resources.
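The memory figure follows directly from the CPU count. As a small worked sketch of that arithmetic (`derive_mem` is a hypothetical name; the real derivation lives in the CI scripts and may be written differently):

```shell
# Hypothetical helper mirroring the "MEM is CPUS times 4 GiB" rule
# described in this PR; prints a size string with a "g" suffix.
derive_mem() {
  echo "$(( $1 * 4 ))g"
}

echo "CPUS=2 -> MEM=$(derive_mem 2)"   # old budget: matches the 8 GiB container
echo "CPUS=4 -> MEM=$(derive_mem 4)"   # new budget: 16 GiB
```

With the observed peak of about 3.9 GiB, memory was not the binding limit under the old budget; the extra memory is a side effect of the CPU bump.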

Validation note

avm-circuit-inputs.yml triggers only on push to next, the nightly cron, and workflow_dispatch; it does not run on pull requests. So this change cannot be validated by PR CI. It takes effect once merged to next, where the next push or nightly run validates it, or it can be exercised earlier via a manual workflow_dispatch on this branch.
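One way to trigger that manual run is the GitHub CLI. This is a sketch, assuming `gh` is installed and authenticated against the repository; `dispatch_cmd` is a hypothetical wrapper that only prints the command so it can be reviewed first.

```shell
# Hypothetical dry-run wrapper: prints the gh invocation instead of
# executing it. Remove the leading `echo` to actually dispatch.
dispatch_cmd() {
  echo gh workflow run "$1" --ref "$2"
}

# Workflow file and branch name are taken from this PR.
dispatch_cmd avm-circuit-inputs.yml cb/avm-cc-timeout-fix-24198
```

Dispatching against the branch runs the workflow with this branch's checkout, so the new CPUS/TIMEOUT values are the ones exercised.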

Supersedes

This is the canonical consolidation of the duplicate timeout-bump PRs opened by successive failure auto-dispatches. It strictly dominates them (more CPU and a larger global timeout, covering every heavy tx — not just e2e_multiple_blobs):

Closing those four in favor of this PR.

AztecBot added labels on Jun 23, 2026: ci-draft (Run CI on draft PRs), ci-no-fail-fast (sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure), claudebox (owned by claudebox; it can push to this PR).
AztecBot changed the title from "fix(ci): give AVM check-circuit more CPU/time for heavy txs" to "fix(ci): give AVM check-circuit more CPU/time for heavy txs (canonical)" on Jun 23, 2026.