fix(bb): size Pippenger MSM arena for the non-GLV mid-band#24249
Open
AztecBot wants to merge 1 commit into
Open
fix(bb): size Pippenger MSM arena for the non-GLV mid-band#24249AztecBot wants to merge 1 commit into
AztecBot wants to merge 1 commit into
Conversation
7a3edd3 to
00761bd
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
UltraHonk
bb proveintermittently aborts in CI with:The assertion is
MsmArena::bump_alloc'sBB_ASSERT_LTE(aligned_local + bytes, bound)inscalar_multiplication_fast.cpp. It is not OOM — the per-MSM scratch arena was sized smaller than the live allocator consumes.Root cause
compute_arena_bytes_for_msmsizes the arena from the conservative full-bit schedule (NUM_BITS = FULL_NUM_BITS = 254for non-GLV). The live path (pippenger_round_parallel) then shrinks the budget toeffective_num_bits= the observed max scalar MSB, which can makechoose_window_bitspick a smallerwindow_bits⇒ more windows ⇒ a larger Zone S than the full-bit pre-size.The sizer already has a defensive sweep that takes the max arena footprint across every bit budget, but it was gated
if (use_glv || n_input >= 2^17). That left the non-GLV mid-band (GLV_SMALL_N_THRESHOLD < n_input < 2^17, i.e.8192 < n < 131072natively) sized for the full-bit schedule only.simple_shield's commitment MSM is ~28,696 points — non-GLV, squarely in that gap — so a witness whose scalars drive a smallereffective_num_bitsoverflows the arena.It's intermittent because it needs both a runner core-count that yields the heavier schedule (this run reported 8 threads) and witness data whose max scalar MSB lands in the trigger band (~224–250). Both vary per run/PR, so it isn't tied to any particular PR.
Fix
Make the defensive bit-budget sweep unconditional.
use_glv || n_input > GLV_SMALL_N_THRESHOLDis true for every MSM that reaches this point, so the sweep now covers GLV and all non-GLV sizes. It only ever grows the arena when a shrunk-bit layout genuinely needs more, so the common full-bit-dominated case stays tight. Arena sizing is scratch memory only — no effect on VKs or proof output.Test
Added
MidBandArenaSizerCoversAllEffectiveNumBits, which drives the module's own arena-fit simulator across the non-GLV mid-band × everyeffective_num_bitsat 8 threads. It fails on the pre-fix sizer (under-counts at n=28,696 foreffective_num_bits224–248, among others) and passes with the fix. FullScalarMultiplication*suite (110 tests, incl. real-MSM correctness and the small-scalar band test) passes.