feat: merge-train/spartan-v5#24256
Merged
Merged
Conversation
Fixes a race in `e2e_token_bridge_tutorial_test` where L1 setup transactions were submitted but not awaited before dependent bridge calls. - Waits for the `TestERC20.addMinter` transaction receipt before minting through the handler. - Waits for the `TokenPortal.initialize` transaction receipt before simulating `depositToAztecPublic`. - Prevents intermittent `SafeERC20FailedOperation(address token) (0x0)` when the portal deposit simulation runs before `underlying` is initialized. See http://ci.aztec-labs.com/e100f59864b9c18c for sample failed run.
…m gating from prover A node running a validator now always creates the sentinel regardless of SENTINEL_ENABLED, since the sentinel is required to reach consensus on slashing offenses. Non-validators continue to respect SENTINEL_ENABLED (default false). Subsystem gating (sentinel, slashing watchers, slasher) is no longer tied to prover-only mode but to validator status. Non-validators can opt into running the watchers plus a read-only slasher via the new OFFENSE_COLLECTION_ENABLED flag to collect and serve offenses over the admin RPC. The spartan deploy enables it by default, mirroring SENTINEL_ENABLED. Fixes A-1242
…andler for non-validator offense collection Lifts the slotsWithInvalidProposals / slotsWithProposalEquivocation tracking out of ValidatorClient and into ProposalHandler, which now implements InvalidProposalSlotSource (hasInvalidProposals / hasProposalEquivocation). The handler marks these slots from its all-nodes block and checkpoint proposal handlers, so any node that re-executes proposals (the default) populates them, not only validators. The attested-invalid-proposal watcher is now wired to the proposal handler (validator-owned or standalone) instead of the validator client, so a non-validator offense collector (OFFENSE_COLLECTION_ENABLED) can detect attested-to-invalid-checkpoint-proposal offenses. ValidatorClient delegates its has/mark calls to the handler, preserving validator behavior.
Drop the now-redundant invalid-checkpoint slot mark in ValidatorClient: the all-nodes checkpoint handler already marks the slot before invoking the failure callback. The FifoSet add was idempotent so the double-mark was harmless, but the validator-side call is dead. Guard non-validator invalid-block marking with the escape-hatch check, matching the validator path that intentionally disables invalid-block slashing while the escape hatch is open. Mark proposal equivocation from the p2p duplicate-proposal callback in ProposalHandler.register. p2p detects duplicate proposals without routing them through the proposal handlers, so without this a non-validator offense collector could mark a slot invalid without the matching equivocation mark and false-positive slash attesters. Validators overwrite this callback with their own richer handler.
…empotent The archiver preloads bundled protocol contract classes and instances at synthetic block 0. When a bundled protocol class id (or instance address) was later published on chain, the contract stores threw "already exists" while re-adding the already-present key, stalling L1 sync. Make the stores idempotent for protocol-preloaded entries: - addContractClass / addContractInstance now skip (no-op) when the key already exists and it belongs to a bundled protocol contract, keeping the existing block-0 entry untouched. Genuine non-protocol redefinitions still throw. - deleteContractClass / deleteContractInstance never delete protocol entries, so they survive reorgs of the publishing block. Adds isProtocolContractClass to protocol-contracts as a sibling of isProtocolContract, plus store unit tests and an integration test that publishes a bundled protocol class id through ArchiverDataStoreUpdater. Fixes A-1257
…nt state The archive tree is an append-only accumulator of block-header hashes, so a single bad leaf (e.g. from a mishandled reorg) is never self-corrected: every later root stays noncanonical while the other state trees can re-converge from block effects. sync_block only checked the four non-archive trees, so a self-consistent orphan block — commonly an empty one — could silently fork the archive root from canonical. Verify the archive root against canonical both before appending (the committed root must equal the block's lastArchive) and after (the resulting root must equal the block's archive), failing before commit so the divergence is never persisted. The checks are optional std::optional parameters; the napi and wsdb transports forward the canonical roots from the block being synced.
…nt state (#24229) Fixes A-1235 # A-1235 reviewer notes: archive-root divergence and the v4/v5 block-stream race ## Short version The A-1235 symptom was a mainnet fisherman rejecting every proposal with `ReExInitialStateMismatchError`. The proposal/canonical `lastArchive.root` was correct; the local world-state archive root was wrong. The important distinction for review is: - v4.3.0 appears vulnerable to a specific block-stream/cache race that can seed this state. - v5 has already hardened that specific race. - v5 still lacks the world-state archive-root invariant, so if any other path ever writes a bad archive leaf, the corruption can still be committed silently. This change is therefore defense in depth at the only layer that can conclusively reject a bad archive tree before it is persisted: `WorldState::sync_block`. ## What went wrong World state maintains the archive tree as an append-only accumulator of block-header hashes. During `sync_block`, v4.3.0 appended the new block header hash and then checked: - `is_archive_tip`: the just-appended header hash is the tip leaf. - `is_same_state_reference`: nullifier, note hash, public data, and L1-to-L2 message trees match the block state reference. Neither check verifies the archive root against canonical block data: - before append: local archive root must equal the block's `lastArchive.root`; - after append: computed archive root must equal the block's `archive.root`. That means an orphan block can be internally self-consistent and still poison the archive tree. If the orphan and canonical block at the same height have the same effects, commonly because both are empty, the four non-archive trees can reconverge while the archive tree remains permanently forked. The resulting sequence is: 1. World state syncs an orphan block at height `F`. 2. The archiver later canonicalizes a different block at height `F`. 3. The block stream should prune world state to `F - 1`. 4. In the suspected v4 race, it prunes only to `F`, so the orphan archive leaf remains. 5. World state syncs canonical descendants starting at `F + 1` on top of the orphan leaf. 6. The four-tree state-reference check passes, but every archive root from `F` onward is off-chain. ## The v4 block-stream/cache race The original explainer says roughly: `getL2Tips` had processed the reorg, while the archiver had not yet swapped orphan `F` for canonical `F`. More precisely, this is not about committed DB state being partially updated. The block/checkpoint mutation is one LMDB writer transaction. The race is that v4's `L2TipsCache` can publish a tip computed from the uncommitted writer view, while separate block/header reads still use committed read transactions. In v4.3.0: - `ArchiverDataStoreUpdater.addCheckpoints()` wraps prune, checkpoint insertion, logs, contract data, and `l2TipsCache.refresh()` in one `store.transactionAsync(...)`. - `L2TipsCache.refresh()` assigns `#tipsPromise = this.loadFromStore()` before the writer transaction commits. - The LMDB wrapper reuses the active write transaction for reads inside that callback, so `loadFromStore()` can see the post-reorg uncommitted view. - A concurrent consumer calling `archiver.getL2Tips()` can receive that cached future/post-reorg tip. - The same consumer's later `getBlockHeader(F)` call is outside the writer context, so it opens a normal committed read transaction and can still see the pre-reorg orphan at `F`. The v4 `L2BlockStream` performs exactly this mixed read pattern: ```ts const sourceTips = await this.l2BlockSource.getL2Tips(); const localTips = await this.localData.getL2Tips(); let latestBlockNumber = localTips.proposed.number; const sourceCache = new BlockHashCache([sourceTips.proposed]); while (!(await this.areBlockHashesEqualAt(latestBlockNumber, { sourceCache }))) { latestBlockNumber--; } ``` `areBlockHashesEqualAt()` then asks the source for a per-height hash via `getBlockHeader(blockNumber)`. So a single pass can be planned from a future/post-reorg tip but compare old committed per-height headers. If both local world state and the committed archiver still have the orphan at `F`, the walk can falsely conclude `F` is the common ancestor. Pruning to `F` keeps block `F`; the correct target was `F - 1`. That is the suspected seed path for A-1235. The live evidence proves the archive tree forked; the historical interleaving predates retained logs, so treat this as the best source-grounded mechanism, not as directly logged fact. ## What v5 changes in this area v5 has multiple changes that make the specific v4 race much less plausible. ### 1. Tip cache refresh is post-commit In v5, `L2TipsCache` says refresh should happen after the writer transaction has committed, and `ArchiverDataStoreUpdater` does exactly that: ```ts const result = await this.stores.db.transactionAsync(async () => { // mutate blocks/checkpoints/logs/etc. return ...; }); await this.l2TipsCache?.refresh(); return result; ``` So the cache is loaded from committed DB state, and an aborted writer cannot replace the cache with a future view. ### 2. `getL2TipsData()` is one DB snapshot v5 moves chain-tip construction into `BlockStore.getL2TipsData(genesisBlockHash)` and wraps it in a single `db.transactionAsync(...)`. It also validates the resulting tier ordering: - finalized <= proven <= checkpointed <= proposed; - checkpointed block <= proposed block. That is materially stronger than v4's `L2TipsCache.loadFromStore()`, which assembled tip numbers and block data through several independent store calls. ### 3. The block stream fails closed on incoherent source reads v5's `L2BlockStream` no longer uses the old checkpoint-prefetch path in the same way. It drives from a source tips snapshot, compares per-height hashes via `getBlockData({ number })`, and adds guards: - missing local hash compares unequal rather than accidentally stopping the walk; - missing source data at or below the advertised source proposed tip aborts the pass; - source tips are re-read after a prune before downstream reconciliation; - the download pass verifies the delivered proposed block hash matches the snapshot's proposed hash; - tier advancement is skipped if the block download plan did not complete. These are all aimed at preventing a stale or mixed source snapshot from becoming an under-deep prune. Over-deep or skipped reconciliation is recoverable; under-deep pruning is dangerous because it can leave the losing fork's block at the divergence height. ## Why this change still matters on v5 The v5 block stream fixes the known/suspected seed path, but it does not add the missing invariant to world state. Without this patch, `sync_block` can still commit a divergent archive tree if any future path presents it with a self-consistent block whose non-archive state matches: - a different reorg bug; - a cache/snapshot bug elsewhere; - operator/datadir corruption; - a future refactor that bypasses one of the v5 block-stream protections. Once a bad archive leaf is committed, appending more canonical leaves does not repair it. The archive root remains noncanonical forever while the four non-archive trees can look healthy. The fix makes `sync_block` verify the archive tree at the source of truth: ```cpp // Before append: committed local root must be the block's parent archive root. actual_previous_archive_root == expected_previous_archive_root // After append, before commit: uncommitted computed root must be the block's archive root. actual_archive_root == expected_archive_root ``` These checks turn silent permanent corruption into a loud sync failure before commit. The node may need a datadir resync, but it will not write and seal in the divergent archive leaf. ## Reviewer checklist - Confirm the TS/native message path passes both canonical roots: - `expectedPreviousArchiveRoot = l2Block.header.lastArchive.root` - `expectedArchiveRoot = l2Block.archive.root` - Confirm native `sync_block` checks the previous root before `add_value`. - Confirm native `sync_block` checks the resulting root before `commit`. - Confirm errors clearly tell operators that local world state diverged and must be resynced. - Confirm tests cover the archive-only divergence case, not just non-archive state-reference mismatch. ## Expected operator behavior This patch prevents recurrence; it does not repair an already-poisoned archive tree. A node that already has the bad historical leaf must wipe/resync its world-state datadir.
…24202) ## Motivation The sentinel is required for the network to reach consensus on slashing offenses — it's a core duty of participating in consensus, not an optional add-on. Yet `SENTINEL_ENABLED` defaulted to `false`, so a validator could run without one and silently degrade the network's ability to enforce its own rules. While fixing that, two adjacent issues surfaced: - Whether the node runs the sentinel and the slashing-detection watchers was gated on a *prover-only* flag (`enableProverNode && disableValidator`). That coupling is wrong: those subsystems should depend on **validator status**, not on whether the prover is enabled. - A non-validator (e.g. an RPC or full node) had no way to collect and inspect offenses, because the only component that persists them lives inside the slasher client, which was created only for validators. ## Approach - **Force the sentinel for validators.** `createSentinel` now self-gates: if the node runs a validator it always creates the sentinel (ignoring `SENTINEL_ENABLED`); otherwise it respects `SENTINEL_ENABLED` (default `false`). It logs which case applies so operators understand why a `SENTINEL_ENABLED=false` is being overridden. - **Decouple subsystem gating from the prover.** Removed the `proverOnly` guard. The slashing-detection watchers and the slasher are now gated on `collectOffenses = !disableValidator || enableOffenseCollection`. - **Opt-in offense collection for non-validators.** A new `OFFENSE_COLLECTION_ENABLED` flag lets a non-validator run the watchers plus a read-only slasher client (it never writes to L1 — voting/execution flow through the sequencer publisher, which a non-validator lacks) to collect and serve offenses over the existing `getSlashOffenses` admin RPC. The spartan deploy enables it by default, mirroring `SENTINEL_ENABLED`. - **Make the attested-invalid-proposal watcher work without a validator.** Lifted the invalid-proposal / equivocation slot tracking out of `ValidatorClient` into `ProposalHandler`, which now implements `InvalidProposalSlotSource`. The handler populates this from its all-nodes proposal handlers, so any node that re-executes proposals (the default) can feed the watcher. ## API changes - New node config `enableOffenseCollection` (env `OFFENSE_COLLECTION_ENABLED`, default `false`). Named distinctly from the existing L1-deploy `AZTEC_SLASHER_ENABLED` to avoid confusion. - Validator nodes now always run the sentinel regardless of `SENTINEL_ENABLED`. ## Changes - **aztec-node**: `createSentinel` forces the sentinel for validators and logs the reason; `server.ts` removes `proverOnly`, gates watchers + a split-out read-only slasher on `collectOffenses`, and wires the attested-invalid-proposal watcher to the proposal handler. - **aztec-node (config) / foundation**: new `enableOffenseCollection` / `OFFENSE_COLLECTION_ENABLED` flag and env-var entry. - **validator-client**: `ProposalHandler` now owns the invalid-proposal / equivocation slot tracking and implements `InvalidProposalSlotSource`; `ValidatorClient` delegates to it, preserving validator behavior. - **spartan**: `OFFENSE_COLLECTION_ENABLED` wired through the chart value, pod template, network defaults, terraform variable, and deploy script, defaulting on (mirrors `SENTINEL_ENABLED`). Fixes A-1242
…empotent (#24227) ## Motivation The archiver preloads every bundled protocol contract class (and its canonical instance) into its local store at synthetic block 0, before L1 sync. World-state genesis, however, seeds no registration nullifiers for those classes/instances. As a result a *first* on-chain `ContractClassRegistry.publish` of a bundled protocol class id is protocol-valid (fresh class-id nullifier + `ContractClassPublished` log). On replay the archiver recomputes the same class id and unconditionally re-inserts it, but the store throws on the pre-existing block-0 key (`Contract class <id> already exists, cannot add again`). Because that insert runs inside the block/checkpoint store transaction, the throw aborts persistence, and L1 sync retries the same valid checkpoint indefinitely — a sync stall. ## Approach Make the archiver treat protocol-preloaded entries as idempotent and immutable, guarded at the store layer (the single chokepoint for both the add-throw and the block-gated delete): - `addContractClass` / `addContractInstance`: when the key already exists and it is a protocol class id / magic protocol address, treat the (re-)publish as a no-op and keep the existing block-0 entry — crucially **without** bumping its recorded block number. Genuine non-protocol duplicates still throw unchanged. - `deleteContractClass` / `deleteContractInstance`: skip deletion for protocol entries, so a reorg of the publishing block can never roll out the preload. Keeping the preload at block 0 is deliberate: `deleteContractClass` only deletes when the stored `l2BlockNumber >= blockNumber`, so block 0 makes protocol entries survive any reorg; bumping to the publish block would let a deep reorg delete them. The instance-side guards are defensive only: on-chain `publish_for_public_execution` always emits the *derived* address, never a magic address, so the instance store is not reached with a magic address via the on-chain replay path today — the guards exist for symmetry and to protect future code paths. This is a non-breaking, node-local resilience fix. A follow-up PR (PR2) seeds the protocol registration nullifiers into world-state genesis so the on-chain re-publish is rejected at the protocol level — the root-cause fix for the class path. ## Changes - **protocol-contracts**: add `isProtocolContractClass(classId)` (sibling of the existing `isProtocolContract(address)`), backed by a set of the generated `ProtocolContractClassId` values. - **archiver**: idempotent add + protected delete for protocol classes and instances in `ContractClassStore` / `ContractInstanceStore`. - **archiver (tests)**: store unit tests (idempotent re-add stays queryable with block-0 / bytecode-commitment preserved, protected delete, non-protocol duplicate still throws — for both classes and instances) and an A-1257 integration test that preloads protocol contracts, builds an `L2Block` carrying a `ContractClassPublished` log for a bundled class id, and asserts `addProposedBlock` commits and the class stays queryable. Fixes A-1257
…24065) Fixes a race in `e2e_token_bridge_tutorial_test` where L1 setup transactions were submitted but not awaited before dependent bridge calls. - Waits for the `TestERC20.addMinter` transaction receipt before minting through the handler. - Waits for the `TokenPortal.initialize` transaction receipt before simulating `depositToAztecPublic`. - Prevents intermittent `SafeERC20FailedOperation(address token) (0x0)` when the portal deposit simulation runs before `underlying` is initialized. See http://ci.aztec-labs.com/e100f59864b9c18c for sample failed run.
## Problem The nightly `block-capacity-benchmark` job has been failing fast (~5 min) with every transaction rejected: ``` Invalid tx: Insufficient fee payer balance (required=507345496560000, available=0) ``` All 8 benchmark cases fail on the first `send`. The build succeeds and the network is reachable — this is purely a genesis-funding problem. ## Root cause `block_capacity.test.ts` pays **all** fees through `SponsoredFeePaymentMethod` (account deploys, mints, transfers). The fee payer is therefore the sponsored FPC, not the worker wallets. But `spartan/environments/block-capacity.env` never set `SPONSORED_FPC=true`, so the network was deployed **without funding the sponsored FPC at genesis** — its fee-juice balance is `available=0`, so no tx can be paid for. Every other bench env already sets this flag (`tps-scenario`, `bench-10tps`, `prove-n-tps-fake`, `prove-n-tps-real`, `next-scenario`); `block-capacity.env` was the only one missing it. `network-defaults.yml` documents the flag as *"Fund sponsored FPC with fee juice"*. ## Fix Add `SPONSORED_FPC=true` to `block-capacity.env`. Found while investigating the failures in nightly Spartan Benchmarks run #95.
Collaborator
Author
|
🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
BEGIN_COMMIT_OVERRIDE
fix(world-state): verify archive root in sync_block to reject divergent state (#24229)
feat(aztec-node): force sentinel for validators + offense collection (#24202)
fix(archiver): treat re-publish of preloaded protocol contracts as idempotent (#24227)
test(e2e): wait for L1 txs to be mined in token bridge tutorial test (#24065)
fix(spartan): fund sponsored FPC in block-capacity bench env (#24248)
END_COMMIT_OVERRIDE