Skip to content

feat: merge-train/spartan-v5#24256

Merged
AztecBot merged 13 commits into
v5-nextfrom
merge-train/spartan-v5
Jun 24, 2026
Merged

feat: merge-train/spartan-v5#24256
AztecBot merged 13 commits into
v5-nextfrom
merge-train/spartan-v5

Conversation

@AztecBot

@AztecBot AztecBot commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

BEGIN_COMMIT_OVERRIDE
fix(world-state): verify archive root in sync_block to reject divergent state (#24229)
feat(aztec-node): force sentinel for validators + offense collection (#24202)
fix(archiver): treat re-publish of preloaded protocol contracts as idempotent (#24227)
test(e2e): wait for L1 txs to be mined in token bridge tutorial test (#24065)
fix(spartan): fund sponsored FPC in block-capacity bench env (#24248)
END_COMMIT_OVERRIDE

spalladino and others added 13 commits June 12, 2026 17:03
Fixes a race in `e2e_token_bridge_tutorial_test` where L1 setup transactions were submitted but not awaited before dependent bridge calls.

- Waits for the `TestERC20.addMinter` transaction receipt before minting through the handler.
- Waits for the `TokenPortal.initialize` transaction receipt before simulating `depositToAztecPublic`.
- Prevents intermittent `SafeERC20FailedOperation(address token) (0x0)` when the portal deposit simulation runs before `underlying` is initialized.

See http://ci.aztec-labs.com/e100f59864b9c18c for sample failed run.
…m gating from prover

A node running a validator now always creates the sentinel regardless of SENTINEL_ENABLED, since the sentinel is required to reach consensus on slashing offenses. Non-validators continue to respect SENTINEL_ENABLED (default false).

Subsystem gating (sentinel, slashing watchers, slasher) is no longer tied to prover-only mode but to validator status. Non-validators can opt into running the watchers plus a read-only slasher via the new OFFENSE_COLLECTION_ENABLED flag to collect and serve offenses over the admin RPC. The spartan deploy enables it by default, mirroring SENTINEL_ENABLED.

Fixes A-1242
…andler for non-validator offense collection

Lifts the slotsWithInvalidProposals / slotsWithProposalEquivocation tracking out of ValidatorClient and into ProposalHandler, which now implements InvalidProposalSlotSource (hasInvalidProposals / hasProposalEquivocation). The handler marks these slots from its all-nodes block and checkpoint proposal handlers, so any node that re-executes proposals (the default) populates them, not only validators.

The attested-invalid-proposal watcher is now wired to the proposal handler (validator-owned or standalone) instead of the validator client, so a non-validator offense collector (OFFENSE_COLLECTION_ENABLED) can detect attested-to-invalid-checkpoint-proposal offenses. ValidatorClient delegates its has/mark calls to the handler, preserving validator behavior.
Drop the now-redundant invalid-checkpoint slot mark in ValidatorClient: the all-nodes checkpoint handler already marks the slot before invoking the failure callback. The FifoSet add was idempotent so the double-mark was harmless, but the validator-side call is dead.

Guard non-validator invalid-block marking with the escape-hatch check, matching the validator path that intentionally disables invalid-block slashing while the escape hatch is open.

Mark proposal equivocation from the p2p duplicate-proposal callback in ProposalHandler.register. p2p detects duplicate proposals without routing them through the proposal handlers, so without this a non-validator offense collector could mark a slot invalid without the matching equivocation mark and false-positive slash attesters. Validators overwrite this callback with their own richer handler.
…empotent

The archiver preloads bundled protocol contract classes and instances at
synthetic block 0. When a bundled protocol class id (or instance address) was
later published on chain, the contract stores threw "already exists" while
re-adding the already-present key, stalling L1 sync.

Make the stores idempotent for protocol-preloaded entries:
- addContractClass / addContractInstance now skip (no-op) when the key already
  exists and it belongs to a bundled protocol contract, keeping the existing
  block-0 entry untouched. Genuine non-protocol redefinitions still throw.
- deleteContractClass / deleteContractInstance never delete protocol entries,
  so they survive reorgs of the publishing block.

Adds isProtocolContractClass to protocol-contracts as a sibling of
isProtocolContract, plus store unit tests and an integration test that
publishes a bundled protocol class id through ArchiverDataStoreUpdater.

Fixes A-1257
…nt state

The archive tree is an append-only accumulator of block-header hashes, so a single
bad leaf (e.g. from a mishandled reorg) is never self-corrected: every later root
stays noncanonical while the other state trees can re-converge from block effects.
sync_block only checked the four non-archive trees, so a self-consistent orphan
block — commonly an empty one — could silently fork the archive root from canonical.

Verify the archive root against canonical both before appending (the committed root
must equal the block's lastArchive) and after (the resulting root must equal the
block's archive), failing before commit so the divergence is never persisted. The
checks are optional std::optional parameters; the napi and wsdb transports forward
the canonical roots from the block being synced.
…nt state (#24229)

Fixes A-1235

# A-1235 reviewer notes: archive-root divergence and the v4/v5
block-stream race

## Short version

The A-1235 symptom was a mainnet fisherman rejecting every proposal with
`ReExInitialStateMismatchError`. The proposal/canonical
`lastArchive.root` was correct; the local
world-state archive root was wrong.

The important distinction for review is:

- v4.3.0 appears vulnerable to a specific block-stream/cache race that
can seed this state.
- v5 has already hardened that specific race.
- v5 still lacks the world-state archive-root invariant, so if any other
path ever writes a bad
  archive leaf, the corruption can still be committed silently.

This change is therefore defense in depth at the only layer that can
conclusively reject a bad
archive tree before it is persisted: `WorldState::sync_block`.

## What went wrong

World state maintains the archive tree as an append-only accumulator of
block-header hashes. During
`sync_block`, v4.3.0 appended the new block header hash and then
checked:

- `is_archive_tip`: the just-appended header hash is the tip leaf.
- `is_same_state_reference`: nullifier, note hash, public data, and
L1-to-L2 message trees match the
  block state reference.

Neither check verifies the archive root against canonical block data:

- before append: local archive root must equal the block's
`lastArchive.root`;
- after append: computed archive root must equal the block's
`archive.root`.

That means an orphan block can be internally self-consistent and still
poison the archive tree.
If the orphan and canonical block at the same height have the same
effects, commonly because both
are empty, the four non-archive trees can reconverge while the archive
tree remains permanently
forked.

The resulting sequence is:

1. World state syncs an orphan block at height `F`.
2. The archiver later canonicalizes a different block at height `F`.
3. The block stream should prune world state to `F - 1`.
4. In the suspected v4 race, it prunes only to `F`, so the orphan
archive leaf remains.
5. World state syncs canonical descendants starting at `F + 1` on top of
the orphan leaf.
6. The four-tree state-reference check passes, but every archive root
from `F` onward is off-chain.

## The v4 block-stream/cache race

The original explainer says roughly: `getL2Tips` had processed the
reorg, while the archiver had not
yet swapped orphan `F` for canonical `F`.

More precisely, this is not about committed DB state being partially
updated. The block/checkpoint
mutation is one LMDB writer transaction. The race is that v4's
`L2TipsCache` can publish a tip
computed from the uncommitted writer view, while separate block/header
reads still use committed
read transactions.

In v4.3.0:

- `ArchiverDataStoreUpdater.addCheckpoints()` wraps prune, checkpoint
insertion, logs, contract data,
  and `l2TipsCache.refresh()` in one `store.transactionAsync(...)`.
- `L2TipsCache.refresh()` assigns `#tipsPromise = this.loadFromStore()`
before the writer transaction
  commits.
- The LMDB wrapper reuses the active write transaction for reads inside
that callback, so
  `loadFromStore()` can see the post-reorg uncommitted view.
- A concurrent consumer calling `archiver.getL2Tips()` can receive that
cached future/post-reorg tip.
- The same consumer's later `getBlockHeader(F)` call is outside the
writer context, so it opens a
normal committed read transaction and can still see the pre-reorg orphan
at `F`.

The v4 `L2BlockStream` performs exactly this mixed read pattern:

```ts
const sourceTips = await this.l2BlockSource.getL2Tips();
const localTips = await this.localData.getL2Tips();

let latestBlockNumber = localTips.proposed.number;
const sourceCache = new BlockHashCache([sourceTips.proposed]);
while (!(await this.areBlockHashesEqualAt(latestBlockNumber, { sourceCache }))) {
  latestBlockNumber--;
}
```

`areBlockHashesEqualAt()` then asks the source for a per-height hash via
`getBlockHeader(blockNumber)`.
So a single pass can be planned from a future/post-reorg tip but compare
old committed per-height
headers. If both local world state and the committed archiver still have
the orphan at `F`, the walk
can falsely conclude `F` is the common ancestor. Pruning to `F` keeps
block `F`; the correct target
was `F - 1`.

That is the suspected seed path for A-1235. The live evidence proves the
archive tree forked; the
historical interleaving predates retained logs, so treat this as the
best source-grounded mechanism,
not as directly logged fact.

## What v5 changes in this area

v5 has multiple changes that make the specific v4 race much less
plausible.

### 1. Tip cache refresh is post-commit

In v5, `L2TipsCache` says refresh should happen after the writer
transaction has committed, and
`ArchiverDataStoreUpdater` does exactly that:

```ts
const result = await this.stores.db.transactionAsync(async () => {
  // mutate blocks/checkpoints/logs/etc.
  return ...;
});
await this.l2TipsCache?.refresh();
return result;
```

So the cache is loaded from committed DB state, and an aborted writer
cannot replace the cache with
a future view.

### 2. `getL2TipsData()` is one DB snapshot

v5 moves chain-tip construction into
`BlockStore.getL2TipsData(genesisBlockHash)` and wraps it in a
single `db.transactionAsync(...)`. It also validates the resulting tier
ordering:

- finalized <= proven <= checkpointed <= proposed;
- checkpointed block <= proposed block.

That is materially stronger than v4's `L2TipsCache.loadFromStore()`,
which assembled tip numbers and
block data through several independent store calls.

### 3. The block stream fails closed on incoherent source reads

v5's `L2BlockStream` no longer uses the old checkpoint-prefetch path in
the same way. It drives from
a source tips snapshot, compares per-height hashes via `getBlockData({
number })`, and adds guards:

- missing local hash compares unequal rather than accidentally stopping
the walk;
- missing source data at or below the advertised source proposed tip
aborts the pass;
- source tips are re-read after a prune before downstream
reconciliation;
- the download pass verifies the delivered proposed block hash matches
the snapshot's proposed hash;
- tier advancement is skipped if the block download plan did not
complete.

These are all aimed at preventing a stale or mixed source snapshot from
becoming an under-deep prune.
Over-deep or skipped reconciliation is recoverable; under-deep pruning
is dangerous because it can
leave the losing fork's block at the divergence height.

## Why this change still matters on v5

The v5 block stream fixes the known/suspected seed path, but it does not
add the missing invariant to
world state.

Without this patch, `sync_block` can still commit a divergent archive
tree if any future path presents
it with a self-consistent block whose non-archive state matches:

- a different reorg bug;
- a cache/snapshot bug elsewhere;
- operator/datadir corruption;
- a future refactor that bypasses one of the v5 block-stream
protections.

Once a bad archive leaf is committed, appending more canonical leaves
does not repair it. The archive
root remains noncanonical forever while the four non-archive trees can
look healthy.

The fix makes `sync_block` verify the archive tree at the source of
truth:

```cpp
// Before append: committed local root must be the block's parent archive root.
actual_previous_archive_root == expected_previous_archive_root

// After append, before commit: uncommitted computed root must be the block's archive root.
actual_archive_root == expected_archive_root
```

These checks turn silent permanent corruption into a loud sync failure
before commit. The node may
need a datadir resync, but it will not write and seal in the divergent
archive leaf.

## Reviewer checklist

- Confirm the TS/native message path passes both canonical roots:
  - `expectedPreviousArchiveRoot = l2Block.header.lastArchive.root`
  - `expectedArchiveRoot = l2Block.archive.root`
- Confirm native `sync_block` checks the previous root before
`add_value`.
- Confirm native `sync_block` checks the resulting root before `commit`.
- Confirm errors clearly tell operators that local world state diverged
and must be resynced.
- Confirm tests cover the archive-only divergence case, not just
non-archive state-reference mismatch.

## Expected operator behavior

This patch prevents recurrence; it does not repair an already-poisoned
archive tree. A node that
already has the bad historical leaf must wipe/resync its world-state
datadir.
…24202)

## Motivation

The sentinel is required for the network to reach consensus on slashing
offenses — it's a core duty of participating in consensus, not an
optional add-on. Yet `SENTINEL_ENABLED` defaulted to `false`, so a
validator could run without one and silently degrade the network's
ability to enforce its own rules.

While fixing that, two adjacent issues surfaced:

- Whether the node runs the sentinel and the slashing-detection watchers
was gated on a *prover-only* flag (`enableProverNode &&
disableValidator`). That coupling is wrong: those subsystems should
depend on **validator status**, not on whether the prover is enabled.
- A non-validator (e.g. an RPC or full node) had no way to collect and
inspect offenses, because the only component that persists them lives
inside the slasher client, which was created only for validators.

## Approach

- **Force the sentinel for validators.** `createSentinel` now
self-gates: if the node runs a validator it always creates the sentinel
(ignoring `SENTINEL_ENABLED`); otherwise it respects `SENTINEL_ENABLED`
(default `false`). It logs which case applies so operators understand
why a `SENTINEL_ENABLED=false` is being overridden.
- **Decouple subsystem gating from the prover.** Removed the
`proverOnly` guard. The slashing-detection watchers and the slasher are
now gated on `collectOffenses = !disableValidator ||
enableOffenseCollection`.
- **Opt-in offense collection for non-validators.** A new
`OFFENSE_COLLECTION_ENABLED` flag lets a non-validator run the watchers
plus a read-only slasher client (it never writes to L1 —
voting/execution flow through the sequencer publisher, which a
non-validator lacks) to collect and serve offenses over the existing
`getSlashOffenses` admin RPC. The spartan deploy enables it by default,
mirroring `SENTINEL_ENABLED`.
- **Make the attested-invalid-proposal watcher work without a
validator.** Lifted the invalid-proposal / equivocation slot tracking
out of `ValidatorClient` into `ProposalHandler`, which now implements
`InvalidProposalSlotSource`. The handler populates this from its
all-nodes proposal handlers, so any node that re-executes proposals (the
default) can feed the watcher.

## API changes

- New node config `enableOffenseCollection` (env
`OFFENSE_COLLECTION_ENABLED`, default `false`). Named distinctly from
the existing L1-deploy `AZTEC_SLASHER_ENABLED` to avoid confusion.
- Validator nodes now always run the sentinel regardless of
`SENTINEL_ENABLED`.

## Changes

- **aztec-node**: `createSentinel` forces the sentinel for validators
and logs the reason; `server.ts` removes `proverOnly`, gates watchers +
a split-out read-only slasher on `collectOffenses`, and wires the
attested-invalid-proposal watcher to the proposal handler.
- **aztec-node (config) / foundation**: new `enableOffenseCollection` /
`OFFENSE_COLLECTION_ENABLED` flag and env-var entry.
- **validator-client**: `ProposalHandler` now owns the invalid-proposal
/ equivocation slot tracking and implements `InvalidProposalSlotSource`;
`ValidatorClient` delegates to it, preserving validator behavior.
- **spartan**: `OFFENSE_COLLECTION_ENABLED` wired through the chart
value, pod template, network defaults, terraform variable, and deploy
script, defaulting on (mirrors `SENTINEL_ENABLED`).

Fixes A-1242
…empotent (#24227)

## Motivation

The archiver preloads every bundled protocol contract class (and its
canonical instance) into its local store at synthetic block 0, before L1
sync. World-state genesis, however, seeds no registration nullifiers for
those classes/instances. As a result a *first* on-chain
`ContractClassRegistry.publish` of a bundled protocol class id is
protocol-valid (fresh class-id nullifier + `ContractClassPublished`
log). On replay the archiver recomputes the same class id and
unconditionally re-inserts it, but the store throws on the pre-existing
block-0 key (`Contract class <id> already exists, cannot add again`).
Because that insert runs inside the block/checkpoint store transaction,
the throw aborts persistence, and L1 sync retries the same valid
checkpoint indefinitely — a sync stall.

## Approach

Make the archiver treat protocol-preloaded entries as idempotent and
immutable, guarded at the store layer (the single chokepoint for both
the add-throw and the block-gated delete):

- `addContractClass` / `addContractInstance`: when the key already
exists and it is a protocol class id / magic protocol address, treat the
(re-)publish as a no-op and keep the existing block-0 entry — crucially
**without** bumping its recorded block number. Genuine non-protocol
duplicates still throw unchanged.
- `deleteContractClass` / `deleteContractInstance`: skip deletion for
protocol entries, so a reorg of the publishing block can never roll out
the preload.

Keeping the preload at block 0 is deliberate: `deleteContractClass` only
deletes when the stored `l2BlockNumber >= blockNumber`, so block 0 makes
protocol entries survive any reorg; bumping to the publish block would
let a deep reorg delete them.

The instance-side guards are defensive only: on-chain
`publish_for_public_execution` always emits the *derived* address, never
a magic address, so the instance store is not reached with a magic
address via the on-chain replay path today — the guards exist for
symmetry and to protect future code paths.

This is a non-breaking, node-local resilience fix. A follow-up PR (PR2)
seeds the protocol registration nullifiers into world-state genesis so
the on-chain re-publish is rejected at the protocol level — the
root-cause fix for the class path.

## Changes

- **protocol-contracts**: add `isProtocolContractClass(classId)`
(sibling of the existing `isProtocolContract(address)`), backed by a set
of the generated `ProtocolContractClassId` values.
- **archiver**: idempotent add + protected delete for protocol classes
and instances in `ContractClassStore` / `ContractInstanceStore`.
- **archiver (tests)**: store unit tests (idempotent re-add stays
queryable with block-0 / bytecode-commitment preserved, protected
delete, non-protocol duplicate still throws — for both classes and
instances) and an A-1257 integration test that preloads protocol
contracts, builds an `L2Block` carrying a `ContractClassPublished` log
for a bundled class id, and asserts `addProposedBlock` commits and the
class stays queryable.

Fixes A-1257
…24065)

Fixes a race in `e2e_token_bridge_tutorial_test` where L1 setup
transactions were submitted but not awaited before dependent bridge
calls.

- Waits for the `TestERC20.addMinter` transaction receipt before minting
through the handler.
- Waits for the `TokenPortal.initialize` transaction receipt before
simulating `depositToAztecPublic`.
- Prevents intermittent `SafeERC20FailedOperation(address token) (0x0)`
when the portal deposit simulation runs before `underlying` is
initialized.

See http://ci.aztec-labs.com/e100f59864b9c18c for sample failed run.
## Problem

The nightly `block-capacity-benchmark` job has been failing fast (~5
min) with every transaction rejected:

```
Invalid tx: Insufficient fee payer balance (required=507345496560000, available=0)
```

All 8 benchmark cases fail on the first `send`. The build succeeds and
the network is reachable — this is purely a genesis-funding problem.

## Root cause

`block_capacity.test.ts` pays **all** fees through
`SponsoredFeePaymentMethod` (account deploys, mints, transfers). The fee
payer is therefore the sponsored FPC, not the worker wallets. But
`spartan/environments/block-capacity.env` never set
`SPONSORED_FPC=true`, so the network was deployed **without funding the
sponsored FPC at genesis** — its fee-juice balance is `available=0`, so
no tx can be paid for.

Every other bench env already sets this flag (`tps-scenario`,
`bench-10tps`, `prove-n-tps-fake`, `prove-n-tps-real`, `next-scenario`);
`block-capacity.env` was the only one missing it. `network-defaults.yml`
documents the flag as *"Fund sponsored FPC with fee juice"*.

## Fix

Add `SPONSORED_FPC=true` to `block-capacity.env`.

Found while investigating the failures in nightly Spartan Benchmarks run
#95.

@ludamad ludamad left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Auto-approved

@AztecBot

Copy link
Copy Markdown
Collaborator Author

🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.

@AztecBot AztecBot added this pull request to the merge queue Jun 23, 2026
Merged via the queue into v5-next with commit 7a013ad Jun 24, 2026
52 of 54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants