Watchdog v1 (compare mode) + release bundle versioning #14
Merged
Wire sequencer safe-only state export, canonical CM inspect, and Lua compare-mode E2E against Anvil devnet. Adds staging drill docs and harness fixes (sequencer-devnet resolution, optional faketime).
Replace GET /get_state with finalized snapshot routes for operator compare. Unify wallet SSZ encoding across sequencer dumps, CM inspect, and watchdog; share L1 partition test vector with Rust. Restructure Lua modules (lcurl binding, machine_cartesi, sequencer_reader), add devnet-stack helper and production-like operator runbooks.
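The compare is byte-for-byte, so the sequencer dump, CM inspect, and watchdog must all serialize a wallet snapshot identically. The sketch below is a minimal illustration of how much SSZ pins down. It hand-encodes a hypothetical `WalletEntry { address, balance, nonces }` container; the field names and layout are assumptions, not the project's actual `wallet_snapshot` schema. Fixed-size fields are written in order as little-endian integers. The variable-size list is replaced by a 4-byte offset, and its elements follow the fixed part.

```rust
/// Hypothetical wallet entry, used only to illustrate SSZ container layout.
pub struct WalletEntry {
    pub address: [u8; 20], // Vector[byte, 20]: fixed-size, written inline
    pub balance: u64,      // uint64: fixed-size, little-endian
    pub nonces: Vec<u64>,  // List[uint64, N]: variable-size, referenced by offset
}

impl WalletEntry {
    /// SSZ-serialize the container: fixed part first, then variable part.
    pub fn to_ssz(&self) -> Vec<u8> {
        // Fixed part = 20-byte address + 8-byte balance + 4-byte offset for `nonces`.
        const FIXED_LEN: u32 = 20 + 8 + 4;
        let mut out = Vec::with_capacity(FIXED_LEN as usize + 8 * self.nonces.len());
        out.extend_from_slice(&self.address);
        out.extend_from_slice(&self.balance.to_le_bytes());
        // The offset (uint32, little-endian) says where the list bytes start:
        // right after the fixed part, since this is the only variable field.
        out.extend_from_slice(&FIXED_LEN.to_le_bytes());
        // A list of basic types is its elements back to back, with no length prefix.
        for n in &self.nonces {
            out.extend_from_slice(&n.to_le_bytes());
        }
        out
    }
}
```

Comparing two snapshots then reduces to comparing byte strings. Every producer must agree on field order, integer width, endianness, and offset placement, which is why a shared encoding and shared test vector matter.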
… e2e
Replace webhook alarms with watchdog_event JSON and exit codes 0/1/2. Run genesis and non-genesis compare in rollups-e2e; build lcurl.so in CI before the harness.
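The alerting contract that replaces webhooks has two parts: a `watchdog_event` JSON line on stderr, and a process exit code the scheduler can act on. The sketch below is minimal. The exit-code meanings (0 ok, 1 transient, 2 divergence) follow the PR. The JSON field names (`event`, `kind`, `inclusion_block`, `detail`), the `Outcome` enum, and the `report` helper are hypothetical. The example kind `state_mismatch` is taken from the PR summary.

```rust
use std::io::Write;

/// Hypothetical classification of one compare cycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Clean,      // states matched, or there was nothing new to check
    Transient,  // e.g. RPC failure: safe to retry on the next scheduled run
    Divergence, // deterministic mismatch between sequencer and CM
}

/// Exit codes as described in the PR: 0 ok, 1 transient, 2 divergence.
pub fn exit_code(outcome: Outcome) -> i32 {
    match outcome {
        Outcome::Clean => 0,
        Outcome::Transient => 1,
        Outcome::Divergence => 2,
    }
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render one single-line event object (field names are assumptions).
pub fn watchdog_event(kind: &str, inclusion_block: u64, detail: &str) -> String {
    format!(
        "{{\"event\":\"watchdog_event\",\"kind\":{},\"inclusion_block\":{},\"detail\":{}}}",
        json_str(kind),
        inclusion_block,
        json_str(detail)
    )
}

/// Write the event to stderr and return the exit code the process should use.
pub fn report(outcome: Outcome, kind: &str, inclusion_block: u64, detail: &str) -> i32 {
    let _ = writeln!(std::io::stderr(), "{}", watchdog_event(kind, inclusion_block, detail));
    exit_code(outcome)
}
```

A caller would finish with `std::process::exit(report(...))`. With stderr plus an exit code as the whole contract, alert routing is left to whatever supervises the process.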
Track pinned Lua-cURLv3 under watchdog/third_party/lua-curl so CI and watchdog-lua-deps can build lcurl.so without a network fetch.
Centralize toolchain pins in release/versions.env; load them in CI and release workflows. Add release manifest generation, version verification, watchdog Docker image build, and docker-save tarballs per arch.
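Version verification reduces to one rule: every artifact is built against the single pin in `release/versions.env`. The PR summary applies this to `CARTESI_MACHINE_VERSION` for the watchdog image and the canonical machine image tarballs. The sketch below is a minimal Rust rendering of that check, not a transcription of `scripts/verify-release-versions.sh`. It assumes `versions.env` uses shell-style `KEY=VALUE` lines. `parse_env`, `check_pin`, and the observed-version inputs are hypothetical.

```rust
use std::collections::HashMap;

/// Parse shell-style `KEY=VALUE` lines (assumed format of release/versions.env).
/// Blank lines and `#` comments are skipped; an `export ` prefix and
/// surrounding quotes are stripped.
pub fn parse_env(text: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let line = line.strip_prefix("export ").unwrap_or(line);
        if let Some((k, v)) = line.split_once('=') {
            let v = v.trim().trim_matches('"').trim_matches('\'');
            vars.insert(k.trim().to_string(), v.to_string());
        }
    }
    vars
}

/// Fail if the pin is missing or any artifact reports a different version.
/// `observed` pairs an artifact label with the version found in it (hypothetical input).
pub fn check_pin(
    vars: &HashMap<String, String>,
    key: &str,
    observed: &[(&str, &str)],
) -> Result<(), String> {
    let pinned = vars
        .get(key)
        .ok_or_else(|| format!("{key} missing from versions.env"))?;
    for (artifact, version) in observed {
        if *version != pinned.as_str() {
            return Err(format!("{artifact}: {key} is {version}, expected {pinned}"));
        }
    }
    Ok(())
}
```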
The staging drill intentionally exits with code 2 on mismatch, as production does; wrap the just target so local smoke runs succeed while direct invocation still returns 2 for operators.
…rness)
Harden the watchdog release and test path: Dockerfile runs under bash with full cartesi-machine runtime deps, CI adds a docker smoke job and divergence drill, e2e exercises production main.lua via machine_cartesi, runner retries when finalized inclusion_block moves during compare, non-genesis tests assert real snapshot content, and the divergence drill drives main.run_compare_cycle with an exit-2-aware wrapper script.
Clear stale checkpoint snapshots on retry, reject multi-chunk CM inspect, and retry when L1 RPC latest head lags the sequencer target block. Pin lua-curl tarball sha256, remove unused watchdog config and dead storage SQL, fix operator docs and release prerelease handling, and document checkpoint retention.
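The two retry conditions above are the finalized `inclusion_block` moving mid-compare and the L1 RPC's latest head lagging the target block. Under either one, an observed mismatch says nothing about divergence, so both must be checked before the state comparison. The sketch below is minimal. `Observation`, `Verdict`, and the bounded attempt count are assumptions. Treating an exhausted retry budget as transient (exit 1) is this sketch's choice, not something the PR specifies.

```rust
/// Hypothetical values observed during one compare attempt.
pub struct Observation {
    pub inclusion_block_before: u64, // sequencer finalized inclusion block at start
    pub inclusion_block_after: u64,  // same value re-read after the compare
    pub l1_latest_head: u64,         // latest block reported by the L1 RPC
    pub states_equal: bool,          // sequencer SSZ bytes == CM inspect bytes
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Match,
    Retry(&'static str), // transient: the comparison itself is not meaningful
    Mismatch,            // deterministic divergence
}

pub fn classify(o: &Observation) -> Verdict {
    // The L1 node has not reached the target block, so the CM cannot be fed its inputs yet.
    if o.l1_latest_head < o.inclusion_block_before {
        return Verdict::Retry("l1_head_lags_target");
    }
    // The finalized snapshot moved mid-compare; the two sides may describe different blocks.
    if o.inclusion_block_after != o.inclusion_block_before {
        return Verdict::Retry("inclusion_block_moved");
    }
    if o.states_equal {
        Verdict::Match
    } else {
        Verdict::Mismatch
    }
}

/// Re-run `attempt` until it yields a non-retry verdict or the budget runs out.
/// A real runner would also discard any partially written checkpoint between tries.
pub fn run_with_retries<F: FnMut() -> Observation>(mut attempt: F, max_attempts: u32) -> Verdict {
    let mut last = Verdict::Retry("no_attempt");
    for _ in 0..max_attempts {
        last = classify(&attempt());
        if !matches!(last, Verdict::Retry(_)) {
            break;
        }
    }
    last
}
```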
Add a real non-genesis divergence e2e (devnet sequencer vs sepolia CM image), drop machine_cli in favor of machine_cartesi-only compare paths, and give watchdog its own justfile with a doctor recipe for local toolchain checks.
machine_cartesi must instantiate via cartesi.new():load(), not the cartesi.machine userdata factory. Doctor now probes a real CM load, and rollups-e2e ensures the sepolia machine image exists for the divergence scenario.
Document doctor, divergence e2e, and main.lua success semantics; harden checkpoint rm, docker entrypoint, and docker-smoke arch detection.
…duction
Reshapes the watchdog to a single minimal compare path and closes the issues
found across review.
Simplify:
- One job, one shot. Removed the compare/advance mode split and the daemon loop.
`init` records the canonical bootstrap state once; `tick` runs exactly one
compare cycle and exits 0 (clean/idle) / 1 (transient) / 2 (divergence). Infra
schedules re-runs and enforces non-overlap via flock; no in-process loop/lock.
- Removed the verified-bit / advance-checkpoint provenance machinery: a persisted
checkpoint is verified by construction (only a successful compare writes one),
so the cheap-skip needs no extra state.
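A checkpoint exists only if a compare at its block succeeded. The cheap-skip therefore needs nothing beyond the checkpoint's block number and the sequencer's current finalized block. The sketch below is minimal, and `TickPlan` and `plan_tick` are hypothetical names. The `Regressed` arm corresponds to the `inclusion_block_regressed` condition named in the PR summary. How that case maps to an exit code is left open here.

```rust
use std::cmp::Ordering;

/// What a single `tick` decides before doing any expensive work (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum TickPlan {
    /// The finalized block is already covered by a verified checkpoint: idle, exit 0.
    Skip,
    /// Advance the CM from the checkpoint block to the finalized block, then compare.
    Compare { from: u64, to: u64 },
    /// The sequencer's finalized block went backwards; never trust it.
    Regressed { checkpoint: u64, finalized: u64 },
}

/// `checkpoint_block` is the inclusion block of the persisted checkpoint, which
/// exists only because an earlier compare at that block succeeded.
pub fn plan_tick(checkpoint_block: u64, finalized_block: u64) -> TickPlan {
    match finalized_block.cmp(&checkpoint_block) {
        Ordering::Equal => TickPlan::Skip,
        Ordering::Greater => TickPlan::Compare { from: checkpoint_block, to: finalized_block },
        Ordering::Less => TickPlan::Regressed {
            checkpoint: checkpoint_block,
            finalized: finalized_block,
        },
    }
}
```

Non-overlap is deliberately absent from this logic. Per the PR, the infrastructure enforces it outside the process with `flock`.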
Harden / correctness:
- Crash-safe keep-1 checkpointing: atomic head.json pointer flip + predecessor
GC; only ever writes a fresh checkpoint dir (no destructive in-place rewrite).
- Bootstrap is verified at its own block before being trusted; exit 0 means
verified-or-idle on every path (fails closed to 1 or 2, never a false OK).
- Stream bisected eth_getLogs into the CM (no whole-range materialization),
order-equivalent to the global sort.
- L1 RPC URL supplied per tick, never persisted (no provider secret at rest).
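The streaming `eth_getLogs` item can be sketched as a recursive bisection. Ask for a block range; if the provider rejects it as too large, split it and recurse on the left half before the right. The ranges are disjoint and visited in ascending order, and each accepted range is sorted locally. The concatenated stream therefore equals a single global sort by (block, log index), while only one range is held in memory at a time. Assumptions: the `Log` shape, the `TooManyResults` error, and the `fetch` closure standing in for an RPC call are hypothetical. Real providers signal oversized ranges in provider-specific ways.

```rust
/// A log reduced to the fields that determine its order (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Log {
    pub block: u64,
    pub log_index: u64,
}

/// Errors from the stand-in fetcher (assumed).
#[derive(Debug)]
pub enum FetchError {
    TooManyResults,
    Other(String),
}

/// Deliver all logs in [from, to] to `sink` in (block, log_index) order,
/// bisecting the range whenever the provider refuses it as too large.
pub fn stream_logs<F, S>(from: u64, to: u64, fetch: &mut F, sink: &mut S) -> Result<(), FetchError>
where
    F: FnMut(u64, u64) -> Result<Vec<Log>, FetchError>,
    S: FnMut(Log),
{
    match fetch(from, to) {
        Ok(mut logs) => {
            // Sort within this range; left-to-right traversal orders the ranges.
            logs.sort_by_key(|l| (l.block, l.log_index));
            for l in logs {
                sink(l);
            }
            Ok(())
        }
        // Split only while the range spans more than one block.
        Err(FetchError::TooManyResults) if from < to => {
            let mid = from + (to - from) / 2;
            stream_logs(from, mid, fetch, sink)?;
            stream_logs(mid + 1, to, fetch, sink)
        }
        // A single over-full block, or any other error, is propagated.
        Err(e) => Err(e),
    }
}
```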
Build / CI:
- Vendored lua-cURLv3 in-tree for a hermetic build (no build-time download or pin
to verify); Docker image builds and smoke-runs require('cartesi') in CI;
graceful scheduler Inspect (no guest panic on an unknown query).
- e2e drives the production binding incl. a real-component divergence test;
e2e prerequisites hard-fail instead of skipping vacuously.
Docs updated for init/tick, head.json / config.json / WATCHDOG_STATE_DIR, flock,
and the checkpoint-backup story.
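The keep-1 checkpoint scheme can be sketched with the standard library alone: write a fresh checkpoint dir, atomically flip `head.json` to it, then GC the predecessor. Assumptions: `state_dir` stands in for the directory named by `WATCHDOG_STATE_DIR`. The `head.json` shape (`{"checkpoint":"<dir>"}`), the temp-file name, and the function names are hypothetical. Syncing a directory through `File::open` is a Unix idiom.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Make `new_dir_name` (an already fully written dir under `state_dir`) the
/// current checkpoint, then delete the previous one (hypothetical layout).
pub fn publish_checkpoint(state_dir: &Path, new_dir_name: &str, prev_dir_name: Option<&str>) -> io::Result<()> {
    let head = state_dir.join("head.json");
    let tmp = state_dir.join("head.json.tmp");
    // 1. Write the new pointer to a temp file and flush it to disk.
    {
        let mut f = File::create(&tmp)?;
        write!(f, "{{\"checkpoint\":\"{}\"}}", new_dir_name)?;
        f.sync_all()?;
    }
    // 2. rename() replaces head.json atomically on POSIX filesystems: after a
    //    crash, readers see either the old pointer or the new one, never a torn file.
    fs::rename(&tmp, &head)?;
    File::open(state_dir)?.sync_all()?; // persist the rename itself (Unix)
    // 3. Only now is the predecessor unreachable, so it is safe to delete. A crash
    //    before this step leaves an orphan dir, which is harmless.
    if let Some(prev) = prev_dir_name {
        if prev != new_dir_name {
            fs::remove_dir_all(state_dir.join(prev))?;
        }
    }
    Ok(())
}

/// Resolve the current checkpoint dir from head.json (naive parse, sketch only).
pub fn current_checkpoint(state_dir: &Path) -> io::Result<Option<PathBuf>> {
    let text = match fs::read_to_string(state_dir.join("head.json")) {
        Ok(t) => t,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    let name = text
        .split("\"checkpoint\":\"")
        .nth(1)
        .and_then(|rest| rest.split('"').next())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad head.json"))?;
    Ok(Some(state_dir.join(name)))
}
```

At every point, `head.json` names a complete checkpoint directory. At worst, one extra orphaned directory exists until the next GC.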
Summary
This PR delivers two related scopes as one release:
1. Watchdog v1: compare mode
Off-chain watchdog that compares sequencer finalized SSZ (`GET /finalized_state`) against canonical Cartesi Machine `inspect` at the same L1 inclusion block.
- `machine_cartesi` (prod) / `machine_cli` (harness); advance checkpointing retained
- `wallet_snapshot` encoding between sequencer, scheduler, and CM
- `tests/fixtures/l1_partition_vector.json` for Lua + Rust
- `watchdog_event` JSON on stderr; exit codes `0` ok / `1` transient / `2` deterministic mismatch; webhook/`alarm.lua` removed
- `state_mismatch` / `inclusion_block_regressed`
- `watchdog_genesis_compare_test` + non-genesis compare at end of `deposit_transfer_withdrawal_test`; CI builds `lcurl.so` via `just watchdog-lua-deps`

2. Release packaging: aligned artifact versions
Release tag `vX` is the bundle version for sequencer binaries, CM image tarballs, and watchdog.
- `release/versions.env` (Rust, xgenext2fs, cartesi-machine, lua-curl)
- `.github/actions/load-release-versions`; `scripts/verify-release-versions.sh` in CI
- `scripts/generate-release-manifest.sh` → `release-manifest-vX.json` on GitHub Release
- `watchdog/Dockerfile` + `docker save` tarballs per arch; `/opt/watchdog/RELEASE.json` + OCI labels
- `watchdog/third_party/lua-curl/` (pinned via `UPSTREAM`/`versions.env`)

`CARTESI_MACHINE_VERSION` in `versions.env` must match the emulator inside the watchdog image and the one used to build `canonical-machine-image-*` tarballs. See `release/README.md`.

Historical note
Early commits introduced `GET /get_state`; later commits pivoted to `/finalized_state` + SSZ snapshot API (current design).

Test plan
- `lua watchdog/tests/run.lua`
- `just test-watchdog-divergence-drill` (exit `2` + `watchdog_event`)
- `cargo run -p rollups-e2e --bin rollups-e2e -- watchdog_genesis_compare_test --exact --nocapture`
- `just test-rollups-e2e` (includes `deposit_transfer_withdrawal_test` non-genesis compare)
- `bash scripts/verify-release-versions.sh`

Follow-ups (out of scope)
- `watchdog_event` alerting contract
- `machine_cli` once in-process `cartesi` binding matches CLI archive format