We need a way to back up Validator state (validated transactions and blocks). There are various options to consider.
I would weigh these considerations the most when assessing the options:
- Backup should be async to the hot path
- Recovery point objective (RPO), the window of potential data loss before a backup completes, should be low
- Code complexity should be low, for maintainability
Option 1: gRPC Stream API + Backup Process
The Validator gRPC API provides an endpoint for streaming historic and live data.
A new backup process connects to the stream and writes the data to the database and to disk.
+ Backup is integrated into format (SQL/file) which Validator requires
+ Backup process is robust to failures (can retry/restart without impact)
+ Backup process is async to Validator's tx and block validation
- Most code complexity (gRPC endpoint, backup client)
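As a rough illustration of the retry/restart property listed for Option 1, here is a minimal Python sketch of what the backup client's main loop might look like. It assumes the stream can be resumed from a sequence number. `open_stream`, `store`, the checkpoint file, and the `(seq, payload)` record shape are all hypothetical stand-ins, not the Validator's actual gRPC API.

```python
import json
import os
import tempfile
import time
from typing import Callable, Iterator, Tuple

# Hypothetical record shape: (sequence number, payload bytes).
Record = Tuple[int, bytes]


def load_checkpoint(path: str) -> int:
    """Return the last durably stored sequence number, or -1 if none exists."""
    try:
        with open(path) as f:
            return json.load(f)["last_seq"]
    except FileNotFoundError:
        return -1


def save_checkpoint(path: str, seq: int) -> None:
    """Write the checkpoint atomically so a crash never leaves it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_seq": seq}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run_backup(
    open_stream: Callable[[int], Iterator[Record]],
    store: Callable[[Record], None],
    checkpoint_path: str,
    max_retries: int = 5,
    backoff_s: float = 0.0,
) -> int:
    """Consume the stream, resuming after the checkpoint on every (re)connect."""
    retries = 0
    while True:
        last_seq = load_checkpoint(checkpoint_path)
        try:
            for seq, payload in open_stream(last_seq + 1):
                store((seq, payload))
                save_checkpoint(checkpoint_path, seq)
                retries = 0  # progress was made, so reset the retry budget
            return load_checkpoint(checkpoint_path)  # stream ended cleanly
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(backoff_s * retries)  # simple linear backoff
```

Because the checkpoint is written after `store`, a crash between the two replays one record on restart, so `store` would need to be idempotent (for example, an upsert keyed on the sequence number). None of this touches the Validator's validation path: a failed or restarted client only delays the backup.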
Option 2: Backup to S3
The Validator performs backup of transactions and blocks to S3 directly.
+ Minimal code complexity
- Backup is not integrated into format (SQL/file) which Validator requires
- Backup is synchronous to Validator's tx and block validation
- Backup failure impacts Validator
Option 3: Spare Validator
The primary Validator forwards all requests it receives to a spare Validator.
+ Minimal code complexity
+ Backup is integrated into format (SQL/file) which Validator requires
- Backup is synchronous to Validator's tx and block validation
- Backup failure impacts Validator
Option 4: AWS EBS Snapshots
Take periodic snapshots of the Validator's EBS volume.
+ Minimal code complexity
+ Backup is integrated into format (SQL/file) which Validator requires
- Hourly RPO
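To make the hourly RPO concrete, the worst-case loss is the RPO window multiplied by the write rate. The sketch below is only arithmetic; the throughput figure is a hypothetical placeholder, not a measured Validator number.

```python
def worst_case_loss(rpo_seconds: float, items_per_second: float) -> float:
    """Upper bound on items written since the last backup and lost on failure."""
    return rpo_seconds * items_per_second


HOURLY_SNAPSHOT_RPO_S = 3600.0  # "Hourly RPO" from the option above
ASSUMED_TX_PER_S = 5.0          # hypothetical load, for illustration only

lost = worst_case_loss(HOURLY_SNAPSHOT_RPO_S, ASSUMED_TX_PER_S)
print(f"up to {lost:.0f} transactions could be lost")  # up to 18000 transactions could be lost
```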
Option 5: Litestream (or similar)
Run an out-of-band process that streams SQLite data to S3.
+ Minimal to no code complexity
+ Can be used for the Validator, Sequencer, RPCs, etc.
+ Backup is integrated into format (SQL/file) which Validator requires
+ Backup process is robust to failures (can retry/restart without impact)
+ Backup process is async to Validator's tx and block validation
- Only backs up SQLite data
- Adds an external operational dependency (a separate process to run/monitor)
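Litestream itself is configured rather than coded, and it works by continuously replicating the SQLite WAL. The Python sketch below does not reproduce that mechanism. It uses the standard library's `sqlite3.Connection.backup` as a stand-in, only to show the underlying idea that a separate process can take a consistent copy of a live SQLite database without stopping the writer. The file names and the `blocks` table are hypothetical, and the upload to S3 is left out.

```python
import os
import sqlite3
import tempfile


def snapshot_sqlite(src_path: str, dst_path: str) -> None:
    """Copy a live SQLite database to dst_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # consistent copy; the writer does not need to stop
    finally:
        dst.close()
        src.close()


def demo() -> list:
    with tempfile.TemporaryDirectory() as d:
        live = os.path.join(d, "validator.sqlite3")  # hypothetical file name
        copy = os.path.join(d, "backup.sqlite3")

        # Stand-in for the Validator: a writer that keeps its connection open.
        writer = sqlite3.connect(live)
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("CREATE TABLE blocks (num INTEGER PRIMARY KEY, hash TEXT)")
        writer.executemany("INSERT INTO blocks VALUES (?, ?)", [(1, "aa"), (2, "bb")])
        writer.commit()

        snapshot_sqlite(live, copy)  # out-of-band copy while the writer is open

        reader = sqlite3.connect(copy)
        rows = reader.execute("SELECT num, hash FROM blocks ORDER BY num").fetchall()
        reader.close()
        writer.close()
        return rows
```

A periodic copy like this would have a much coarser RPO than WAL streaming, which is the main reason to prefer a tool such as Litestream over a hand-rolled snapshot loop.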
| Option | 1. Async (off hot path) | 2. Low RPO | 3. Low code complexity | Format-integrated | Robust to failure | Reusable across components |
| --- | --- | --- | --- | --- | --- | --- |
| 1. gRPC stream + backup process | ✅ | ✅ (~realtime) | ❌ (most code) | ✅ | ✅ | ❌ (validator-specific) |
| 2. Backup to S3 (direct) | ❌ (synchronous) | ✅ (sync → near-zero) | ✅ | ❌ (raw dump) | ❌ (failure blocks validator) | 🟡 (custom per component) |
| 3. Spare validator | ❌ (synchronous) | ✅ (sync → near-zero) | 🟡 (forwarding + spare lifecycle) | ✅ | ❌ (failure blocks validator) | ❌ (validator-specific) |
| 4. EBS snapshots | ✅ | ❌ (hourly) | ✅ | ✅ (whole volume) | ✅ | ✅ (volume-level) |
| 5. Litestream (or similar) | ✅ | ✅ (~1s) | ✅ | 🟡 (SQLite only, not file store) | ✅ | ✅ (validator, sequencer, RPC) |