perf(lava): WAL journaling for the LAVA database #285
|
@OneSixForensics can you resync your fork, there were some base code updates to catch up to the other LEAPP projects. |
The media insert helpers (lava_insert_sqlite_media_item / lava_insert_sqlite_media_references) commit() per row, so a media-heavy artifact (e.g. a device backup of tens of thousands of files) triggers tens of thousands of fsync'd commits. The db was opened with the default rollback journal + synchronous=FULL, so each commit fsyncs the whole database and the run can take many minutes (looks like a hang).

Set journal_mode=WAL + synchronous=NORMAL right after connect. Same durability for a tool run; ~125x faster on a 34k-commit workload in testing (52s -> 0.4s). Benefits every module, not just one parser.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bbc9fcd to
5e8cf38
|
Thanks @stark4n6 — done. Resynced the fork onto current Heads up from the resync: the base-code update moved file resolution into |
|
my biggest concern is around LEAPP output going directly to network drives.
WAL isn't guaranteed to fail, as much as that statement suggests, but there is definitely risk of it. this can cause corrupt files or even crash LEAPP. FULL seems to be the most compatible mode when writing to network storage. we have had complaints in the past about sqlite issues on network storage. at the very least, I think we should test the journal mode after setting it, before switching sync mode. this doesn't truly detect network paths, but provides a little bit of safety. something along these lines:

    mode = lava_db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() == "wal":
        lava_db.execute("PRAGMA synchronous=NORMAL")

it would be better to implement a network path detection routine (can we?) that can be used as part of the decision before enabling this mode. open to thoughts. |
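For reference, the suggested check can be written as a small standalone helper. This is only a sketch: `enable_wal_if_supported` and `demo` are hypothetical names, not functions from the LEAPP code base. It relies on the documented behavior that `PRAGMA journal_mode=...` returns the mode actually in effect, which can differ from the one requested (an in-memory database, for example, reports `memory`).

```python
import os
import sqlite3
import tempfile


def enable_wal_if_supported(conn: sqlite3.Connection) -> bool:
    """Request WAL; relax synchronous only if SQLite reports WAL is active."""
    # The pragma returns the journal mode now in effect, not the one requested.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if str(mode).lower() != "wal":
        # WAL was refused: leave synchronous at its current setting.
        return False
    conn.execute("PRAGMA synchronous=NORMAL")
    return True


def demo() -> tuple:
    """Return (file_db_result, memory_db_result) as a quick sanity check."""
    with tempfile.TemporaryDirectory() as tmp:
        file_conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
        try:
            file_ok = enable_wal_if_supported(file_conn)
        finally:
            file_conn.close()
    # In-memory databases cannot use WAL, so the helper should decline here.
    mem_conn = sqlite3.connect(":memory:")
    try:
        mem_ok = enable_wal_if_supported(mem_conn)
    finally:
        mem_conn.close()
    return file_ok, mem_ok
```

As noted in the comment above, this guard does not detect network paths; it only avoids lowering `synchronous` when SQLite itself declined to switch to WAL.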
…aths

WAL is not safe over a network filesystem (https://www.sqlite.org/wal.html), and examiners commonly write LEAPP output straight to NAS/mapped drives, where unconditional WAL risks DB corruption or a crashed run.

Add scripts/storage_safety.py with best-effort network-path detection (Windows GetDriveType + UNC; Linux /proc/mounts fstype) and have initialize_lava enable WAL + synchronous=NORMAL only when output_path is confirmed local. On network or undetermined paths it stays on the network-safe rollback journal. After requesting WAL it verifies the mode actually took effect before tuning synchronous, per review feedback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
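To illustrate the Linux half of the detection described in this commit, here is a minimal sketch of a mount-table lookup. It is not the contents of scripts/storage_safety.py: the function names, the `NETWORK_FSTYPES` set, and the sample mount table are all assumptions made for illustration. The mount table is passed in as text (for example the contents of /proc/mounts) so the logic can be exercised without touching the real system.

```python
import posixpath
from typing import Optional

# Assumed set of network filesystem types; the list used in the PR may differ.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs", "9p"}

# Illustrative mount table in /proc/mounts format (device, mount point, fstype, ...).
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "nas:/export /mnt/nas nfs4 rw,relatime 0 0\n"
)


def fstype_for_path(path: str, mounts_text: str) -> Optional[str]:
    """Return the fstype of the longest mount point containing path, else None."""
    path = posixpath.normpath(path)
    best_mount, best_fstype = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_fstype = mount_point, fields[2]
    return best_fstype


def is_network_path(path: str, mounts_text: str) -> Optional[bool]:
    """True for network, False for local, None when it cannot be determined."""
    fstype = fstype_for_path(path, mounts_text)
    if fstype is None:
        return None
    return fstype.lower() in NETWORK_FSTYPES


def wal_allowed(path: str, mounts_text: str) -> bool:
    """Allow WAL only when the path is confirmed local (not network, not unknown)."""
    return is_network_path(path, mounts_text) is False
```

Treating "undetermined" the same as "network" keeps the conservative default from the commit message: WAL is enabled only when the output path is confirmed local.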
LEAPP prepends the Windows extended-length prefix \\?\ to every drive-letter output path (rleapp.py), so initialize_lava receives e.g. \\?\X:\Reports\... _is_unc_path keyed on a leading \\, so it misclassified that local path as a UNC network share and disabled WAL on a local drive (the slow path).

Strip the extended-length prefix before the UNC test:
  \\?\UNC\srv\share -> \\srv\share (still network)
  \\?\X:\dir -> X:\dir (local; GetDriveType decides)

Found while validating WAL gating against a real ICAC return written to X:.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
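The prefix handling described in this commit can be sketched with plain string operations. The helper names below are hypothetical and this is not the PR's actual `_is_unc_path`; it only reproduces the two mappings listed in the commit message.

```python
# Windows extended-length prefixes, written with escaped backslashes.
EXTENDED_PREFIX = "\\\\?\\"           # \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # \\?\UNC\


def strip_extended_prefix(path: str) -> str:
    """Remove the extended-length prefix, restoring the ordinary path form."""
    # \\?\UNC\srv\share -> \\srv\share (the "UNC" marker is matched case-insensitively)
    if path.upper().startswith(EXTENDED_UNC_PREFIX):
        return "\\\\" + path[len(EXTENDED_UNC_PREFIX):]
    # \\?\X:\dir -> X:\dir
    if path.startswith(EXTENDED_PREFIX):
        return path[len(EXTENDED_PREFIX):]
    return path


def is_unc_path(path: str) -> bool:
    """Report whether the path is a UNC share once any extended prefix is removed."""
    return strip_extended_prefix(path).startswith("\\\\")
```

A drive-letter path that passes this test as "not UNC" can still be a mapped network drive, which is why the commit leaves the final decision for such paths to GetDriveType.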
5e8cf38 to
fc9c645
|
Thanks @JamesHabben — good catch, and you're right to flag it. I went with the network-path detection route you floated rather than just the sync-mode guard. Added Validated against a real ~32 GB provider return, same input, two output targets:
One thing that fell out of testing worth a separate look: LEAPP prepends the Windows extended-length prefix Happy to tweak the network fstype list or the detection heuristics if you'd rather be more/less conservative. |
perf(lava): WAL journaling for the LAVA database
_lava_artifacts.db is opened with no journal/synchronous pragmas, i.e. the default rollback journal + synchronous=FULL. The media insert helpers (lava_insert_sqlite_media_item / lava_insert_sqlite_media_references) commit() per row, so a media-heavy artifact (e.g. a device backup with tens of thousands of files) triggers tens of thousands of fsync'd commits, each fsyncing the whole database. On such artifacts the run can take many minutes and looks like a hang.

This sets journal_mode=WAL + synchronous=NORMAL right after sqlite3.connect. Same effective durability for a single tool run (only a power loss during the final checkpoint window could lose the tail), and dramatically faster.

Benchmark
~34,000 per-row commits (≈ a 17k-file media artifact: one media item + one reference each):
Default (rollback journal + synchronous=FULL): 52 s
WAL + synchronous=NORMAL: 0.4 s

~125× on the commit workload. Benefits every module that registers media, not just one parser.
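A workload of this shape can be approximated with a short script. This is a hedged sketch, not the benchmark used for the figures above: the table schema, row count, and function name are assumptions, and absolute timings depend heavily on the disk and filesystem.

```python
import os
import sqlite3
import tempfile
import time


def time_per_row_commits(use_wal: bool, rows: int = 1000) -> tuple:
    """Insert rows one at a time, committing after each; return (seconds, row_count)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            if use_wal:
                conn.execute("PRAGMA journal_mode=WAL")
                conn.execute("PRAGMA synchronous=NORMAL")
            # Hypothetical single-table schema standing in for the media tables.
            conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, path TEXT)")
            start = time.perf_counter()
            for i in range(rows):
                conn.execute("INSERT INTO media (path) VALUES (?)", (f"file_{i}",))
                conn.commit()  # one commit per row, mirroring the insert helpers
            elapsed = time.perf_counter() - start
            count = conn.execute("SELECT COUNT(*) FROM media").fetchone()[0]
        finally:
            conn.close()
    return elapsed, count


if __name__ == "__main__":
    for label, use_wal in (("rollback + FULL", False), ("WAL + NORMAL", True)):
        seconds, count = time_per_row_commits(use_wal)
        print(f"{label}: {count} commits in {seconds:.3f}s")
```

Running both configurations against the same output directory gives a rough local comparison; the ratio will vary from machine to machine.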
Found while bringing a 17k-file device-backup artifact into LAVA (companion Synchronoss parser PR).