Summary
A sampled PE-US rebuild smoke run failed before the first durable checkpoint while loading the SCF donor source. The run was pointed at a local policyengine-us-data checkout via --policyengine-us-data-repo, but the SCF loader attempted to create a lock file next to the source H5 inside that checkout:
/Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/scf_2022.h5.lock
In this environment the local data checkout was readable but not writable, so loading the SCF donor source failed after the run had already loaded the CPS and PUF inputs.
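To confirm the readable-but-not-writable condition on a checkout before rerunning, a small probe like the one below can help. It is a sketch, and `probe_directory` is a hypothetical helper, not part of microplex_us. It checks read access with `os.access` and checks write access by actually creating a temporary file in the directory, which is what placing a `.lock` file beside the H5 requires. The error here is Errno 1 (EPERM) rather than Errno 13 (EACCES), which may point to an OS-level restriction such as sandboxing that `os.access` does not necessarily reflect.

```python
import os
import tempfile
from pathlib import Path


def probe_directory(directory: Path) -> dict:
    """Hypothetical helper: report read access and whether new files can be created."""
    # Read access plus traverse (X_OK) is what loading an existing H5 needs.
    readable = os.access(directory, os.R_OK | os.X_OK)
    try:
        # Create and immediately delete a file; a sibling lock file needs exactly this.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".probe-"):
            pass
        writable, error = True, None
    except OSError as exc:
        # Keep the exception type so EPERM vs EACCES vs missing directory is visible.
        writable, error = False, f"{type(exc).__name__}: {exc}"
    return {"readable": readable, "writable": writable, "write_error": error}
```

Pointing it at the checkout's `policyengine_us_data/storage` directory would be expected to report `readable: True, writable: False` in the environment described above.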
Command shape
.venv/bin/python -m microplex_us.pipelines.pe_us_data_rebuild_checkpoint \
--output-root /private/tmp/microplex-us-full-smoke \
--version-id local-smoke-v1-entropy-after-179-run2 \
--baseline-dataset /Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5 \
--targets-db /Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/calibration/policy_data.db \
--policyengine-us-data-repo /Users/administrator/Documents/PolicyEngine/policyengine-us-data \
--policyengine-us-data-python /Users/administrator/Documents/PolicyEngine/worktrees/microplex-us/fix-pe-rebuild-smoke-issues/.venv/bin/python \
--calibration-backend entropy \
--donor-imputer-backend zi_qrf \
--policyengine-materialize-batch-size 100000 \
--cps-sample-n 1000 \
--puf-sample-n 1000 \
--donor-sample-n 1000 \
--n-synthetic 1000 \
--no-include-acs \
--defer-policyengine-harness \
--defer-policyengine-native-score \
--defer-native-audit \
--defer-imputation-ablation \
--pipeline-checkpoint-save-post-imputation-path /private/tmp/microplex-us-checkpoints/local-smoke-v1-entropy-after-179-run2/post_imputation \
--pipeline-checkpoint-save-post-microsim-path /private/tmp/microplex-us-checkpoints/local-smoke-v1-entropy-after-179-run2/post_microsim
Failure
PermissionError: [Errno 1] Operation not permitted: '/Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/scf_2022.h5.lock'
Stack path:
microplex_us.data_sources.donor_surveys._default_scf_tables_loader
microplex_us.data_sources.donor_surveys._run_policyengine_dataset_loader_from_spec
policyengine_us_data.datasets.scf.scf.SCF_2022.load_dataset
filelock._unix._acquire
Expected behavior
Before starting the expensive donor/materialization run, the rebuild CLI should preflight the local input requirements of each donor source, including any write or lock-file requirements on the referenced policyengine-us-data checkout. If the source checkout must be writable, fail immediately with a targeted message. If the data can be loaded read-only, route lock/cache files to a writable cache or temp location.
This matters because, without such a preflight, the run can fail before the post_imputation checkpoint is written, leaving no durable checkpoint to resume from.
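A minimal sketch of the preflight side of this behavior follows. Everything in it is hypothetical: `resolve_lock_path`, `DonorSourcePreflightError`, the cache layout, and the `requires_writable_checkout` flag are not part of microplex_us or policyengine-us-data. For a given donor source file, it keeps the sibling `.lock` when the source directory accepts new files. When the source is declared to need a writable checkout, it fails fast with a targeted message. Otherwise it derives a lock path in a writable cache directory, keyed on a hash of the resolved source path. Whether `SCF_2022.load_dataset` can be told to use a different lock path is an open assumption; this only shows how the CLI could decide before any expensive work starts.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class DonorSourcePreflightError(RuntimeError):
    """Hypothetical error raised before any CPS/PUF/donor loading starts."""


def _dir_accepts_new_files(directory: Path) -> bool:
    # Probe by actually creating (and deleting) a file; os.access() alone
    # may not catch EPERM caused by sandboxing or ACLs.
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        return False
    return True


def resolve_lock_path(
    source: Path, cache_dir: Path, *, requires_writable_checkout: bool = False
) -> Path:
    """Hypothetical: pick a usable lock path for `source`, or fail fast."""
    if not source.is_file() or not os.access(source, os.R_OK):
        raise DonorSourcePreflightError(f"donor source is missing or unreadable: {source}")

    # Preferred: the lock file beside the source, as the loader does today.
    if _dir_accepts_new_files(source.parent):
        return source.with_name(source.name + ".lock")

    if requires_writable_checkout:
        raise DonorSourcePreflightError(
            f"{source.parent} is not writable, but this donor source needs to write "
            "into the policyengine-us-data checkout; make it writable or point "
            "--policyengine-us-data-repo at a writable copy"
        )

    # Read-only checkout: key the lock on the resolved source path so that
    # different sources (or checkouts) never share a lock file.
    try:
        cache_dir.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        raise DonorSourcePreflightError(
            f"cannot create lock/cache directory {cache_dir}: {exc}"
        ) from exc
    if not _dir_accepts_new_files(cache_dir):
        raise DonorSourcePreflightError(f"lock/cache directory is not writable: {cache_dir}")
    digest = hashlib.sha256(str(source.resolve()).encode()).hexdigest()[:16]
    return cache_dir / f"{source.name}.{digest}.lock"
```

The returned path could then be handed to `filelock.FileLock(lock_path)`, or to whatever the loader accepts. The check would run for every enabled donor source before the CPS/PUF inputs are loaded, so a bad checkout surfaces in seconds rather than after the sampled inputs are already in memory.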