Python scripts replacing the original notebooks for the curation pipeline.
uv python install 3.12
uv python pin 3.12
uv venv --python 3.12
source .venv/bin/activate
uv pip install -e .

Notes:
- Python 3.12 is recommended. Python 3.13+ (including 3.14) may try to build fiona from source and require a local GDAL install.
- GDAL is required for osgeo bindings and for CLI tools like ogr2ogr.
Install GDAL (includes ogr2ogr) before installing Python deps:
# macOS
brew install gdal
# Ubuntu/Debian
sudo apt-get install -y gdal-bin libgdal-dev

Python GDAL bindings must match the system GDAL version. If pip/uv fails to build gdal, install a compatible GDAL version first.
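One way to keep the bindings in step with the system library is to pin the gdal package to whatever gdal-config reports. The sketch below is only an illustration: the helper names gdal_requirement and detect_system_gdal are hypothetical and not part of this repository, and it assumes gdal-config is on PATH once GDAL is installed.

```python
import shutil
import subprocess
from typing import Optional


def gdal_requirement(system_version: str) -> str:
    """Build a pip requirement that pins the bindings to the system GDAL."""
    # gdal-config prints a trailing newline, so strip it before pinning.
    return f"gdal=={system_version.strip()}"


def detect_system_gdal() -> Optional[str]:
    """Return the system GDAL version, or None if gdal-config is not on PATH."""
    if shutil.which("gdal-config") is None:
        return None
    result = subprocess.run(
        ["gdal-config", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The string returned by gdal_requirement can be passed to uv pip install in place of an unpinned gdal.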
- scripts/extract_metadata.py
- scripts/get_thumbs.py
- scripts/create_pm_tiles.py
- scripts/export_gdb_feature_classes_to_gpkg.py
Use scripts/embed_qgis_metadata.py to write QGIS-style XML metadata into every
GeoPackage in a directory. The script reads one CSV row per GeoPackage, renders
scripts/templates/qgis-metadata.xml with values from that row, fills the CRS
and bounding box from the GeoPackage itself, and stores the XML in the
GeoPackage metadata extension tables.
For the Milwaukee urban base layers:
uv run python scripts/embed_qgis_metadata.py \
mke-ubl \
mke-ubl/b1g_55-53000_primary.csv

The third positional argument is optional and can point to a different XML template:
uv run python scripts/embed_qgis_metadata.py \
path/to/geopackages \
path/to/metadata.csv \
path/to/qgis-metadata.xml

The default match column is filename. Each value in that column must exactly
match a GeoPackage filename in the target directory, including the .gpkg
extension:
filename,Title,Description,ID,Date Range,Theme,Provenance,Rights,Source
mke_boundary_2026.gpkg,Municipal boundary [Wisconsin--Milwaukee] {2026},...,b1g_5XPUIjJ9q7Z8,2026-2026,Boundaries,...,...,...

The script accepts OpenGeoMetadata-style CSVs like
mke-ubl/b1g_55-53000_primary.csv. Extra columns are ignored unless the XML
template references them.
Use --match-column if the filename is stored in a different column:
uv run python scripts/embed_qgis_metadata.py \
path/to/geopackages \
path/to/metadata.csv \
--match-column "Identifier"

The match column must be unique. Blank match values are ignored, and duplicate values stop the run with an error.
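The matching rules described here (exact filename match, blank values ignored, duplicates rejected) can be sketched in a few lines. This is a minimal illustration, not the code in scripts/embed_qgis_metadata.py; the function names index_rows and match_geopackages are hypothetical.

```python
import csv
import io


def index_rows(csv_text: str, match_column: str = "filename") -> dict:
    """Index CSV rows by the match column, enforcing uniqueness."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row.get(match_column) or ""
        if not key.strip():
            continue  # blank match values are ignored
        if key in rows:
            raise ValueError(f"duplicate value in {match_column!r}: {key}")
        rows[key] = row
    return rows


def match_geopackages(rows: dict, gpkg_names: list) -> tuple:
    """Pair GeoPackage filenames with rows and list rows that matched nothing."""
    matched = {name: rows[name] for name in gpkg_names if name in rows}
    unmatched_rows = sorted(set(rows) - set(gpkg_names))
    return matched, unmatched_rows
```

GeoPackages missing from the returned mapping are the ones that would be skipped, and the second return value corresponds to the rows reported at the end of a run.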
The default template is scripts/templates/qgis-metadata.xml. Any text or
attribute value in the template can include tokens in braces. Tokens are
case-insensitive CSV column names:
<identifier>{ID}</identifier>
<title>{Title}</title>
<abstract>{Description}</abstract>
<rights>{Rights}</rights>

The default template currently uses these CSV columns:
- ID
- Source
- Title
- Description
- Theme
- Provenance
- Rights
- Date Range
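Case-insensitive token substitution of this kind can be sketched with a single regular-expression pass. The render function below is hypothetical and only illustrates the idea. Two assumptions are not confirmed by the script itself: values are XML-escaped before insertion, and tokens with no matching column are left unchanged.

```python
import re
from xml.sax.saxutils import escape

# Matches {Token} where the token contains no nested braces.
TOKEN = re.compile(r"\{([^{}]+)\}")


def render(template: str, row: dict) -> str:
    """Replace {Column} tokens with CSV values, ignoring case."""
    lookup = {name.lower(): value for name, value in row.items()}

    def replace(match):
        name = match.group(1).strip().lower()
        if name not in lookup:
            return match.group(0)  # assumption: unknown tokens stay as written
        return escape(lookup[name])  # escape &, < and > for XML text

    # A single pass means braces inside a CSV value are not expanded again.
    return TOKEN.sub(replace, template)
```

Because the substitution runs once, a title such as Municipal boundary [Wisconsin--Milwaukee] {2026} keeps its literal {2026}.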
Two special token forms are available for range-like fields:
<start>{Date Range first value}</start>
<end>{Date Range last value}</end>

For a value like 2024-2026, the first token resolves to 2024 and the last
token resolves to 2026. The resolver first looks for four-digit years. If no
years are present, it splits on |, ;, ,, or /.
The special {now} token resolves to the current date in YYYY-MM-DD format.
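The resolution order for range-like fields can be sketched as follows. The function names are hypothetical, and returning an empty string for an empty value is an assumption rather than documented behavior.

```python
import re
from datetime import date


def split_range(value: str) -> list:
    """Prefer four-digit years; otherwise split on |, ;, comma, or /."""
    years = re.findall(r"\b\d{4}\b", value)
    if years:
        return years
    parts = re.split(r"[|;,/]", value)
    return [part.strip() for part in parts if part.strip()]


def first_value(value: str) -> str:
    parts = split_range(value)
    return parts[0] if parts else ""  # assumption: empty input gives ""


def last_value(value: str) -> str:
    parts = split_range(value)
    return parts[-1] if parts else ""


def now_token() -> str:
    """Current date in YYYY-MM-DD format, as used for {now}."""
    return date.today().isoformat()
```

With these helpers, first_value("2024-2026") gives 2024 and last_value("2024-2026") gives 2026, while a value without years such as roads; parcels falls back to the delimiter split.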
For each matched GeoPackage, the script:
- reads the first feature table in gpkg_contents to get the extent and SRS;
- replaces the template <crs><spatialrefsys> block with values from gpkg_spatial_ref_sys;
- replaces the template <extent><spatial> attributes with the GeoPackage bounding box;
- drops and recreates existing gpkg_metadata and gpkg_metadata_reference tables;
- inserts one dataset metadata record and references it from every feature table in the GeoPackage;
- refreshes gpkg_extensions rows for the GeoPackage metadata extension when gpkg_extensions exists.
Unmatched GeoPackages are skipped and left unchanged. Metadata rows that do not match any GeoPackage are reported at the end of the run.
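Since a GeoPackage is a SQLite database, the first step in the list above can be illustrated with the standard sqlite3 module. This is a sketch rather than the script's own code: first_feature_table is a hypothetical name, and treating the lowest rowid as the "first" feature table is an assumption.

```python
import sqlite3


def first_feature_table(conn: sqlite3.Connection):
    """Return name, bounding box, and SRS id of the first feature table."""
    row = conn.execute(
        "SELECT table_name, min_x, min_y, max_x, max_y, srs_id "
        "FROM gpkg_contents "
        "WHERE data_type = 'features' "
        "ORDER BY rowid LIMIT 1"  # assumption: 'first' means insertion order
    ).fetchone()
    if row is None:
        return None  # no feature tables registered in this GeoPackage
    table_name, min_x, min_y, max_x, max_y, srs_id = row
    return {
        "table_name": table_name,
        "bbox": (min_x, min_y, max_x, max_y),
        "srs_id": srs_id,
    }
```

Opening a file with sqlite3.connect("path/to/file.gpkg") and passing the connection in returns the values that would feed the extent and CRS parts of the template.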
Use build_pmtiles_from_gpkg.py to recursively convert GeoPackage vector layers
to EPSG:4326 FlatGeobuf files, then to PMTiles with Tippecanoe. The script
supports multi-layer GeoPackages, configurable field dropping, resumable runs,
and CSV or JSON reports.
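The two-stage conversion for a single layer can be pictured as a pair of command lines, built here as argument lists without running them. The helper names and the particular Tippecanoe options (-zg, --drop-densest-as-needed) are assumptions chosen for illustration; the script may use different settings.

```python
from pathlib import Path


def ogr2ogr_command(gpkg: Path, layer: str, fgb: Path) -> list:
    """Reproject one GeoPackage layer to EPSG:4326 and write FlatGeobuf."""
    return [
        "ogr2ogr",
        "-f", "FlatGeobuf",
        "-t_srs", "EPSG:4326",
        str(fgb),   # destination comes before the source in ogr2ogr
        str(gpkg),
        layer,
    ]


def tippecanoe_command(fgb: Path, pmtiles: Path, layer: str, drop_fields=()) -> list:
    """Build PMTiles from a FlatGeobuf file, excluding unwanted attributes."""
    cmd = [
        "tippecanoe",
        "-o", str(pmtiles),
        "-l", layer,
        "-zg",                        # let Tippecanoe guess the max zoom
        "--drop-densest-as-needed",
    ]
    for field in drop_fields:
        cmd += ["-x", field]          # -x excludes a named attribute
    cmd.append(str(fgb))
    return cmd
```

Each list can be handed to subprocess.run(cmd, check=True) once GDAL and Tippecanoe are installed.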
Install the required command-line tools first:
brew install gdal tippecanoe

Start with a field inventory report so you can review large attribute tables:
python build_pmtiles_from_gpkg.py \
--input-dir ./gpkg \
--fgb-dir ./fgb \
--pmtiles-dir ./pmtiles \
--config pmtiles_config.json \
--report pmtiles_build_report.csv \
--field-report-only

Copy pmtiles_config.sample.json to pmtiles_config.json, then edit layer
rules and field keep/drop settings.
Run the conversion:
python build_pmtiles_from_gpkg.py \
--input-dir ./gpkg \
--fgb-dir ./fgb \
--pmtiles-dir ./pmtiles \
--config pmtiles_config.json \
--report pmtiles_build_report.csv

Rerun without rebuilding completed outputs:
python build_pmtiles_from_gpkg.py \
--input-dir ./gpkg \
--fgb-dir ./fgb \
--pmtiles-dir ./pmtiles \
--config pmtiles_config.json \
--report pmtiles_build_report.csv \
--skip-existing
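
A resumable run needs some test for whether an output is already complete. The sketch below shows one plausible check and is not taken from build_pmtiles_from_gpkg.py: the function names, the non-empty-file rule, and the one-output-per-source naming are all assumptions.

```python
from pathlib import Path


def is_complete(output: Path) -> bool:
    """Treat an existing, non-empty file as a finished output (assumption)."""
    return output.is_file() and output.stat().st_size > 0


def pending(sources: list, pmtiles_dir: Path) -> list:
    """Return the sources whose PMTiles output still needs to be built."""
    todo = []
    for src in sources:
        # Hypothetical naming: roads.gpkg -> <pmtiles_dir>/roads.pmtiles
        out = pmtiles_dir / f"{src.stem}.pmtiles"
        if not is_complete(out):
            todo.append(src)
    return todo
```

Checking the file size as well as existence guards against treating a zero-byte file left by an interrupted run as finished.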