Version 3 with cached cross chunk edges by akhileshh · Pull Request #454 · CAVEconnectome/PyChunkedGraph

akhileshh · 2023-08-06T20:03:24Z

Adds a new column family for cached cross chunk edges.
Adds a MaxAgeGCRule to the previous column family, which holds supervoxel cross chunk edges; these are only needed during ingest and are eventually deleted.
Edits make use of the cached cross chunk edges.

Summary of changes in pychunkedgraph.ingest:

Layer 2 creation is mostly unchanged; it stores cross chunk edges with supervoxels.
- The column family used to store these edges now has a max age garbage collection rule.
- During ingest, these edges can be used to cache higher layer cross chunk edges; BigTable's garbage collection routines will eventually delete them.
When ingesting layer 3, cross edges for children (layer 2) get updated and "lifted" using the supervoxel cross chunk edges mentioned above. The lifted edges are stored in a different column family, so they are retained forever.
- At the same time, cross edges for parents at layer 3 are created by merging the cross edges of their children; these are intermediate and will be lifted when ingesting the next parent layer.
For each layer > 3 until the root layer:
- Update children cross chunk edges by "lifting" the edges created during the previous layer ingest.
- Add parent cross chunk edges by merging children cross chunk edges; they will be updated when ingesting the next layer.
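
As a rough illustration of the "lifting" and merging steps described above, here is a minimal sketch. It assumes edges are `(N, 2)` arrays whose first column is the node itself and whose second column is the partner across the chunk boundary. The names `lift_cross_edges`, `merge_children_cross_edges` and the plain dict `parent_of` are hypothetical stand-ins, not PyChunkedGraph API; the actual ingest code reads parents from the graph.

```python
import numpy as np


def lift_cross_edges(edges: np.ndarray, parent_of: dict) -> np.ndarray:
    """Replace each partner (column 1) with its parent and drop duplicates."""
    if edges.size == 0:
        return edges.reshape(0, 2)
    lifted = edges.copy()
    # parent_of is a hypothetical child -> parent lookup
    lifted[:, 1] = [parent_of[int(p)] for p in edges[:, 1]]
    return np.unique(lifted, axis=0)


def merge_children_cross_edges(parent_id: int, children_edges: list) -> np.ndarray:
    """Build intermediate parent edges by merging the children's edges."""
    non_empty = [e for e in children_edges if e.size]
    if not non_empty:
        return np.empty((0, 2), dtype=np.uint64)
    merged = np.concatenate(non_empty)  # new array, safe to modify in place
    merged[:, 0] = parent_id            # every edge now starts at the parent
    return np.unique(merged, axis=0)


# Node 20 has three supervoxel partners; two of them share parent 21.
sv_edges = np.array([[20, 1], [20, 2], [20, 3]], dtype=np.uint64)
lifted = lift_cross_edges(sv_edges, {1: 21, 2: 21, 3: 22})
parent_edges = merge_children_cross_edges(
    30, [lifted, np.array([[23, 22]], dtype=np.uint64)]
)
```

The parent edges produced by the merge still point at lower layer partners, which is why they count as intermediate until the next layer's ingest lifts them.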

This assumes all chunks at the lower layer have been created before creating the current layer, so we can no longer automatically queue a parent chunk job when its children chunks are complete.

We must now ingest/create one layer at a time.
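
A minimal sketch of what layer-at-a-time scheduling could look like, assuming a thread pool and a caller-supplied `create_chunk(layer, coords)` callback. Both `ingest_by_layer` and `create_chunk` are hypothetical names for illustration; the actual ingest pipeline uses its own job queue.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def ingest_by_layer(chunks_by_layer: dict, create_chunk, max_workers: int = 4) -> None:
    """Create all chunks of one layer before starting the next layer."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for layer in sorted(chunks_by_layer):
            # list() drains the result iterator, so it acts as a barrier:
            # the loop only advances once every chunk in this layer is done.
            # It also re-raises any exception from a worker.
            list(pool.map(partial(create_chunk, layer), chunks_by_layer[layer]))


order = []
ingest_by_layer(
    {3: [(0, 0, 0)], 2: [(0, 0, 0), (1, 0, 0)], 4: [(0, 0, 0)]},
    lambda layer, coords: order.append(layer),
)
```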

Summary of changes in pychunkedgraph.graph.edits:

Edits are expected to be faster now; going to layer 2 to extract cross chunk edges is no longer necessary since they're cached at each layer.
During an edit, these cached cross chunk edges must be updated from both directions: to and from the newly created nodes and their existing neighbors.
- Most changes in this module are to handle this step.
- Caching these edges has also made the edits logic simpler and cleaner.
- When updating new cross edges, we need to ensure descendants get replaced by the highest parent.
- For splits, we need to filter out inactive cross edges after the local graph is read from bucket storage.
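
The point about replacing descendants with the highest parent can be sketched as follows. This is an illustration under assumptions, not the code in `pychunkedgraph.graph.edits`: `replace_with_highest_parent` and the dict `new_parent_of` (a child to newly created parent map) are hypothetical, and the sketch simply follows that map until no newer parent exists.

```python
import numpy as np


def replace_with_highest_parent(edges: np.ndarray, new_parent_of: dict) -> np.ndarray:
    """Point each partner (column 1) at its highest newly created parent."""
    out = edges.copy()
    for i, node in enumerate(edges[:, 1].tolist()):
        # Walk up the chain of new parents; untouched nodes stay as they are.
        while node in new_parent_of:
            node = new_parent_of[node]
        out[i, 1] = node
    # Several descendants may collapse onto the same parent, so deduplicate.
    return np.unique(out, axis=0)


edges = np.array([[50, 10], [50, 11], [50, 12]], dtype=np.uint64)
updated = replace_with_highest_parent(edges, {10: 40, 40: 60, 11: 60})
```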

nkemnitz · 2023-09-06T08:32:16Z

+        return cross_edges_decorated(node_id)

    def parents_multiple(self, node_ids: np.ndarray, *, time_stamp: datetime = None):
+        node_ids = np.array(node_ids, dtype=NODE_ID)


Just saw this here (and some other places) - same as in #458: np.array will by default create a copy. np.asarray will avoid copies, if the requirements are already met.
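
A small sketch of the difference, assuming `NODE_ID` is an unsigned 64-bit NumPy dtype (the alias below is a stand-in for the one in the diff):

```python
import numpy as np

NODE_ID = np.uint64  # assumption: stand-in for the NODE_ID dtype in the diff

ids = np.array([1, 2, 3], dtype=NODE_ID)

copied = np.array(ids, dtype=NODE_ID)    # default copy=True: new buffer
viewed = np.asarray(ids, dtype=NODE_ID)  # dtype already matches: no copy

# A plain list still has to be converted, so asarray allocates here too.
from_list = np.asarray([1, 2, 3], dtype=NODE_ID)
```

One consequence of avoiding the copy: if the function later modifies `node_ids` in place, the caller's array changes as well.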

sdorkenw

Overall this looks good besides the one point - a tricky one though - that I marked

sdorkenw · 2023-09-08T02:33:16Z

+            new_cx_edges_d[layer] = edges
+            assert np.all(edges[:, 0] == new_id)
+        cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
+        entries = _update_neighbor_cross_edges(


I think this here can introduce problems if a neighboring node is a neighbor to multiple new_l2_ids.

_update_neighbor_cross_edges looks right to me. It writes a complete new set of L2 edges for a node. But if the same node is updated multiple times, then only the last update is reflected. Maybe the logic here takes care of this somehow but then it still introduces multiple unnecessary writes.

So, if I am correct about this, the solution would be to consolidate this call across all new_l2_ids to only make one call per neighboring node id.
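
A minimal sketch of the consolidation suggested above, under the assumption that each new ID's cross edges are an `(N, 2)` array of `[new_id, neighbor]` pairs. `consolidate_neighbor_updates` is a hypothetical helper; the real signature of `_update_neighbor_cross_edges` is not shown in this thread.

```python
from collections import defaultdict

import numpy as np


def consolidate_neighbor_updates(new_cx_edges: dict) -> dict:
    """Group edges by neighbor across all new IDs.

    Returns neighbor -> (M, 2) array of [neighbor, new_id] pairs, so each
    neighbor can be written exactly once with its complete edge set.
    """
    per_neighbor = defaultdict(set)
    for edges in new_cx_edges.values():
        for new_id, neighbor in edges.tolist():
            per_neighbor[neighbor].add((neighbor, new_id))
    return {
        neighbor: np.array(sorted(pairs), dtype=np.uint64)
        for neighbor, pairs in per_neighbor.items()
    }


# Neighbor 7 borders both new nodes; it should get one combined update.
updates = consolidate_neighbor_updates(
    {
        100: np.array([[100, 7], [100, 8]], dtype=np.uint64),
        101: np.array([[101, 7]], dtype=np.uint64),
    }
)
```

With the per-neighbor grouping in hand, the write step would run once per key of `updates` instead of once per `(new_id, neighbor)` pair.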

sdorkenw · 2023-09-08T02:34:06Z

+            new_cx_edges_d[layer] = edges
+            assert np.all(edges[:, 0] == new_id)
+        cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
+        entries = _update_neighbor_cross_edges(


same issue as above

Drop the vendored grid/harness/lock/exit_codes for the shared cave_pipeline.distribution package so the operator and every worker compute the same chunk-scatter bijection from one source; workers inject cg_factory and layer_bounds into the generic harness. Co-Authored-By: Claude <noreply@anthropic.com>

Picks up the 1-byte chunk-done marker the ingest workers write. Co-Authored-By: Claude <noreply@anthropic.com>

meta resolution/bounds derive from the watershed info JSON; sv lookup and the seg fallback read voxels via a neuroglancer_precomputed handle. cloud-volume stays a lazy ws_cv hatch (meshing/diagnostics). Co-Authored-By: Claude <noreply@anthropic.com>

nested imports (graph_tool via a _graph_tool shim) keep graph_tool, scipy, pandas, networkx, and cloudfiles off the cold import path; first use pays the load. bump kvdbclient to 0.7.1 (drops its cloud-volume). Co-Authored-By: Claude <noreply@anthropic.com>

…ds_by_label fastremap 1.20.0 emits 6-conn boundary voxels per label natively, so the `_label_boundary_mask` axial-diff pass and the `vol *= mask` mutation both go away. The point_cloud output (and therefore downstream KDTree min-distance queries) is unchanged. Co-Authored-By: Claude <noreply@anthropic.com>

ws_ts_scale(mip) reads the target scale (non-OCDBT mip>0 read mip 0). Mesh block size derives per-axis from the watershed pyramid, so chunk_size leaves mesh_config; setup rejects an out-of-range mip. Tests mock the tensorstore watershed reads. Co-Authored-By: Claude <noreply@anthropic.com>

The pipeline entrypoint carried a verbatim copy of setup_mesh_meta and MeshConfig that still required mesh_config.chunk_size, so mesh-meta failed once chunk_size was dropped from the dataset yaml. Import the single source of truth from pychunkedgraph.meshing instead. Co-Authored-By: Claude <noreply@anthropic.com>

PCG isn't on PyPI, but the image must report the pushed tag. The image pip-installs the package (--no-deps) with the tag fed to setuptools_scm via a cloudbuild build-arg; the hand-bumped literal + bumpversion are gone. Co-Authored-By: Claude <noreply@anthropic.com>

Non-meshing modules (app routes, sv-split profiler, pipeline/ingest entrypoints) imported meshing eagerly, pulling cloudvolume on import. Nest those imports so cv loads only when meshing actually runs. Co-Authored-By: Claude <noreply@anthropic.com>
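
The nested import pattern described in this commit message can be illustrated with a short sketch. NumPy stands in for the heavy dependency here, and `mesh_bounds` is a hypothetical function, not part of the package.

```python
def mesh_bounds(points):
    """Return per-axis (min, max) of a point list.

    The import sits inside the function, so importing this module stays
    cheap and the dependency loads only when the function is first called.
    """
    import numpy as np  # nested import: stand-in for a heavy dependency

    arr = np.asarray(points)
    return arr.min(axis=0).tolist(), arr.max(axis=0).tolist()


bounds = mesh_bounds([[0, 1], [2, -1]])
```

Repeated calls do not reload the module: after the first import, Python returns the cached entry from `sys.modules`.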

Meshing splits initial from edited roots at a timestamp boundary, but sampling one root is unreliable: skip connections spread root creation times across layers. Instead, stamp earliest_ts when the root layer is written — the explicit cell timestamp shared by every root, lifted +500ms so the boundary sits strictly above them. get_earliest_timestamp returns it pre-edit; derive_initial_ts consumes it. Migrate no longer clobbers an ingest-stamped value. Co-Authored-By: Claude <noreply@anthropic.com>

setuptools_scm rejects non-PEP-440 strings, so pushing a build-label tag (not a semver) failed the image build. Pass the version build-arg only for version-like tags; other tags build with the Dockerfile default untouched. Co-Authored-By: Claude <noreply@anthropic.com>

A table copied/restored under a new graph_id must not let its meshes alias the source's. Hinge on dynamic_mesh_dir: an explicit graph-suffixed value shares initial meshes, so re-derive only the dynamic subdir; a bare "dynamic" or unset value gives the copy a private per-graph top-level dir. Move the whole rewrite into ChunkedGraphMeta.for_copied_graph so the graph class makes a single call. Co-Authored-By: Claude <noreply@anthropic.com>

Replace setuptools_scm (built 0.0.0 without a reachable semver tag) with a committed _version.py the release workflow bumps, commits, and tags in lockstep; setup.py reads it and the image build drops the version arg. Gate the workflow's Helm-chart update behind an opt-in input and add a workflows README. Versioning is per branch: main 2.x, pcgv3 3.x. Co-Authored-By: Claude <noreply@anthropic.com>

At chunk_layer + 1 == layer_count the parent column is rank-1 and the existing layer_agreement / np.where path returns chunk_layer instead of the root. Short-circuit to layer_count so meshes traversing up from the second-to-top layer reach the root. Co-Authored-By: Claude <noreply@anthropic.com>

Supervoxel splitting with base+fork support and locks

scipy 1.17.1's Python 3.14 wheel ships without the scipy._external subpackage, so `from scipy import ndimage` raises ModuleNotFoundError and every meshing/sv-split test errors during collection. Co-Authored-By: Claude <noreply@anthropic.com>

numpy is C-ABI-bound to conda's graph-tool-base, yet pip-compile also pinned it, so the image ran conda's latest against pip's downgraded lock — a broken mixed install. Pin it once in requirements.yml; [tool.pip-tools] unsafe-package keeps pip-compile from re-adding it to requirements.txt. Co-Authored-By: Claude <noreply@anthropic.com>

Co-Authored-By: Claude <noreply@anthropic.com>

Add [tool.pytest.ini_options] with testpaths so a bare `pytest` collects the suite; test configuration previously lived only in the tox command. Co-Authored-By: Claude <noreply@anthropic.com>

The fixture duplicated gen_graph with file-backed edge/component I/O but had no callers. Co-Authored-By: Claude <noreply@anthropic.com>

Replace the duplicated `datetime.now(UTC) - timedelta(days=10)` with one helper so the "safely old" edit/lineage timestamp has a single source; rename local vars that collided with the helper name. Co-Authored-By: Claude <noreply@anthropic.com>

A test graph is determined by its atomic chunks; build_graph takes only that topology and builds the full (derivable) parent hierarchy at one timestamp, replacing hand-listed add_parent_chunk scaffolding. Co-Authored-By: Claude <noreply@anthropic.com>

Supervoxels are named and given as SV(x, y, z, seg) with zeros defaulting away, so build specs carry no repeated raw coordinate tuples; build_graph returns a BuiltGraph(cg, sv, ts) namedtuple and callers reference sv by name.

Drop the unused ts from the result; callers unpack `cg, sv = build_graph(...)` and reference supervoxels by name.

Replace the per-fixture gen_graph + create_chunk + add_parent_chunk + to_label scaffolding with named-supervoxel build_graph specs. The parent hierarchy is derived from the atomic chunks rather than hand-listed, and readable SV() coordinates replace the repeated raw (x, y, z, seg) tuples. Co-Authored-By: Claude <noreply@anthropic.com>

Reference every node through readable SV coordinates — build_graph for setup, the new label(cg, SV, layer) for construction/encoding tests — instead of positional to_label tuples. Adds an assert_graph_unchanged context manager for rejected-edit atomicity; drops the dead query fixture. Co-Authored-By: Claude <noreply@anthropic.com>

Move the edit-operation tests into tests/graph/edits/ with a coverage README, and lift the duplicated split/merge/undo setup graphs into shared edits/conftest.py fixtures. Co-Authored-By: Claude <noreply@anthropic.com>

Restore the original finite affinity (a split test must exercise the same edge type); the migration had switched it to an inf cross-chunk edge. Co-Authored-By: Claude <noreply@anthropic.com>

Shrink the lock-acquire backoff where the failed acquire is only setup, lower the test lock-expiry, and skip the doomed best-effort error-artifact write that dominated the error-path test. Assertions unchanged. Co-Authored-By: Claude <noreply@anthropic.com>

Every test parametrizes over kvdbclient_testing.backends(); the per-backend branches, the bigtable emulator bootstrap, and the hbase mock are gone, now shipped by kvdbclient. bootstrap() resolves the config class via get_config_class(). Requires kvdbclient>=0.8.0. Co-Authored-By: Claude <noreply@anthropic.com>

akhileshh requested a review from sdorkenw August 6, 2023 20:04

akhileshh force-pushed the pcgv3 branch from 5e1f12f to f3d3e5b Compare August 11, 2023 14:03

akhileshh changed the title ~~WIP~~ WIP V3 Aug 11, 2023

akhileshh marked this pull request as ready for review August 23, 2023 22:56

akhileshh changed the title ~~WIP V3~~ Version 3 with cached cross chunk edges Aug 23, 2023

akhileshh requested a review from fcollman August 24, 2023 00:43

akhileshh force-pushed the pcgv3 branch from 07dafa9 to bc571c8 Compare September 5, 2023 16:02

nkemnitz reviewed Sep 6, 2023


sdorkenw requested changes Sep 8, 2023


akhileshh requested a review from sdorkenw September 8, 2023 15:58

akhileshh force-pushed the pcgv3 branch 2 times, most recently from 1ddb0a7 to 17dfc10 Compare September 25, 2023 00:06

akhileshh force-pushed the pcgv3 branch from a8dc5f6 to bad9d4f Compare September 27, 2023 16:54

akhileshh force-pushed the pcgv3 branch from bbf80bb to 9381f87 Compare October 12, 2023 17:27

akhileshh force-pushed the pcgv3 branch from 9381f87 to 92b9078 Compare November 21, 2023 22:02

akhileshh force-pushed the pcgv3 branch from 92b9078 to d0a34e1 Compare December 2, 2023 17:34

akhileshh force-pushed the pcgv3 branch from d0a34e1 to fdb7aae Compare January 14, 2024 16:34

akhileshh force-pushed the pcgv3 branch 3 times, most recently from b13d8ec to d90813d Compare April 23, 2024 16:31

akhileshh force-pushed the pcgv3 branch 2 times, most recently from c460a5a to 6a2c5da Compare May 12, 2024 16:10

akhileshh force-pushed the pcgv3 branch from ff5b3ad to e638d8e Compare May 24, 2024 22:00

akhileshh force-pushed the pcgv3 branch 2 times, most recently from bf90549 to d2d9d44 Compare August 16, 2024 20:43

akhileshh force-pushed the pcgv3 branch 2 times, most recently from 280f9fe to cc4cd46 Compare September 3, 2024 01:29

akhileshh force-pushed the pcgv3 branch from cc4cd46 to 77947f1 Compare September 13, 2024 15:24

akhileshh force-pushed the pcgv3 branch 2 times, most recently from 6605bab to dcbecd1 Compare September 29, 2024 19:42

akhileshh and others added 30 commits June 15, 2026 02:34

build: bump cave-pipeline to 0.0.3

750f4d5

Picks up the 1-byte chunk-done marker the ingest workers write. Co-Authored-By: Claude <noreply@anthropic.com>

Merge pull request #534 from CAVEconnectome/akhilesh/sv-splitting-locks

3918789

Supervoxel splitting with base+fork support and locks

docs(readme): document the release workflow

3c00b0b

Co-Authored-By: Claude <noreply@anthropic.com>

test: configure pytest discovery in pyproject

99943c4

Add [tool.pytest.ini_options] with testpaths so a bare `pytest` collects the suite; test configuration previously lived only in the tox command. Co-Authored-By: Claude <noreply@anthropic.com>

test: drop unused gen_graph_with_edges fixture

d1123b1

The fixture duplicated gen_graph with file-backed edge/component I/O but had no callers. Co-Authored-By: Claude <noreply@anthropic.com>

test: build_graph spec uses named SV coordinates

e08bbe2

Supervoxels are named and given as SV(x, y, z, seg) with zeros defaulting away, so build specs carry no repeated raw coordinate tuples; build_graph returns a BuiltGraph(cg, sv, ts) namedtuple and callers reference sv by name.

test: build_graph returns BuiltGraph(cg, sv)

d345c14

Drop the unused ts from the result; callers unpack `cg, sv = build_graph(...)` and reference supervoxels by name.

test: group edit-operation tests into an edits package

57653c6

Move the edit-operation tests into tests/graph/edits/ with a coverage README, and lift the duplicated split/merge/undo setup graphs into shared edits/conftest.py fixtures. Co-Authored-By: Claude <noreply@anthropic.com>

test: keep stale-edge tests on a between-chunk edge

b340ac5

Restore the original finite affinity (a split test must exercise the same edge type); the migration had switched it to an inf cross-chunk edge. Co-Authored-By: Claude <noreply@anthropic.com>

akhileshh wants to merge 364 commits into main from pcgv3


3 participants
