take/sparse-indexing + b2view#640

Merged
FrancescAlted merged 53 commits into main from b2view
May 28, 2026

Conversation

@FrancescAlted
Member

This PR merges two related feature branches into one:

take / sparse-indexing / fancy-indexing

  • blosc2.take() / NDArray.take(): generalized to use b2nd_get_sparse_cbuffer() from C-Blosc2, giving a unified fast path for gather operations on both NDArrays and
    CTable columns
  • CTable.take() and Column.take(): new gather API for row-position-based column access
  • Fancy indexing (arr[[1,3,5]]): routes through the same sparse-cbuffer backend
  • Sparse boolean mask fast path: auto-detects highly-selective boolean masks and routes them through take() instead of dense-materialization paths, avoiding unnecessary
    full decompression
  • where(cond, x) and where(cond, x, y): now compiled through miniexpr for JIT-accelerated evaluation
  • Boolean mask materialization: lazy-index patterns like a[a < 5][:] now use a compressed transient mask (LZ4) and a hot cache, dramatically reducing memory for
    repeated queries
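
To illustrate why a gather-style path can avoid full decompression, here is a minimal pure-Python sketch. It is not the C-Blosc2 implementation: `zlib` stands in for the Blosc2 codecs, and the names `compress_chunks`, `gather`, `select`, `CHUNK` and the `max_density` threshold are all hypothetical. The idea shown is only that requested positions are grouped by chunk, so each touched chunk is decompressed once and untouched chunks are never read, and that a sufficiently sparse boolean mask can be turned into such a gather.

```python
import zlib
from array import array

CHUNK = 1000  # items per chunk (hypothetical value, for illustration only)


def compress_chunks(values, chunk=CHUNK):
    """Split values into fixed-size chunks and compress each one independently."""
    return [
        zlib.compress(array("q", values[i:i + chunk]).tobytes())
        for i in range(0, len(values), chunk)
    ]


def _decompress(blob):
    """Decompress one chunk back into an array of 64-bit integers."""
    block = array("q")
    block.frombytes(zlib.decompress(blob))
    return block


def gather(chunks, indices, chunk=CHUNK):
    """Return (values at `indices`, number of chunks decompressed)."""
    # Group requested positions by chunk so each chunk is decompressed once.
    by_chunk = {}
    for out_pos, idx in enumerate(indices):
        by_chunk.setdefault(idx // chunk, []).append((out_pos, idx % chunk))
    out = [None] * len(indices)
    for nchunk, hits in by_chunk.items():
        block = _decompress(chunks[nchunk])
        for out_pos, offset in hits:
            out[out_pos] = block[offset]
    return out, len(by_chunk)


def select(chunks, mask, max_density=0.01, chunk=CHUNK):
    """Route a boolean mask: sparse masks become a gather, dense ones a full scan."""
    if sum(mask) <= max_density * len(mask):
        indices = [i for i, keep in enumerate(mask) if keep]
        return gather(chunks, indices, chunk)
    values = []
    for nchunk, blob in enumerate(chunks):
        base = nchunk * chunk
        values.extend(v for off, v in enumerate(_decompress(blob)) if mask[base + off])
    return values, len(chunks)
```

With 10 chunks of 1000 items, `gather(chunks, [1, 3, 5, 9999])` touches only the first and last chunk, whereas a dense mask falls back to scanning all ten.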

Cross-cutting improvements

  • Context manager support: all blosc2.open() return types (SChunk, NDArray, C2Array, Proxy, LazyArray, CTable, DictStore, TreeStore, etc.) now support with
    blosc2.open(...) as obj:
  • CTable.where(): always uses expr_result.compute() for the boolean filter, keeping the mask compressed by default
  • On-disk query cache: side-effect correctness and mode-consistency fixes; race-condition fix for miniexpr chunk caches on Apple Silicon
  • CMake: bumped bundled C-Blosc2 version

b2view — Interactive Browser for Blosc2 Data

A new CLI viewer (blosc2 view) for interactively exploring Blosc2 objects in the terminal. Built with rich and textual, it supports browsing:

  • CTables/CStore and nested column hierarchies
  • NDArrays with multi-dimensional navigation
  • vlmeta inspection
  • Panel-based workflow with flexible dimension mode

   Instead of a fixed max_sparse_refine_candidates cutoff, estimate
   refinement cost from candidate count × operand count vs scan cost
   from total rows.  Avoids both premature fallback for large but selective
   queries and pathological refinement of near-full-table predicates.

   Constants calibrated from profiling with sparse-gather optimisations.
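
   The decision rule described above can be sketched roughly as follows. The two constants and the function name `should_refine` are hypothetical placeholders; the commit only states that the real constants were calibrated from profiling, and it does not give their values.

```python
# Hypothetical per-unit costs, chosen only to make the example concrete.
REFINE_COST_PER_CANDIDATE_OPERAND = 8.0  # cost of gathering one operand value for one candidate
SCAN_COST_PER_ROW = 1.0                  # cost of evaluating the predicate on one row in a full scan


def should_refine(n_candidates, n_operands, total_rows):
    """Return True when refining the candidates is estimated to beat a full scan."""
    refine_cost = REFINE_COST_PER_CANDIDATE_OPERAND * n_candidates * n_operands
    scan_cost = SCAN_COST_PER_ROW * total_rows
    return refine_cost < scan_cost
```

   Under these assumed constants, one million candidates out of a billion rows are still refined (large but selective), while 900,000 candidates out of one million rows fall back to a scan (near-full-table predicate).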
…. NDArray.take has a new fast path for 1-D arrays now.
  The on-disk miniexpr prefilter used a shared b2nd_array_t.chunk_cache
  buffer that was read on the fast path without holding the lock, while a
  different worker could concurrently free and replace that same buffer on
  a cache miss. This led to sporadic SIGSEGV crashes on Apple Silicon,
  where the weaker memory model and timing made the race visible much more
  often. In-memory arrays were unaffected because they bypass this path.

  A previous workaround used per-thread caches, which avoided the crash but
  made every worker fetch/copy the same on-disk chunk independently. That
  fixed correctness at the cost of much higher sys time, memory use, and
  overall runtime.

  Replace the shared mutable b2nd_array_t.chunk_cache use in miniexpr
  with a per-input shared cache owned by me_udata. Each cache entry has
  a small state machine (EMPTY, LOADING, READY, ERROR) plus a lock.
  The first worker reaching a chunk marks it LOADING, fetches and copies
  the chunk once, then publishes it as READY; the remaining workers wait
  briefly and reuse the same immutable chunk buffer. This preserves safe
  lifetime and restores chunk sharing without duplicated I/O.
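
  As a rough illustration of the load-once protocol described above, here is a Python sketch. The actual change lives in C inside miniexpr; the class names, the `loader` callback and the use of `threading.Condition` are assumptions made for this example only.

```python
import threading
from enum import Enum


class State(Enum):
    EMPTY = 0
    LOADING = 1
    READY = 2
    ERROR = 3


class ChunkEntry:
    """One cache slot: a state, the published buffer, and a lock to wait on."""

    def __init__(self):
        self.state = State.EMPTY
        self.data = None
        self.error = None
        self.cond = threading.Condition()


class SharedChunkCache:
    """Per-input cache where each chunk is fetched once and then shared."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable: chunk number -> bytes-like
        self._entries = {}
        self._entries_lock = threading.Lock()

    def get(self, nchunk):
        with self._entries_lock:
            entry = self._entries.get(nchunk)
            if entry is None:
                entry = self._entries[nchunk] = ChunkEntry()
        with entry.cond:
            is_loader = entry.state is State.EMPTY
            if is_loader:
                entry.state = State.LOADING  # first worker claims the load
            else:
                while entry.state is State.LOADING:
                    entry.cond.wait()  # block until the loader publishes
        if is_loader:
            try:
                data = bytes(self._loader(nchunk))  # immutable copy, made once
            except Exception as exc:
                with entry.cond:
                    entry.error, entry.state = exc, State.ERROR
                    entry.cond.notify_all()
            else:
                with entry.cond:
                    entry.data, entry.state = data, State.READY
                    entry.cond.notify_all()
        if entry.state is State.ERROR:
            raise RuntimeError(f"chunk {nchunk} failed to load") from entry.error
        return entry.data
```

  The fetch itself happens outside the entry lock, so waiters only block on the condition variable and never hold up the loader; once an entry is READY its buffer is never replaced, which is what keeps the shared read safe.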

  Also free SChunk with the GIL held again so threadpool teardown cannot
  race with active miniexpr workers during deallocation.

  Add a persisted regression test covering repeated a[a < 5][:] on a
  disk-backed array under multi-threaded execution.
   - keep the miniexpr shared chunk-cache fix and replace yield-based waiting
     with a blocking lock handoff for safer contention behavior
   - pass the requested open mode into reopened NDArray wrappers
   - make vlmeta derive access state from its parent SChunk instead of keeping
     an independent mode snapshot
   - break the new vlmeta->SChunk reference cycle with a weak reference
   - make query-result caching hot-cache-only and stop persisting query cache
     catalogs or __query_cache__ sidecars in any open mode
   - document the no-hidden-writes rule in blosc2.open
   - preserve _from_schunk mode/storage state in EmbedStore
   - stop upgrading reopened Proxy caches/sources to append mode implicitly
   - keep read-only Proxy opens observational by falling back to source reads
     when a missing chunk would otherwise require mutating the cache
   - update tests for open-mode propagation, read-only metadata behavior,
     hot-cache-only query reuse, and read-only proxy reopening
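
The vlmeta items in the list above (deriving access state from the parent and breaking the reference cycle with a weak reference) follow a common Python pattern, sketched below. `ToySChunk` and `ToyVLMeta` are hypothetical stand-ins, not the python-blosc2 classes, and the error types are chosen only for illustration.

```python
import weakref


class ToyVLMeta:
    """Metadata view that asks its parent for the access mode on every use."""

    def __init__(self, parent):
        # A weak reference avoids a parent -> vlmeta -> parent cycle.
        self._parent = weakref.ref(parent)
        self._items = {}

    @property
    def mode(self):
        parent = self._parent()
        if parent is None:
            raise ReferenceError("parent container no longer exists")
        return parent.mode  # no independent snapshot of the mode is kept

    def __setitem__(self, key, value):
        if self.mode == "r":
            raise PermissionError("container was opened read-only")
        self._items[key] = value

    def __getitem__(self, key):
        return self._items[key]


class ToySChunk:
    """Container that owns its metadata view."""

    def __init__(self, mode="r"):
        self.mode = mode
        self.vlmeta = ToyVLMeta(self)
```

Because the mode is looked up through the parent each time, a container reopened in a different mode cannot leave its metadata view with a stale access state.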
@FrancescAlted FrancescAlted merged commit 7c7dccd into main May 28, 2026
17 checks passed
@FrancescAlted FrancescAlted deleted the b2view branch May 28, 2026 12:55