Skip to content

Support NumPy 2.x and NiBabel 5.x#137

Merged
KenyaOtsuka merged 9 commits into
KamitaniLab:devfrom
KenyaOtsuka:chore/numpy-2x-nibabel-support
Jun 24, 2026
Merged

Support NumPy 2.x and NiBabel 5.x#137
KenyaOtsuka merged 9 commits into
KamitaniLab:devfrom
KenyaOtsuka:chore/numpy-2x-nibabel-support

Conversation

@KenyaOtsuka

@KenyaOtsuka KenyaOtsuka commented Jun 17, 2026

Copy link
Copy Markdown

Summary

Fixes #106. Closes #121.

This PR updates bdpy to work with newer NumPy / NiBabel stacks by relaxing the overly strict "bdpy[mri]" dependency pins, replacing removed MRI APIs, and removing "hdf5storage.loadmat" from the relevant load paths.

Changes

  • Relax dependency constraints:

    • split the core NumPy dependency by Python version:
      • "numpy>=1.20,<2" for Python 3.8/3.9
      • "numpy>=1.20" for Python >= 3.10
    • remove the "bdpy[mri]" "numpy<1.24" constraint
    • change "nibabel==3.2" to "nibabel>=3.2,<6"
    • require "nipy>=0.6.1"
    • require a NumPy-2-capable "hdf5storage>=0.2.0" for Python >= 3.10 (legacy Python < 3.10 keeps the unconstrained hdf5storage)
  • Update MRI-related code for newer dependency versions:

    • replace removed "get_data()" calls with "get_fdata()"
    • avoid an unnecessary full-volume read in "_check_xyz"
    • replace remaining Python 2-only code paths ("xrange", "reader.next()")
  • Replace v7.3 ".mat" load paths with an h5py-based reader:

    • v5 ".mat" files still use "scipy.io.loadmat"
    • v7.3 ".mat" files now fall back to h5py instead of "hdf5storage.loadmat"
    • this is applied to feature, datastore, and sparse-array loading
    • the reader preserves MATLAB-written empty-array shapes when "MATLAB_empty" is present without hdf5storage-specific "Python.Shape" metadata
  • Add a NumPy-2-only h5py writer for "SparseArray.save":

    • NumPy < 2.0 keeps the existing "hdf5storage.savemat" behavior
    • NumPy >= 2.0 replaces only the target key/group with h5py
    • unrelated top-level variables in the same file are preserved
  • Address review feedback:

    • avoid materializing dense datasets just to dispatch "load_array"
    • make the v7.3 feature-loading tests deterministic and explicitly write v7.3 files
    • handle "MATLAB_empty" separately from "Python.Empty" in the h5py reader
  • Add regression coverage for:

    • v7.3 ".mat" loading through the h5py reader
    • MATLAB-style empty arrays with "MATLAB_empty" but no "Python.Shape"
    • NumPy-2 sparse save preserving unrelated top-level variables
    • MRI loading through the updated "get_fdata()" path

Scope note

This PR does not replace all uses of "hdf5storage" with h5py.

The h5py replacement is limited to the v7.3 ".mat" load paths discussed in #106 and to the "SparseArray.save" path where an actual NumPy 2.x save-side failure was observed. Rewriting all save paths to h5py would require reimplementing more of the MATLAB-v7.3 metadata/compatibility behavior currently handled by "hdf5storage", so I kept that out of scope. Dense save paths therefore still rely on "hdf5storage.savemat".

Because dense save paths still rely on "hdf5storage.savemat", NumPy 2.x support is limited to Python >= 3.10. Python 3.8/3.9 keep using NumPy < 2.

I left the other currently unconstrained core dependency lower bounds ("scipy", "scikit-learn", "pandas", etc.) unchanged because they do not appear to conflict with the current Python support floor.

Testing

"pytest tests/dataform/test_mat_v73.py tests/dataform/test_sparse.py tests/dataform/test_features.py"

  • Python 3.11.15 / NumPy 1.23.5 / NiBabel 3.2.0 / nipy 0.6.1 / scipy 1.15.3 / h5py 3.16.0

    • 8 passed, 1 skipped
  • Python 3.11.15 / NumPy 2.4.6 / NiBabel 5.4.2 / nipy 0.6.1 / scipy 1.17.1 / h5py 3.16.0

    • 9 passed

"pytest tests/dataform tests/test_mri.py"

  • Python 3.11.15 / NumPy 1.23.5 / NiBabel 3.2.0 / nipy 0.6.1 / scipy 1.15.3 / h5py 3.16.0

    • 18 passed, 1 skipped
  • Python 3.11.15 / NumPy 2.4.6 / NiBabel 5.4.2 / nipy 0.6.1 / scipy 1.17.1 / h5py 3.16.0

    • 19 passed

Note: the remaining Python 3.11 FastL2LiR CI failure appears to be caused by a PyFastL2LiR NumPy 2.4 compatibility issue, not by this PR. It matches the upstream fix in KamitaniLab/PyFastL2LiR@2eef75d, which has not been released yet.

Drop the numpy<1.24 cap and the exact nibabel==3.2 pin that blocked
contemporary NumPy and NiBabel (issue KamitaniLab#121). nipy>=0.6.1 is required for
NumPy 2.0 support.

- mri: numpy<1.24 removed (inherits core numpy>=1.20); nibabel==3.2 ->
  nibabel>=3.2,<6 (6.0 drops NumPy 1.x); nipy>=0.6.0 -> nipy>=0.6.1
NiBabel 5.0 removed Image.get_data(); read nipy images via get_fdata()
instead in load_mri, load_epi and fmriprep. Also fix Python 2 leftovers
(xrange -> range, reader.next() -> next(reader)) that would fail on
Python 3. Add a load_mri round-trip test guarding the get_fdata() path.
…iLab#106)

Under NumPy 2.0 hdf5storage.loadmat fails (it referenced the removed
np.unicode_), breaking DecodedFeatures.get() and other loaders. Add
bdpy/dataform/_mat_v73.py with an h5py-based reader for the dense and
sparse-struct layouts bdpy uses, plus a loadmat_key() helper that keeps
v5 .mat support via scipy and falls back to h5py for v7.3. Route the load
paths in features, datastore and sparse through it. Saving still uses
hdf5storage.

SparseArray.save now retries with a from-scratch rewrite only if the
in-place hdf5storage.savemat raises under NumPy 2.0 (overwriting a struct);
the normal merging behavior is preserved whenever the write succeeds.
Update the features test to write v7.3 mock data so the h5py path is
exercised.
Under NumPy 2.x, hdf5storage.savemat can fail when SparseArray.save
overwrites an existing sparse struct, and the previous fallback deleted the
whole file before rewriting, dropping any unrelated variables.

Make SparseArray.save use a NumPy-2-only h5py writer that opens the target
in append mode and replaces only the requested key/group, leaving other
variables intact. NumPy < 2.0 keeps the existing hdf5storage.savemat path.
The dense save paths are unchanged and still rely on hdf5storage for the
MATLAB-v7.3 metadata/compatibility behavior.

Add a NumPy-2-only test asserting that SparseArray.save preserves other
top-level variables already stored in the target file.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates BdPy’s MRI and .mat read/write paths to work with modern NumPy (2.x) and NiBabel (5.x), primarily by removing reliance on hdf5storage.loadmat for v7.3 reads and replacing deprecated MRI image APIs.

Changes:

  • Loosen bdpy[mri] dependency pins to allow newer NiBabel/NiPy stacks.
  • Replace deprecated MRI image reads (get_data() / Python2-era constructs) with modern equivalents.
  • Add an h5py-based MATLAB v7.3 (.mat) reader and route feature/datastore/sparse loads through it; add NumPy-2-only sparse save behavior and regression tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pyproject.toml Relaxes mri extra dependency constraints (NiBabel/NiPy).
bdpy/mri/load_mri.py Uses get_fdata() for compatibility with newer imaging stacks.
bdpy/mri/load_epi.py Switches to get_fdata() for EPI loading.
bdpy/mri/glm.py Replaces Python 2-style reader.next() with next(reader).
bdpy/mri/fmriprep.py Updates deprecated image API usage and removes xrange.
bdpy/dataform/_mat_v73.py Introduces an h5py-based reader for MATLAB v7.3 .mat files.
bdpy/dataform/features.py Routes .mat variable loads through the new v5/v7.3-aware loader.
bdpy/dataform/datastore.py Replaces hdf5storage.loadmat reads with the new v5/v7.3-aware loader.
bdpy/dataform/sparse.py Uses h5py-based reads; adds NumPy-2-only sparse-struct writer logic.
tests/test_mri.py Adds regression test for 3D MRI loading via get_fdata().
tests/dataform/test_sparse.py Adds NumPy-2-only test ensuring sparse save preserves other top-level vars.
tests/dataform/test_features.py Updates test fixture generation to align with the v7.3 loading path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread bdpy/dataform/sparse.py Outdated
Comment on lines 32 to 36
elif type(f[key][()]) == np.ndarray:
# Dense array
return hdf5storage.loadmat(fname)[key]
# Dense array (read with h5py; hdf5storage breaks under NumPy 2.0)
return _mat_v73.read_dataset(f[key])
else:
raise RuntimeError('Unsupported data type: %s' % type(f[key][()]))

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f123d47.

Comment thread bdpy/mri/load_epi.py
Comment on lines 48 to 51
def _check_xyz(xyz, img):
"""Check voxel xyz consistency."""
xyz_current = _get_xyz(img.coordmap.affine, img.get_data().shape)
xyz_current = _get_xyz(img.coordmap.affine, img.get_fdata().shape)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f123d47.

Comment thread tests/dataform/test_features.py Outdated
Comment on lines 31 to 35
for image_name in mock_image_names:
data = np.random.rand(*shape)
hdf5storage.savemat(
os.path.join(tmpdir, layer_name, image_name + '.mat'),
{'feat': data},

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 887258b.

Dense save paths still use hdf5storage.savemat, which can fail under NumPy 2.x
(it references the removed np.unicode_) on the legacy Python 3.8/3.9 stacks.
Add environment markers so NumPy 2.x is only resolved on Python >= 3.10, while
3.8/3.9 stay on a NumPy 1.x-compatible stack.
load_array inspected the HDF5 object by reading the whole dataset (f[key][()])
just to branch on dense vs sparse, which also misclassified scalar datasets.
Branch on the HDF5 object type instead (group = SparseArray, dataset = dense).

_check_xyz called img.get_fdata().shape, forcing a full per-file volume read
only to check XYZ consistency. Use img.shape[:3] metadata instead; load_epi
still reads the actual data afterwards.
Pass format='7.3' explicitly to hdf5storage.savemat so the h5py/v7.3 loader
path is always exercised instead of relying on the default, and stack the
expected arrays in sorted-filename order to match Features.__get_labels.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comment thread bdpy/dataform/_mat_v73.py Outdated
Comment on lines +38 to +40
if "Python.Empty" in attrs or "MATLAB_empty" in attrs:
shape = tuple(int(x) for x in attrs.get("Python.Shape", ()))
return np.empty(shape, dtype=dset.dtype)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 5fee425, with a regression test (tests/dataform/test_mat_v73.py) covering MATLAB_empty without Python.Shape.

MATLAB-written v7.3 files mark empty arrays with MATLAB_empty but, unlike
hdf5storage, do not store Python.Shape. Treating MATLAB_empty like
Python.Empty made read_dataset fall back to Python.Shape=() and return
np.empty(()), collapsing an empty non-scalar array such as (0, 3) to a 0-d
array.

Special-case only Python.Empty (falling back to the stored dataset shape if
Python.Shape is missing); a bare MATLAB_empty now goes through the normal
read/transpose path, preserving the empty array's shape.

Add a regression test covering a MATLAB_empty dataset without Python.Shape.
@KenyaOtsuka KenyaOtsuka requested a review from ganow June 18, 2026 00:40
@KenyaOtsuka KenyaOtsuka marked this pull request as ready for review June 18, 2026 00:40

@HirokiYasuda03 HirokiYasuda03 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KenyaOtsuka
Thank you for the NumPy 2.x / NiBabel 5.x support. I’ve made just one comment, so please take a look at it. Other than that, I don’t think there are any issues.

The following is just a low-level question about future direction, separate from merging (not something I'm asking you to address in this PR).

After this PR, the dense save paths (save_array(sparse=False) / save_multiarrays / save_feature) still go through hdf5storage, so the hdf5storage dependency itself remains. I'm a little wary of that because hdf5storage is updated very infrequently (0.1.19 in 2023-01, then a ~2-year gap until the NumPy 2.0-capable 0.2.0 in 2025-05). I'd like to ask whether the eventual direction is to move dense over to h5py as well and fully migrate, or to keep relying on hdf5storage.

I understand that the reason the implementation of saving dense format by h5py is currently costly is because we need to maintain compatibility with MATLAB.
If MATLAB interoperability is a requirement, keeping dense on hdf5storage seems natural.
If everything stays within bdpy and MATLAB interop isn't needed In the subsequent pipeline, then writing the dense contents directly with h5py should be straightforward in principle, so eventually migrating fully and dropping the hdf5storage dependency might be worth considering.
Again, this is just a side question out of curiosity, not a blocker.

Comment thread pyproject.toml Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dense save paths (save_array(sparse=False) / save_multiarrays / save_feature) still call hdf5storage.savemat, and pyproject.toml lists hdf5storage with no version floor. The problem is that hdf5storage < 0.2.0 is not NumPy-2-compatible: np.unicode_ was removed in NumPy 2.0, and hdf5storage references it inside MarshallerCollection.init, which is constructed on every savemat/loadmat call (via Options()), not just on read (AI confirmed).

Suggestion: make the dependency contract explicit by adding a floor for the NumPy-2-capable Python range, e.g.:

Suggested change
"hdf5storage>=0.2.0; python_version >= '3.10'",

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3053edb.

The dense save paths (save_array(sparse=False), save_multiarrays,
save_feature) still call hdf5storage.savemat. hdf5storage < 0.2.0 is not
NumPy-2 compatible: it references np.unicode_ (removed in NumPy 2.0) inside
MarshallerCollection, which is constructed on every savemat/loadmat call via
Options(), so the failure is not limited to the read path.

Make the contract explicit, matching the existing NumPy split: keep an
unconstrained hdf5storage for legacy Python (<3.10, which stays on NumPy 1.x)
and require hdf5storage>=0.2.0 for Python >= 3.10, where NumPy 2.x is allowed.

@HirokiYasuda03 HirokiYasuda03 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank your for changes.
I think we can merge.

@KenyaOtsuka

Copy link
Copy Markdown
Author

@HirokiYasuda03
Thank you for the review!

Regarding the future direction, my current thinking is that, in the long term, it would be preferable to remove the dependency on hdf5storage.

At the same time, I do not think we should simply drop compatibility with existing files or MATLAB interoperability, because existing bdpy data has been written through hdf5storage. There seem to be at least two possible migration paths. One option is to keep legacy read support for files written by hdf5storage, as in this PR, by maintaining our own reader implementation. Another option is to provide a conversion utility and eventually deprecate the legacy hdf5storage-based format after a transition period.

Personally, I slightly prefer the latter direction, because I would rather avoid maintaining multiple dense formats indefinitely. However, if the maintenance cost of legacy read support is not too high, the former may also be a reasonable choice.

@KenyaOtsuka KenyaOtsuka merged commit 8b5e690 into KamitaniLab:dev Jun 24, 2026
3 of 6 checks passed
@KenyaOtsuka KenyaOtsuka deleted the chore/numpy-2x-nibabel-support branch June 24, 2026 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants