Support NumPy 2.x and NiBabel 5.x by KenyaOtsuka · Pull Request #137 · KamitaniLab/bdpy

KenyaOtsuka · 2026-06-17T02:11:56Z

Summary

Fixes #106. Closes #121.

This PR updates bdpy to work with newer NumPy / NiBabel stacks by relaxing the overly strict "bdpy[mri]" dependency pins, replacing removed MRI APIs, and removing "hdf5storage.loadmat" from the relevant load paths.

Changes

Relax dependency constraints:
- split the core NumPy dependency by Python version:
  - "numpy>=1.20,<2" for Python 3.8/3.9
  - "numpy>=1.20" for Python >= 3.10
- remove the "bdpy[mri]" "numpy<1.24" constraint
- change "nibabel==3.2" to "nibabel>=3.2,<6"
- require "nipy>=0.6.1"
- require a NumPy-2-capable "hdf5storage>=0.2.0" for Python >= 3.10 (legacy Python < 3.10 keeps the unconstrained hdf5storage)
Update MRI-related code for newer dependency versions:
- replace removed "get_data()" calls with "get_fdata()"
- avoid an unnecessary full-volume read in "_check_xyz"
- replace remaining Python 2-only code paths ("xrange", "reader.next()")
Replace v7.3 ".mat" load paths with an h5py-based reader:
- v5 ".mat" files still use "scipy.io.loadmat"
- v7.3 ".mat" files now fall back to h5py instead of "hdf5storage.loadmat"
- this is applied to feature, datastore, and sparse-array loading
- the reader preserves MATLAB-written empty-array shapes when "MATLAB_empty" is present without hdf5storage-specific "Python.Shape" metadata
Add a NumPy-2-only h5py writer for "SparseArray.save":
- NumPy < 2.0 keeps the existing "hdf5storage.savemat" behavior
- NumPy >= 2.0 replaces only the target key/group with h5py
- unrelated top-level variables in the same file are preserved
Address review feedback:
- avoid materializing dense datasets just to dispatch "load_array"
- make the v7.3 feature-loading tests deterministic and explicitly write v7.3 files
- handle "MATLAB_empty" separately from "Python.Empty" in the h5py reader
Add regression coverage for:
- v7.3 ".mat" loading through the h5py reader
- MATLAB-style empty arrays with "MATLAB_empty" but no "Python.Shape"
- NumPy-2 sparse save preserving unrelated top-level variables
- MRI loading through the updated "get_fdata()" path

Scope note

This PR does not replace all uses of "hdf5storage" with h5py.

The h5py replacement is limited to the v7.3 ".mat" load paths discussed in #106 and to the "SparseArray.save" path where an actual NumPy 2.x save-side failure was observed. Rewriting all save paths to h5py would require reimplementing more of the MATLAB-v7.3 metadata/compatibility behavior currently handled by "hdf5storage", so I kept that out of scope. Dense save paths therefore still rely on "hdf5storage.savemat".

Because dense save paths still rely on "hdf5storage.savemat", NumPy 2.x support is limited to Python >= 3.10. Python 3.8/3.9 keep using NumPy < 2.

I left the other currently unconstrained core dependency lower bounds ("scipy", "scikit-learn", "pandas", etc.) unchanged because they do not appear to conflict with the current Python support floor.

Testing

"pytest tests/dataform/test_mat_v73.py tests/dataform/test_sparse.py tests/dataform/test_features.py"

Python 3.11.15 / NumPy 1.23.5 / NiBabel 3.2.0 / nipy 0.6.1 / scipy 1.15.3 / h5py 3.16.0
- 8 passed, 1 skipped
Python 3.11.15 / NumPy 2.4.6 / NiBabel 5.4.2 / nipy 0.6.1 / scipy 1.17.1 / h5py 3.16.0
- 9 passed

"pytest tests/dataform tests/test_mri.py"

Python 3.11.15 / NumPy 1.23.5 / NiBabel 3.2.0 / nipy 0.6.1 / scipy 1.15.3 / h5py 3.16.0
- 18 passed, 1 skipped
Python 3.11.15 / NumPy 2.4.6 / NiBabel 5.4.2 / nipy 0.6.1 / scipy 1.17.1 / h5py 3.16.0
- 19 passed

Note: the remaining Python 3.11 FastL2LiR CI failure appears to be caused by a PyFastL2LiR NumPy 2.4 compatibility issue, not by this PR. It matches the upstream fix in KamitaniLab/PyFastL2LiR@2eef75d, which has not been released yet.

Drop the numpy<1.24 cap and the exact nibabel==3.2 pin that blocked contemporary NumPy and NiBabel (issue KamitaniLab#121). nipy>=0.6.1 is required for NumPy 2.0 support. - mri: numpy<1.24 removed (inherits core numpy>=1.20); nibabel==3.2 -> nibabel>=3.2,<6 (6.0 drops NumPy 1.x); nipy>=0.6.0 -> nipy>=0.6.1

NiBabel 5.0 removed Image.get_data(); read nipy images via get_fdata() instead in load_mri, load_epi and fmriprep. Also fix Python 2 leftovers (xrange -> range, reader.next() -> next(reader)) that would fail on Python 3. Add a load_mri round-trip test guarding the get_fdata() path.

…iLab#106) Under NumPy 2.0 hdf5storage.loadmat fails (it referenced the removed np.unicode_), breaking DecodedFeatures.get() and other loaders. Add bdpy/dataform/_mat_v73.py with an h5py-based reader for the dense and sparse-struct layouts bdpy uses, plus a loadmat_key() helper that keeps v5 .mat support via scipy and falls back to h5py for v7.3. Route the load paths in features, datastore and sparse through it. Saving still uses hdf5storage. SparseArray.save now retries with a from-scratch rewrite only if the in-place hdf5storage.savemat raises under NumPy 2.0 (overwriting a struct); the normal merging behavior is preserved whenever the write succeeds. Update the features test to write v7.3 mock data so the h5py path is exercised.

Under NumPy 2.x, hdf5storage.savemat can fail when SparseArray.save overwrites an existing sparse struct, and the previous fallback deleted the whole file before rewriting, dropping any unrelated variables. Make SparseArray.save use a NumPy-2-only h5py writer that opens the target in append mode and replaces only the requested key/group, leaving other variables intact. NumPy < 2.0 keeps the existing hdf5storage.savemat path. The dense save paths are unchanged and still rely on hdf5storage for the MATLAB-v7.3 metadata/compatibility behavior. Add a NumPy-2-only test asserting that SparseArray.save preserves other top-level variables already stored in the target file.

Copilot

Pull request overview

This PR updates BdPy’s MRI and .mat read/write paths to work with modern NumPy (2.x) and NiBabel (5.x), primarily by removing reliance on hdf5storage.loadmat for v7.3 reads and replacing deprecated MRI image APIs.

Changes:

Loosen bdpy[mri] dependency pins to allow newer NiBabel/NiPy stacks.
Replace deprecated MRI image reads (get_data() / Python2-era constructs) with modern equivalents.
Add an h5py-based MATLAB v7.3 (.mat) reader and route feature/datastore/sparse loads through it; add NumPy-2-only sparse save behavior and regression tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`pyproject.toml`	Relaxes `mri` extra dependency constraints (NiBabel/NiPy).
`bdpy/mri/load_mri.py`	Uses `get_fdata()` for compatibility with newer imaging stacks.
`bdpy/mri/load_epi.py`	Switches to `get_fdata()` for EPI loading.
`bdpy/mri/glm.py`	Replaces Python 2-style `reader.next()` with `next(reader)`.
`bdpy/mri/fmriprep.py`	Updates deprecated image API usage and removes `xrange`.
`bdpy/dataform/_mat_v73.py`	Introduces an h5py-based reader for MATLAB v7.3 `.mat` files.
`bdpy/dataform/features.py`	Routes `.mat` variable loads through the new v5/v7.3-aware loader.
`bdpy/dataform/datastore.py`	Replaces `hdf5storage.loadmat` reads with the new v5/v7.3-aware loader.
`bdpy/dataform/sparse.py`	Uses h5py-based reads; adds NumPy-2-only sparse-struct writer logic.
`tests/test_mri.py`	Adds regression test for 3D MRI loading via `get_fdata()`.
`tests/dataform/test_sparse.py`	Adds NumPy-2-only test ensuring sparse save preserves other top-level vars.
`tests/dataform/test_features.py`	Updates test fixture generation to align with the v7.3 loading path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

KenyaOtsuka · 2026-06-17T04:27:05Z

        elif type(f[key][()]) == np.ndarray:
-            # Dense array
-            return hdf5storage.loadmat(fname)[key]
+            # Dense array (read with h5py; hdf5storage breaks under NumPy 2.0)
+            return _mat_v73.read_dataset(f[key])
        else:
            raise RuntimeError('Unsupported data type: %s' % type(f[key][()]))


Addressed in f123d47.

KenyaOtsuka · 2026-06-17T04:27:12Z

 def _check_xyz(xyz, img):
    """Check voxel xyz consistency."""
-    xyz_current = _get_xyz(img.coordmap.affine, img.get_data().shape)
+    xyz_current = _get_xyz(img.coordmap.affine, img.get_fdata().shape)



Addressed in f123d47.

KenyaOtsuka · 2026-06-17T04:27:40Z

        for image_name in mock_image_names:
            data = np.random.rand(*shape)
            hdf5storage.savemat(
                os.path.join(tmpdir, layer_name, image_name + '.mat'),
                {'feat': data},


Addressed in 887258b.

Dense save paths still use hdf5storage.savemat, which can fail under NumPy 2.x (it references the removed np.unicode_) on the legacy Python 3.8/3.9 stacks. Add environment markers so NumPy 2.x is only resolved on Python >= 3.10, while 3.8/3.9 stay on a NumPy 1.x-compatible stack.

load_array inspected the HDF5 object by reading the whole dataset (f[key][()]) just to branch on dense vs sparse, which also misclassified scalar datasets. Branch on the HDF5 object type instead (group = SparseArray, dataset = dense). _check_xyz called img.get_fdata().shape, forcing a full per-file volume read only to check XYZ consistency. Use img.shape[:3] metadata instead; load_epi still reads the actual data afterwards.

Pass format='7.3' explicitly to hdf5storage.savemat so the h5py/v7.3 loader path is always exercised instead of relying on the default, and stack the expected arrays in sorted-filename order to match Features.__get_labels.

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

KenyaOtsuka · 2026-06-17T07:41:44Z

+    if "Python.Empty" in attrs or "MATLAB_empty" in attrs:
+        shape = tuple(int(x) for x in attrs.get("Python.Shape", ()))
+        return np.empty(shape, dtype=dset.dtype)


Addressed in 5fee425, with a regression test (tests/dataform/test_mat_v73.py) covering MATLAB_empty without Python.Shape.

MATLAB-written v7.3 files mark empty arrays with MATLAB_empty but, unlike hdf5storage, do not store Python.Shape. Treating MATLAB_empty like Python.Empty made read_dataset fall back to Python.Shape=() and return np.empty(()), collapsing an empty non-scalar array such as (0, 3) to a 0-d array. Special-case only Python.Empty (falling back to the stored dataset shape if Python.Shape is missing); a bare MATLAB_empty now goes through the normal read/transpose path, preserving the empty array's shape. Add a regression test covering a MATLAB_empty dataset without Python.Shape.

HirokiYasuda03

@KenyaOtsuka
Thank you for the NumPy 2.x / NiBabel 5.x support. I’ve made just one comment, so please take a look at it. Other than that, I don’t think there are any issues.

The following is just a low-level question about future direction, separate from merging (not something I'm asking you to address in this PR).

After this PR, the dense save paths (save_array(sparse=False) / save_multiarrays / save_feature) still go through hdf5storage, so the hdf5storage dependency itself remains. I'm a little wary of that because hdf5storage is updated very infrequently (0.1.19 in 2023-01, then a ~2-year gap until the NumPy 2.0-capable 0.2.0 in 2025-05). I'd like to ask whether the eventual direction is to move dense over to h5py as well and fully migrate, or to keep relying on hdf5storage.

I understand that the reason the implementation of saving dense format by h5py is currently costly is because we need to maintain compatibility with MATLAB.
If MATLAB interoperability is a requirement, keeping dense on hdf5storage seems natural.
If everything stays within bdpy and MATLAB interop isn't needed In the subsequent pipeline, then writing the dense contents directly with h5py should be straightforward in principle, so eventually migrating fully and dropping the hdf5storage dependency might be worth considering.
Again, this is just a side question out of curiosity, not a blocker.

HirokiYasuda03 · 2026-06-24T04:00:27Z

The dense save paths (save_array(sparse=False) / save_multiarrays / save_feature) still call hdf5storage.savemat, and pyproject.toml lists hdf5storage with no version floor. The problem is that hdf5storage < 0.2.0 is not NumPy-2-compatible: np.unicode_ was removed in NumPy 2.0, and hdf5storage references it inside MarshallerCollection.init, which is constructed on every savemat/loadmat call (via Options()), not just on read (AI confirmed).

Suggestion: make the dependency contract explicit by adding a floor for the NumPy-2-capable Python range, e.g.:

Suggested change

"hdf5storage>=0.2.0; python_version >= '3.10'",

Addressed in 3053edb.

The dense save paths (save_array(sparse=False), save_multiarrays, save_feature) still call hdf5storage.savemat. hdf5storage < 0.2.0 is not NumPy-2 compatible: it references np.unicode_ (removed in NumPy 2.0) inside MarshallerCollection, which is constructed on every savemat/loadmat call via Options(), so the failure is not limited to the read path. Make the contract explicit, matching the existing NumPy split: keep an unconstrained hdf5storage for legacy Python (<3.10, which stays on NumPy 1.x) and require hdf5storage>=0.2.0 for Python >= 3.10, where NumPy 2.x is allowed.

HirokiYasuda03

Thank your for changes.
I think we can merge.

KenyaOtsuka · 2026-06-24T07:29:03Z

@HirokiYasuda03
Thank you for the review!

Regarding the future direction, my current thinking is that, in the long term, it would be preferable to remove the dependency on hdf5storage.

At the same time, I do not think we should simply drop compatibility with existing files or MATLAB interoperability, because existing bdpy data has been written through hdf5storage. There seem to be at least two possible migration paths. One option is to keep legacy read support for files written by hdf5storage, as in this PR, by maintaining our own reader implementation. Another option is to provide a conversion utility and eventually deprecate the legacy hdf5storage-based format after a transition period.

Personally, I slightly prefer the latter direction, because I would rather avoid maintaining multiple dense formats indefinitely. However, if the maintenance cost of legacy read support is not too high, the former may also be a reasonable choice.

KenyaOtsuka added 4 commits June 16, 2026 14:07

KenyaOtsuka requested a review from Copilot June 17, 2026 02:11

Copilot started reviewing on behalf of KenyaOtsuka June 17, 2026 02:12 View session

Copilot AI reviewed Jun 17, 2026

View reviewed changes

KenyaOtsuka added 3 commits June 17, 2026 04:10

test: make v7.3 feature loader test deterministic

887258b

Pass format='7.3' explicitly to hdf5storage.savemat so the h5py/v7.3 loader path is always exercised instead of relying on the default, and stack the expected arrays in sorted-filename order to match Features.__get_labels.

KenyaOtsuka requested a review from Copilot June 17, 2026 04:15

Copilot started reviewing on behalf of KenyaOtsuka June 17, 2026 04:15 View session

Copilot AI reviewed Jun 17, 2026

View reviewed changes

KenyaOtsuka requested a review from ganow June 18, 2026 00:40

KenyaOtsuka marked this pull request as ready for review June 18, 2026 00:40

KenyaOtsuka requested a review from HirokiYasuda03 June 23, 2026 05:49

HirokiYasuda03 reviewed Jun 24, 2026

View reviewed changes

HirokiYasuda03 approved these changes Jun 24, 2026

View reviewed changes

KenyaOtsuka merged commit 8b5e690 into KamitaniLab:dev Jun 24, 2026
3 of 6 checks passed

This was referenced Jun 24, 2026

DecodedFeatures.get() does not works for numpy>=2.0 #106

Closed

Loosen pinned dependency versions (numpy, nibabel) and support numpy 2.x #121

Closed

KenyaOtsuka deleted the chore/numpy-2x-nibabel-support branch June 24, 2026 08:12

KenyaOtsuka mentioned this pull request Jun 29, 2026

Drop the hdf5storage dependency (write-side breaking change) #141

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support NumPy 2.x and NiBabel 5.x#137

Support NumPy 2.x and NiBabel 5.x#137
KenyaOtsuka merged 9 commits into
KamitaniLab:devfrom
KenyaOtsuka:chore/numpy-2x-nibabel-support

KenyaOtsuka commented Jun 17, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

KenyaOtsuka Jun 17, 2026

Uh oh!

KenyaOtsuka Jun 17, 2026

Uh oh!

KenyaOtsuka Jun 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

KenyaOtsuka Jun 17, 2026

Uh oh!

HirokiYasuda03 left a comment •

edited

Loading

Uh oh!

HirokiYasuda03 Jun 24, 2026

Uh oh!

KenyaOtsuka Jun 24, 2026

Uh oh!

HirokiYasuda03 left a comment

Uh oh!

KenyaOtsuka commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

KenyaOtsuka commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

KenyaOtsuka Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

KenyaOtsuka Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

KenyaOtsuka Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

KenyaOtsuka Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

HirokiYasuda03 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HirokiYasuda03 Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

KenyaOtsuka Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

HirokiYasuda03 left a comment

Choose a reason for hiding this comment

Uh oh!

KenyaOtsuka commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

KenyaOtsuka commented Jun 17, 2026 •

edited

Loading

HirokiYasuda03 left a comment •

edited

Loading