Support NumPy 2.x and NiBabel 5.x#137
Conversation
Drop the numpy<1.24 cap and the exact nibabel==3.2 pin that blocked contemporary NumPy and NiBabel (issue KamitaniLab#121). nipy>=0.6.1 is required for NumPy 2.0 support. - mri: numpy<1.24 removed (inherits core numpy>=1.20); nibabel==3.2 -> nibabel>=3.2,<6 (6.0 drops NumPy 1.x); nipy>=0.6.0 -> nipy>=0.6.1
NiBabel 5.0 removed Image.get_data(); read nipy images via get_fdata() instead in load_mri, load_epi and fmriprep. Also fix Python 2 leftovers (xrange -> range, reader.next() -> next(reader)) that would fail on Python 3. Add a load_mri round-trip test guarding the get_fdata() path.
…iLab#106) Under NumPy 2.0 hdf5storage.loadmat fails (it referenced the removed np.unicode_), breaking DecodedFeatures.get() and other loaders. Add bdpy/dataform/_mat_v73.py with an h5py-based reader for the dense and sparse-struct layouts bdpy uses, plus a loadmat_key() helper that keeps v5 .mat support via scipy and falls back to h5py for v7.3. Route the load paths in features, datastore and sparse through it. Saving still uses hdf5storage. SparseArray.save now retries with a from-scratch rewrite only if the in-place hdf5storage.savemat raises under NumPy 2.0 (overwriting a struct); the normal merging behavior is preserved whenever the write succeeds. Update the features test to write v7.3 mock data so the h5py path is exercised.
Under NumPy 2.x, hdf5storage.savemat can fail when SparseArray.save overwrites an existing sparse struct, and the previous fallback deleted the whole file before rewriting, dropping any unrelated variables. Make SparseArray.save use a NumPy-2-only h5py writer that opens the target in append mode and replaces only the requested key/group, leaving other variables intact. NumPy < 2.0 keeps the existing hdf5storage.savemat path. The dense save paths are unchanged and still rely on hdf5storage for the MATLAB-v7.3 metadata/compatibility behavior. Add a NumPy-2-only test asserting that SparseArray.save preserves other top-level variables already stored in the target file.
There was a problem hiding this comment.
Pull request overview
This PR updates BdPy’s MRI and .mat read/write paths to work with modern NumPy (2.x) and NiBabel (5.x), primarily by removing reliance on hdf5storage.loadmat for v7.3 reads and replacing deprecated MRI image APIs.
Changes:
- Loosen
bdpy[mri]dependency pins to allow newer NiBabel/NiPy stacks. - Replace deprecated MRI image reads (
get_data()/ Python2-era constructs) with modern equivalents. - Add an h5py-based MATLAB v7.3 (
.mat) reader and route feature/datastore/sparse loads through it; add NumPy-2-only sparse save behavior and regression tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
pyproject.toml |
Relaxes mri extra dependency constraints (NiBabel/NiPy). |
bdpy/mri/load_mri.py |
Uses get_fdata() for compatibility with newer imaging stacks. |
bdpy/mri/load_epi.py |
Switches to get_fdata() for EPI loading. |
bdpy/mri/glm.py |
Replaces Python 2-style reader.next() with next(reader). |
bdpy/mri/fmriprep.py |
Updates deprecated image API usage and removes xrange. |
bdpy/dataform/_mat_v73.py |
Introduces an h5py-based reader for MATLAB v7.3 .mat files. |
bdpy/dataform/features.py |
Routes .mat variable loads through the new v5/v7.3-aware loader. |
bdpy/dataform/datastore.py |
Replaces hdf5storage.loadmat reads with the new v5/v7.3-aware loader. |
bdpy/dataform/sparse.py |
Uses h5py-based reads; adds NumPy-2-only sparse-struct writer logic. |
tests/test_mri.py |
Adds regression test for 3D MRI loading via get_fdata(). |
tests/dataform/test_sparse.py |
Adds NumPy-2-only test ensuring sparse save preserves other top-level vars. |
tests/dataform/test_features.py |
Updates test fixture generation to align with the v7.3 loading path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| elif type(f[key][()]) == np.ndarray: | ||
| # Dense array | ||
| return hdf5storage.loadmat(fname)[key] | ||
| # Dense array (read with h5py; hdf5storage breaks under NumPy 2.0) | ||
| return _mat_v73.read_dataset(f[key]) | ||
| else: | ||
| raise RuntimeError('Unsupported data type: %s' % type(f[key][()])) |
| def _check_xyz(xyz, img): | ||
| """Check voxel xyz consistency.""" | ||
| xyz_current = _get_xyz(img.coordmap.affine, img.get_data().shape) | ||
| xyz_current = _get_xyz(img.coordmap.affine, img.get_fdata().shape) | ||
|
|
| for image_name in mock_image_names: | ||
| data = np.random.rand(*shape) | ||
| hdf5storage.savemat( | ||
| os.path.join(tmpdir, layer_name, image_name + '.mat'), | ||
| {'feat': data}, |
Dense save paths still use hdf5storage.savemat, which can fail under NumPy 2.x (it references the removed np.unicode_) on the legacy Python 3.8/3.9 stacks. Add environment markers so NumPy 2.x is only resolved on Python >= 3.10, while 3.8/3.9 stay on a NumPy 1.x-compatible stack.
load_array inspected the HDF5 object by reading the whole dataset (f[key][()]) just to branch on dense vs sparse, which also misclassified scalar datasets. Branch on the HDF5 object type instead (group = SparseArray, dataset = dense). _check_xyz called img.get_fdata().shape, forcing a full per-file volume read only to check XYZ consistency. Use img.shape[:3] metadata instead; load_epi still reads the actual data afterwards.
Pass format='7.3' explicitly to hdf5storage.savemat so the h5py/v7.3 loader path is always exercised instead of relying on the default, and stack the expected arrays in sorted-filename order to match Features.__get_labels.
| if "Python.Empty" in attrs or "MATLAB_empty" in attrs: | ||
| shape = tuple(int(x) for x in attrs.get("Python.Shape", ())) | ||
| return np.empty(shape, dtype=dset.dtype) |
There was a problem hiding this comment.
Addressed in 5fee425, with a regression test (tests/dataform/test_mat_v73.py) covering MATLAB_empty without Python.Shape.
MATLAB-written v7.3 files mark empty arrays with MATLAB_empty but, unlike hdf5storage, do not store Python.Shape. Treating MATLAB_empty like Python.Empty made read_dataset fall back to Python.Shape=() and return np.empty(()), collapsing an empty non-scalar array such as (0, 3) to a 0-d array. Special-case only Python.Empty (falling back to the stored dataset shape if Python.Shape is missing); a bare MATLAB_empty now goes through the normal read/transpose path, preserving the empty array's shape. Add a regression test covering a MATLAB_empty dataset without Python.Shape.
There was a problem hiding this comment.
@KenyaOtsuka
Thank you for the NumPy 2.x / NiBabel 5.x support. I’ve made just one comment, so please take a look at it. Other than that, I don’t think there are any issues.
The following is just a low-level question about future direction, separate from merging (not something I'm asking you to address in this PR).
After this PR, the dense save paths (save_array(sparse=False) / save_multiarrays / save_feature) still go through hdf5storage, so the hdf5storage dependency itself remains. I'm a little wary of that because hdf5storage is updated very infrequently (0.1.19 in 2023-01, then a ~2-year gap until the NumPy 2.0-capable 0.2.0 in 2025-05). I'd like to ask whether the eventual direction is to move dense over to h5py as well and fully migrate, or to keep relying on hdf5storage.
I understand that the reason the implementation of saving dense format by h5py is currently costly is because we need to maintain compatibility with MATLAB.
If MATLAB interoperability is a requirement, keeping dense on hdf5storage seems natural.
If everything stays within bdpy and MATLAB interop isn't needed In the subsequent pipeline, then writing the dense contents directly with h5py should be straightforward in principle, so eventually migrating fully and dropping the hdf5storage dependency might be worth considering.
Again, this is just a side question out of curiosity, not a blocker.
There was a problem hiding this comment.
The dense save paths (save_array(sparse=False) / save_multiarrays / save_feature) still call hdf5storage.savemat, and pyproject.toml lists hdf5storage with no version floor. The problem is that hdf5storage < 0.2.0 is not NumPy-2-compatible: np.unicode_ was removed in NumPy 2.0, and hdf5storage references it inside MarshallerCollection.init, which is constructed on every savemat/loadmat call (via Options()), not just on read (AI confirmed).
Suggestion: make the dependency contract explicit by adding a floor for the NumPy-2-capable Python range, e.g.:
| "hdf5storage>=0.2.0; python_version >= '3.10'", |
The dense save paths (save_array(sparse=False), save_multiarrays, save_feature) still call hdf5storage.savemat. hdf5storage < 0.2.0 is not NumPy-2 compatible: it references np.unicode_ (removed in NumPy 2.0) inside MarshallerCollection, which is constructed on every savemat/loadmat call via Options(), so the failure is not limited to the read path. Make the contract explicit, matching the existing NumPy split: keep an unconstrained hdf5storage for legacy Python (<3.10, which stays on NumPy 1.x) and require hdf5storage>=0.2.0 for Python >= 3.10, where NumPy 2.x is allowed.
HirokiYasuda03
left a comment
There was a problem hiding this comment.
Thank your for changes.
I think we can merge.
|
@HirokiYasuda03 Regarding the future direction, my current thinking is that, in the long term, it would be preferable to remove the dependency on hdf5storage. At the same time, I do not think we should simply drop compatibility with existing files or MATLAB interoperability, because existing bdpy data has been written through hdf5storage. There seem to be at least two possible migration paths. One option is to keep legacy read support for files written by hdf5storage, as in this PR, by maintaining our own reader implementation. Another option is to provide a conversion utility and eventually deprecate the legacy hdf5storage-based format after a transition period. Personally, I slightly prefer the latter direction, because I would rather avoid maintaining multiple dense formats indefinitely. However, if the maintenance cost of legacy read support is not too high, the former may also be a reasonable choice. |
Summary
Fixes #106. Closes #121.
This PR updates bdpy to work with newer NumPy / NiBabel stacks by relaxing the overly strict "bdpy[mri]" dependency pins, replacing removed MRI APIs, and removing "hdf5storage.loadmat" from the relevant load paths.
Changes
Relax dependency constraints:
Update MRI-related code for newer dependency versions:
Replace v7.3 ".mat" load paths with an h5py-based reader:
Add a NumPy-2-only h5py writer for "SparseArray.save":
Address review feedback:
Add regression coverage for:
Scope note
This PR does not replace all uses of "hdf5storage" with h5py.
The h5py replacement is limited to the v7.3 ".mat" load paths discussed in #106 and to the "SparseArray.save" path where an actual NumPy 2.x save-side failure was observed. Rewriting all save paths to h5py would require reimplementing more of the MATLAB-v7.3 metadata/compatibility behavior currently handled by "hdf5storage", so I kept that out of scope. Dense save paths therefore still rely on "hdf5storage.savemat".
Because dense save paths still rely on "hdf5storage.savemat", NumPy 2.x support is limited to Python >= 3.10. Python 3.8/3.9 keep using NumPy < 2.
I left the other currently unconstrained core dependency lower bounds ("scipy", "scikit-learn", "pandas", etc.) unchanged because they do not appear to conflict with the current Python support floor.
Testing
"pytest tests/dataform/test_mat_v73.py tests/dataform/test_sparse.py tests/dataform/test_features.py"
Python 3.11.15 / NumPy 1.23.5 / NiBabel 3.2.0 / nipy 0.6.1 / scipy 1.15.3 / h5py 3.16.0
Python 3.11.15 / NumPy 2.4.6 / NiBabel 5.4.2 / nipy 0.6.1 / scipy 1.17.1 / h5py 3.16.0
"pytest tests/dataform tests/test_mri.py"
Python 3.11.15 / NumPy 1.23.5 / NiBabel 3.2.0 / nipy 0.6.1 / scipy 1.15.3 / h5py 3.16.0
Python 3.11.15 / NumPy 2.4.6 / NiBabel 5.4.2 / nipy 0.6.1 / scipy 1.17.1 / h5py 3.16.0
Note: the remaining Python 3.11 FastL2LiR CI failure appears to be caused by a PyFastL2LiR NumPy 2.4 compatibility issue, not by this PR. It matches the upstream fix in KamitaniLab/PyFastL2LiR@2eef75d, which has not been released yet.