Drop the hdf5storage dependency (write-side breaking change) #141

Draft
KenyaOtsuka wants to merge 3 commits into
KamitaniLab:dev from
KenyaOtsuka:refactor/drop-hdf5storage-write

@KenyaOtsuka

Summary

Removes the hdf5storage dependency entirely. This follows up on #137, which
moved the read path off hdf5storage (onto an h5py reader) but deliberately
left the dense/sparse write paths on hdf5storage.savemat, keeping the
dependency and the NumPy < 2 pin on Python 3.8/3.9.

The guiding principle here:

  • Read compatibility is preserved. Existing files written by
    hdf5storage / MATLAB v7.3 (and by older bdpy) still load.
  • Write compatibility with MATLAB is intentionally dropped. New files are
    written as bdpy-native plain HDF5 and are not meant to be opened by
    MATLAB's load.

This is a deliberate write-side breaking change (see Compatibility below).

Why

hdf5storage broke under NumPy 2.0 (it referenced the removed np.unicode_;
see #106). Rather than reimplement a full MATLAB-v7.3-compatible writer just to
keep the on-disk format, we drop the dependency and write plain HDF5 at the
save sites. bdpy reads its own output back through the existing h5py reader.
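As a rough illustration of what "plain HDF5" means here, the following is a minimal sketch of a dense round-trip using only h5py. The helper names `save_dense` and `load_dense` are hypothetical and are not bdpy's API; the actual save sites in this PR may differ in naming and options.

```python
import os
import tempfile

import h5py
import numpy as np


def save_dense(path, key, value):
    """Write `value` as a plain HDF5 dataset named `key` (hypothetical helper)."""
    with h5py.File(path, "w") as f:
        f.create_dataset(key, data=np.asarray(value))


def load_dense(path, key):
    """Read the dataset back as a NumPy array (hypothetical helper)."""
    with h5py.File(path, "r") as f:
        return f[key][()]


# Round-trip a small array through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "dense.h5")
original = np.arange(6, dtype=np.float64).reshape(2, 3)
save_dense(path, "x", original)
restored = load_dense(path, "x")
```

Because nothing MATLAB-specific is written, the array comes back with the same shape and dtype it was saved with, and no transposition step is involved.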

Changes

  • bdpy/dataform/_mat_v73.py → read-only legacy reader

    • Remove savemat() / write_dataset() (and from __all__); the module no
      longer writes anything.
    • Rewrite the module docstring to state it provides h5py-based reading
      support for MATLAB v7.3 / hdf5storage .mat files.
    • Improve read_cell(): the plain-matrix branch now routes through
      read_dataset(), so MATLAB-style transposed matrices carrying
      MATLAB_class are de-transposed before being split into rows. Read
      behavior for bdpy's own (plain) index/shape datasets is unchanged.
  • Write plain HDF5 directly at the save sites

    • bdpy/dataform/sparse.py: save_array (dense), save_multiarrays, and
      SparseArray.save write datasets/groups directly with h5py
      (SparseArray.save inlines the former __save_h5py body; append-mode still
      preserves other top-level variables).
    • bdpy/dataform/features.py: save_feature writes the feat dataset
      directly with h5py.
  • Drop the dependency

    • Remove hdf5storage from pyproject.toml, mypy.ini, README.md, and the
      legacy tests/env/*/Pipfiles; clean up a stale comment in datastore.py.
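The read_cell() change can be pictured with a small sketch. It assumes that a `MATLAB_class` attribute is what marks a dataset as stored in MATLAB's transposed layout, and that plain datasets are returned untouched. `read_matrix` and `split_rows` are hypothetical stand-ins, not the actual read_dataset() / read_cell() implementations.

```python
import os
import tempfile

import h5py
import numpy as np


def read_matrix(dataset):
    """Return the array, transposing it back if it looks MATLAB-written.

    Assumption: a `MATLAB_class` attribute marks the transposed layout.
    """
    value = dataset[()]
    if "MATLAB_class" in dataset.attrs and value.ndim >= 2:
        return value.T
    return value


def split_rows(dataset):
    """Split a matrix into rows after any de-transposition."""
    return [row for row in read_matrix(dataset)]


logical = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
path = os.path.join(tempfile.mkdtemp(), "legacy.h5")
with h5py.File(path, "w") as f:
    # Simulate a MATLAB-style dataset: stored transposed and tagged.
    legacy = f.create_dataset("legacy", data=logical.T)
    legacy.attrs["MATLAB_class"] = np.bytes_("double")
    # A plain dataset, as newly written files would contain.
    f.create_dataset("plain", data=logical)

with h5py.File(path, "r") as f:
    legacy_rows = split_rows(f["legacy"])
    plain_rows = split_rows(f["plain"])
```

In this sketch both datasets yield the same two rows of length three, which is the behavior the bullet on read_cell() describes: tagged matrices are de-transposed first, untagged ones are read as stored.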

Compatibility

| | Behavior |
|---|---|
| Reading old hdf5storage / MATLAB v7.3 / bdpy files | ✅ Unchanged |
| Newly written dense arrays | Plain HDF5 dataset |
| Newly written sparse arrays | Plain HDF5 group/datasets |
| Newly written feature files | Plain HDF5 with a feat dataset |
| Opening newly written files in MATLAB | ❌ No longer supported (intentional) |

The only compatibility promise is reading existing files. If MATLAB-readable
output is required, pin an older bdpy release.

Tests

  • _mat_v73: removed the writer tests; added read_cell coverage for the
    transposed-matrix path.
  • sparse: dense/sparse round-trips, save_multiarrays, preserve-other-
    variables, and an assertion that dense saves carry no MATLAB_class /
    Python.Shape.
  • features: mock data now written as plain HDF5; added an explicit
    save_feature() test (round-trips via loadmat_key and Features).
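The preserve-other-variables and no-MATLAB-attributes checks could look roughly like the sketch below. It is written against plain h5py rather than bdpy's own functions, and `save_variable` is a hypothetical helper that only mimics the append-mode behavior described above. The real tests in this PR may be structured differently.

```python
import os
import tempfile

import h5py
import numpy as np


def save_variable(path, key, value):
    """Write or replace one top-level dataset, leaving the others alone."""
    with h5py.File(path, "a") as f:  # "a": read/write, create if missing
        if key in f:
            del f[key]
        f.create_dataset(key, data=np.asarray(value))


path = os.path.join(tempfile.mkdtemp(), "vars.h5")
save_variable(path, "a", np.ones((2, 2)))
save_variable(path, "b", np.arange(3))
save_variable(path, "a", np.zeros((2, 2)))  # overwrite only "a"

with h5py.File(path, "r") as f:
    keys = sorted(f.keys())
    a = f["a"][()]
    b = f["b"][()]
    attr_names = set(f["a"].attrs.keys())
```

After the second write to "a", the file should still contain "b" unchanged, and the dataset should carry neither a `MATLAB_class` nor a `Python.Shape` attribute, since nothing in this path sets them.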

_mat_v73 now only reads MATLAB v7.3 / hdf5storage / bdpy .mat files; the
plain-HDF5 savemat()/write_dataset() helpers are removed (writing moves to the
save sites). read_cell() now routes the plain-matrix branch through
read_dataset() so MATLAB-style transposed matrices with MATLAB_class are
de-transposed before being split into rows. Read compatibility is unchanged.

save_array (dense), save_multiarrays, SparseArray.save and save_feature now
write plain HDF5 directly with h5py instead of going through a savemat wrapper.
SparseArray.save inlines the former __save_h5py body. New files are bdpy-native
plain HDF5 and intentionally not MATLAB-load compatible; existing files still
load. Tests cover dense/sparse/multiarray round-trips, save_feature, and that
new datasets carry no MATLAB_class/Python.Shape.

Remove hdf5storage from the project dependencies, mypy config, README and the
legacy test Pipfiles now that nothing imports it, and update a stale
hdf5storage reference in a datastore comment.

@KenyaOtsuka

Author

This PR is intended to be merged several releases after #142 has been merged.
