Skip to content

Fix int32 overflow in CSR allocation upper bounds#97

Merged
dance858 merged 3 commits into
mainfrom
fix-int-overflow-csr-alloc
Jun 3, 2026
Merged

Fix int32 overflow in CSR allocation upper bounds#97
dance858 merged 3 commits into
mainfrom
fix-int-overflow-csr-alloc

Conversation

@Transurgeon
Copy link
Copy Markdown
Collaborator

Several allocation sites cap a sparse CSR/CSC size with MIN(loose_nnz_bound, rowscols), or seed a growable vector with rowscols. The dense cell-count product overflows int for large dimensions (e.g. n_vars*n_vars with n_vars=250000 = 6.25e10 wraps to a negative int32). When the product wraps negative, MIN selects it and new_CSR_matrix does a calloc overflow -> SIGSEGV. (A different size could wrap to a small positive value -> silent under-allocation, which is worse.) In practice this crashes get_problem_data on e.g. a QP with >46341 (=sqrt INT_MAX) variables and the OptimalAdvertising benchmark (transpose/reductions of a 250000-variable matrix).

Add a saturating sat_mul_int(a, b) helper (clamps to INT_MAX on overflow) and use MIN(loose_bound, sat_mul_int(rows, cols)) at the affected MIN sites; seed the two iVec growable vectors from the source nnz instead of J->n*m. The allocation is unchanged whenever the product fits, and falls back to the loose bound -- always a valid upper bound at these sites -- when the product would overflow.

Sites: problem.c (Lagrange hessian), hstack.c (wsum hess), bivariate matmul (chain-rule jacobian), CSR_sum.c (sum_4_csr_alloc), sparse_matrix.c (index_alloc, sum_row_partition), stacked_pd.c (sum_row_partition), and the iVec seeds in linalg_sparse_matmuls.c and linalg_dense_sparse_matmuls.c.

Adds tests/utils/test_alloc_overflow.h: a sat_mul_int unit test and a sparse_index_alloc regression that segfaults before this fix.

Several allocation sites cap a sparse CSR/CSC size with MIN(loose_nnz_bound,
rows*cols), or seed a growable vector with rows*cols. The dense cell-count product
overflows int for large dimensions (e.g. n_vars*n_vars with n_vars=250000 = 6.25e10
wraps to a negative int32). When the product wraps negative, MIN selects it and
new_CSR_matrix does a calloc overflow -> SIGSEGV. (A different size could wrap to a
small positive value -> silent under-allocation, which is worse.) In practice this
crashes get_problem_data on e.g. a QP with >46341 (=sqrt INT_MAX) variables and the
OptimalAdvertising benchmark (transpose/reductions of a 250000-variable matrix).

Add a saturating sat_mul_int(a, b) helper (clamps to INT_MAX on overflow) and use
MIN(loose_bound, sat_mul_int(rows, cols)) at the affected MIN sites; seed the two
iVec growable vectors from the source nnz instead of J->n*m. The allocation is
unchanged whenever the product fits, and falls back to the loose bound -- always a
valid upper bound at these sites -- when the product would overflow.

Sites: problem.c (Lagrange hessian), hstack.c (wsum hess), bivariate matmul
(chain-rule jacobian), CSR_sum.c (sum_4_csr_alloc), sparse_matrix.c (index_alloc,
sum_row_partition), stacked_pd.c (sum_row_partition), and the iVec seeds in
linalg_sparse_matmuls.c and linalg_dense_sparse_matmuls.c.

Adds tests/utils/test_alloc_overflow.h: a sat_mul_int unit test and a
sparse_index_alloc regression that segfaults before this fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Transurgeon
Copy link
Copy Markdown
Collaborator Author

I encountered this while stress-testing the diffengine backend. Two quite large examples would crash at different locations related to CSR memory allocation.
Instead of fixing this for the two encounters I decided to revisit the PR: https://github.com/SparseDifferentiation/SparseDiffEngine/pull/92/changes which added these tighter bounds and added a limit for all of them.
This limit essentially is a min between the product and the max integer value to prevent overflows.

This is quite a nasty bug, but I'm glad we could find it. Let me know if the changes make sense. Maybe you would want to rename the function to something else that's more descriptive though.

Transurgeon and others added 2 commits June 3, 2026 13:36
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dance858 dance858 merged commit 6a6dee3 into main Jun 3, 2026
11 checks passed
@dance858
Copy link
Copy Markdown
Collaborator

dance858 commented Jun 3, 2026

Very nice bug catch! And simple and neat solution.

@Transurgeon Transurgeon deleted the fix-int-overflow-csr-alloc branch June 6, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants