Fix int32 overflow in CSR allocation upper bounds#97
Merged
Conversation
Several allocation sites cap a sparse CSR/CSC size with MIN(loose_nnz_bound, rows*cols), or seed a growable vector with rows*cols. The dense cell-count product overflows int for large dimensions (e.g. n_vars*n_vars with n_vars=250000 = 6.25e10 wraps to a negative int32). When the product wraps negative, MIN selects it and new_CSR_matrix does a calloc overflow -> SIGSEGV. (A different size could wrap to a small positive value -> silent under-allocation, which is worse.) In practice this crashes get_problem_data on e.g. a QP with >46341 (=sqrt INT_MAX) variables and the OptimalAdvertising benchmark (transpose/reductions of a 250000-variable matrix). Add a saturating sat_mul_int(a, b) helper (clamps to INT_MAX on overflow) and use MIN(loose_bound, sat_mul_int(rows, cols)) at the affected MIN sites; seed the two iVec growable vectors from the source nnz instead of J->n*m. The allocation is unchanged whenever the product fits, and falls back to the loose bound -- always a valid upper bound at these sites -- when the product would overflow. Sites: problem.c (Lagrange hessian), hstack.c (wsum hess), bivariate matmul (chain-rule jacobian), CSR_sum.c (sum_4_csr_alloc), sparse_matrix.c (index_alloc, sum_row_partition), stacked_pd.c (sum_row_partition), and the iVec seeds in linalg_sparse_matmuls.c and linalg_dense_sparse_matmuls.c. Adds tests/utils/test_alloc_overflow.h: a sat_mul_int unit test and a sparse_index_alloc regression that segfaults before this fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
Author
|
I encountered this while stress-testing the diffengine backend. Two quite large examples would crash at different locations related to CSR memory allocation. This is quite a nasty bug, but I'm glad we could find it. Let me know if the changes make sense. Maybe you would want to rename the function to something else that's more descriptive though. |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
|
Very nice bug catch! And simple and neat solution. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Several allocation sites cap a sparse CSR/CSC size with MIN(loose_nnz_bound, rowscols), or seed a growable vector with rowscols. The dense cell-count product overflows int for large dimensions (e.g. n_vars*n_vars with n_vars=250000 = 6.25e10 wraps to a negative int32). When the product wraps negative, MIN selects it and new_CSR_matrix does a calloc overflow -> SIGSEGV. (A different size could wrap to a small positive value -> silent under-allocation, which is worse.) In practice this crashes get_problem_data on e.g. a QP with >46341 (=sqrt INT_MAX) variables and the OptimalAdvertising benchmark (transpose/reductions of a 250000-variable matrix).
Add a saturating sat_mul_int(a, b) helper (clamps to INT_MAX on overflow) and use MIN(loose_bound, sat_mul_int(rows, cols)) at the affected MIN sites; seed the two iVec growable vectors from the source nnz instead of J->n*m. The allocation is unchanged whenever the product fits, and falls back to the loose bound -- always a valid upper bound at these sites -- when the product would overflow.
Sites: problem.c (Lagrange hessian), hstack.c (wsum hess), bivariate matmul (chain-rule jacobian), CSR_sum.c (sum_4_csr_alloc), sparse_matrix.c (index_alloc, sum_row_partition), stacked_pd.c (sum_row_partition), and the iVec seeds in linalg_sparse_matmuls.c and linalg_dense_sparse_matmuls.c.
Adds tests/utils/test_alloc_overflow.h: a sat_mul_int unit test and a sparse_index_alloc regression that segfaults before this fix.