Fix gather_qmm NAX kernel name mismatch #3632

Open. scyyh11 wants to merge 1 commit into ml-explore:main from scyyh11:fix-gather-qmm-nax-kernel-name

scyyh11 commented Jun 5, 2026

Proposed changes

gather_qmm_nax builds the NAX kernel name with bk = 32, but the gather_qmm NAX kernels are only instantiated with _bk64_ names. On NAX devices, the matrix-size gather_qmm path (taken for half-precision inputs, or for float32 with tf32 enabled) therefore fails during kernel lookup with an "Unable to load kernel ..._bk32..." error.

This changes the gather_qmm_nax kernel-name construction to use bk = 64, matching the existing instantiations. In this path bk is only part of the kernel name; the dispatch geometry is unchanged.
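
To illustrate why a name-only mismatch is enough to break the path, here is a minimal sketch of lookup by kernel name. It is a toy model, not MLX code: the name format, the `kernel_name` and `load_kernel` helpers, and the set of instantiated variants are all hypothetical. Only the bk = 32 vs. bk = 64 mismatch and the "Unable to load kernel" message come from this PR.

```python
# Toy model of kernel lookup by name. The name format and helpers are
# hypothetical; they are NOT the real MLX naming scheme or API.


def kernel_name(dtype: str, group_size: int, bits: int, bk: int) -> str:
    """Build a kernel name; here bk only contributes a suffix to the name."""
    return f"gather_qmm_nax_{dtype}_gs_{group_size}_b_{bits}_bk{bk}"


# Assumption for this sketch: only bk = 64 variants are instantiated.
INSTANTIATED = {
    kernel_name(dtype, group_size, bits, 64)
    for dtype in ("float16", "bfloat16")
    for group_size in (32, 64)
    for bits in (4, 8)
}


def load_kernel(name: str) -> str:
    """Return the kernel name if it was instantiated, otherwise raise."""
    if name not in INSTANTIATED:
        raise RuntimeError(f"Unable to load kernel {name}")
    return name


def can_load(bk: int) -> bool:
    """Check whether the name built with the given bk resolves to a kernel."""
    try:
        load_kernel(kernel_name("float16", 64, 4, bk))
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    print("bk=32 loads:", can_load(32))  # name was never instantiated
    print("bk=64 loads:", can_load(64))  # matches an instantiated name
```

In this model, switching the host-side value from 32 to 64 changes nothing except which name is requested, which mirrors the claim above that the dispatch geometry is unaffected.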

The PR also adds a regression test for matrix-size half-precision gather_qmm covering both affine and mxfp4, checked against a float32 reference.

Tested with:

DEVICE=gpu .venv/bin/pytest -q python/tests/test_quantized.py::TestQuantized::test_gather_qmm_matrix_path

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

gather_qmm_nax builds kernel names with bk = 32, but the gather qmm NAX
kernels are only instantiated with bk = 64 (bk only enters the name), so
the lookup always failed with "Unable to load kernel ..._bk32...".

Use bk = 64 to match the instantiations and add a regression test that
runs matrix-size gather_qmm in half precision against a float32
reference.

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>

zcbenz left a comment

Collaborator

Nice fix, thanks!
