Fix use-after-free when a custom Metal kernel is called with different dtypes in one graph#3662
Open
discobot wants to merge 1 commit into
Open
Fix use-after-free when a custom Metal kernel is called with different dtypes in one graph#3662discobot wants to merge 1 commit into
discobot wants to merge 1 commit into
Conversation
…t dtypes in one graph The source generated for mx.fast.metal_kernel embeds the input/output dtypes and how each input is passed, but the kernel name only encoded the template arguments. Calling the same kernel with a different input dtype in one lazy graph regenerated different source under the same name, and the custom kernel cache then cleared the old library while its pipeline state was still referenced by the uncommitted command buffer, causing a use-after-free (an abort under Metal API validation, sporadic NaNs otherwise). Append the input/output dtypes and the input address space and reference markers to the generated kernel name so a given name always maps to exactly one source. Adds a regression test composing float16 and float32 calls of one kernel in a single graph. Fixes ml-explore#3347.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposed changes
Fixes #3347.
Calling the same
mx.fast.metal_kernelwith different input dtypes inside one lazy graph regenerates different source under the same kernel name, and the cache eviction (clear_library) releases a pipeline state that an uncommitted command buffer still references (mechanism detailed in #3347). This implements the approach @zcbenz suggested on #3434: put the input dtypes in the generated kernel name, so a given name always maps to exactly one source and dtype variants coexist in the cache instead of evicting each other.Dtypes turn out not to be the only per-call fact
write_signaturebakes into the source: the address space choice (constantfor inputs withsize() < 8) and pass-by-reference for 0-dim inputs vary per call too. The generated name now encodes all of it — each input's dtype with ac/ssuffix for those two cases, plus the output dtypes — andmax_constant_array_sizeis hoisted to file scope so the naming andwrite_signaturecan't drift. Theclear_librarypath is kept for genuinely different user sources reusing one name (covered by the existingtest_custom_kernel_caching).Added a regression test (
test_custom_kernel_mixed_dtypes) that composes float16 and float32 invocations of one kernel in a single graph; withMETAL_DEVICE_WRAPPER_TYPE=1as CI sets, the unfixed code aborts on it with thecommand buffer references deallocated objectassertion. Also updated the generated-name example in the custom Metal kernels docs (names now look likecustom_kernel_myexp_float_float16_float16).With the fix,
test_fast.py(24 tests) andtest_export_import.py(16 tests) pass under Metal API validation.Checklist
Put an
xin the boxes that apply.pre-commit run --all-filesto format my code / installed pre-commit prior to committing changes