ML-DSA SHAKE x4 verify #6
Open
mdcornu wants to merge 5 commits into
Changes:
- Adds new SHAKE x4 API to perform 4 SHAKE operations in parallel when AVX512VL is supported.
- Adds AVX512VL Keccak x4 assembly module (keccak1600x4-avx512vl).
- Adds internal SHA3 x4 APIs/context in sha3.h and wrappers in sha3_x4.c modules.
- Adds runtime dispatch for ML-DSA sample operations with an OSSL_ML_DSA_SAMPLE_OPS vtable. Callers obtain the correct implementation via ossl_ml_dsa_sample_ops(), which returns either the generic scalar ops functions, or the AVX512VL multi-buffer ops depending on the build and CPU capabilities.
- Adds x86-64 multi-buffer function implementation into ml_dsa_sample_hw_x86_64.inc, included in ml_dsa_sample.c when KECCAK1600_ASM and x86_64 are defined.

Co-authored-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
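The vtable dispatch described in this commit message can be pictured roughly as follows. This is only a minimal sketch of the general pattern, not the PR's code: `sketch_sample_ops`, `sketch_select_ops()`, the two fill routines and the `have_avx512vl` flag are hypothetical stand-ins, and the real OSSL_ML_DSA_SAMPLE_OPS layout and CPU capability check are not shown in this conversation.

```c
#include <stddef.h>

/* Hypothetical vtable; the real OSSL_ML_DSA_SAMPLE_OPS fields are not shown here. */
typedef struct {
    const char *name;
    /* Stand-in for a sampling routine: writes n bytes into out. */
    void (*fill)(unsigned char *out, size_t n);
} sketch_sample_ops;

/* Generic path: one element per iteration. */
static void sketch_fill_generic(unsigned char *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (unsigned char)i;
}

/* "x4" path: four elements per iteration, then a scalar tail. */
static void sketch_fill_x4(unsigned char *out, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        out[i]     = (unsigned char)i;
        out[i + 1] = (unsigned char)(i + 1);
        out[i + 2] = (unsigned char)(i + 2);
        out[i + 3] = (unsigned char)(i + 3);
    }
    for (; i < n; i++)
        out[i] = (unsigned char)i;
}

static const sketch_sample_ops sketch_generic_ops = { "generic", sketch_fill_generic };
static const sketch_sample_ops sketch_x4_ops = { "x4", sketch_fill_x4 };

/*
 * Assumption: the caller passes in the result of a CPU capability check.
 * Callers only ever see the returned table, never the concrete functions.
 */
static const sketch_sample_ops *sketch_select_ops(int have_avx512vl)
{
    return have_avx512vl ? &sketch_x4_ops : &sketch_generic_ops;
}
```

A caller would fetch the table once with `sketch_select_ops(flag)` and then call `ops->fill(buf, len)`. Both tables must produce identical output, which is what makes a scalar-versus-x4 comparison test possible.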
Add a new `sha3_x4_internal_test` target and recipe to validate the internal SHAKE x4 implementation against scalar SHA3 reference paths.

Cover SHAKE-128 and SHAKE-256 in one-shot and incremental modes, plus multi-absorb and multi-squeeze cases across varied input and output sizes. Tests are skipped when AVX512VL extensions are not available.

Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
093dca0 to 9791bc1
Add a new CI workflow that runs AVX512VL-specific tests under Intel SDE v10.8, since GitHub Actions runners do not currently have AVX512 hardware. SDE emulates AVX512 instructions and spoofs CPUID so the AVX512 code paths can be exercised.

Two jobs are included: linux (ubuntu-latest) and windows (windows-2022). Each job builds OpenSSL with no-shared and enable-fips, then runs the following tests under `sde64 -skx` (Skylake-X, AVX512F+BW+DQ+VL):
- ml_dsa_internal_test: exercises AVX512VL ML-DSA sampling
- sha3_x4_internal_test: exercises AVX512VL SHAKE x4 functions
- openssl fipsinstall: runs the full FIPS KAT suite (including ML-DSA and SHA3 self-tests) against the FIPS provider under emulation

Signed-off-by: Marcel Cornu <marcel.d.cornu@intel.com>
mdcornu pushed a commit that referenced this pull request on Jun 10, 2026
TSAN seems to be having a problem with atomic_load_ptr and
atomic_store_ptr. Both are, by default, __ATOMIC_RELAXED operations.
According to the tsan docs, it flags these operations as a race because,
while they are indivisible, they create no happens-before constraint,
meaning they can be reordered.
An exemplar race that is reported is:
WARNING: ThreadSanitizer: data race (pid=2139404)
Read of size 4 at 0x723400002308 by thread T39:
#0 EVP_MD_up_ref crypto/evp/digest.c:995 (threadstest+0x45032d) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#1 evp_md_up_ref crypto/evp/digest.c:974 (threadstest+0x450242) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#2 ossl_method_up_ref crypto/property/property.c:201 (threadstest+0x4b7a55) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#3 ossl_method_store_cache_get_locked crypto/property/property.c:941 (threadstest+0x4b9922) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#4 ossl_method_store_cache_get crypto/property/property.c:974 (threadstest+0x4b9a47) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#5 inner_evp_generic_fetch crypto/evp/evp_fetch.c:314 (threadstest+0x458186) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#6 evp_generic_fetch crypto/evp/evp_fetch.c:404 (threadstest+0x4586dc) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#7 EVP_MD_fetch crypto/evp/digest.c:985 (threadstest+0x4502d7) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#8 derive_kdk crypto/rsa/rsa_ossl.c:472 (threadstest+0x4cf738) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#9 rsa_ossl_private_decrypt crypto/rsa/rsa_ossl.c:646 (threadstest+0x4d0174) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#10 RSA_private_decrypt crypto/rsa/rsa_crpt.c:48 (threadstest+0x4c6971) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#11 rsa_decrypt providers/implementations/asymciphers/rsa_enc.c:321 (threadstest+0x51cab7) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#12 EVP_PKEY_decrypt crypto/evp/asymcipher.c:280 (threadstest+0x44a9ca) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#13 thread_shared_evp_pkey test/threadstest.c:966 (threadstest+0x404be7) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#14 thread_run test/threadstest.h:67 (threadstest+0x40132d) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
Previous write of size 8 at 0x723400002308 by main thread (mutexes: write M0):
#0 memset <null> (libtsan.so.2+0x4c1eb) (BuildId: 40906101a3a1e1f1ececafafda314aee009d688a)
#1 CRYPTO_zalloc crypto/mem.c:228 (threadstest+0x48679d) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#2 evp_md_new crypto/evp/digest.c:758 (threadstest+0x44f35e) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#3 evp_md_from_algorithm crypto/evp/digest.c:839 (threadstest+0x44f885) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#4 construct_evp_method crypto/evp/evp_fetch.c:230 (threadstest+0x457ec9) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#5 ossl_method_construct_this crypto/core_fetch.c:110 (threadstest+0x4801bf) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#6 algorithm_do_map crypto/core_algorithm.c:77 (threadstest+0x47f7a3) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#7 algorithm_do_this crypto/core_algorithm.c:122 (threadstest+0x47f987) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#8 ossl_provider_doall_activated crypto/provider_core.c:1609 (threadstest+0x49a42a) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#9 ossl_algorithm_do_all crypto/core_algorithm.c:164 (threadstest+0x47fb14) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#10 ossl_method_construct crypto/core_fetch.c:157 (threadstest+0x4803d0) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#11 inner_evp_generic_fetch crypto/evp/evp_fetch.c:333 (threadstest+0x4583a2) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#12 evp_generic_fetch crypto/evp/evp_fetch.c:404 (threadstest+0x4586dc) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#13 EVP_MD_fetch crypto/evp/digest.c:985 (threadstest+0x4502d7) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#14 derive_kdk crypto/rsa/rsa_ossl.c:472 (threadstest+0x4cf738) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#15 rsa_ossl_private_decrypt crypto/rsa/rsa_ossl.c:646 (threadstest+0x4d0174) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#16 RSA_private_decrypt crypto/rsa/rsa_crpt.c:48 (threadstest+0x4c6971) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#17 rsa_decrypt providers/implementations/asymciphers/rsa_enc.c:321 (threadstest+0x51cab7) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
#18 EVP_PKEY_decrypt crypto/evp/asymcipher.c:280 (threadstest+0x44a9ca) (BuildId: f34377d95e3c1d13ab9aa3204d2f1f7840d1c84a)
What tsan is saying here is that the memset in evp_md_new may get
re-ordered such that the contents of the EVP_MD may still be getting
zeroed at the time we have (a) found the EVP_MD in the method store
cache, and (b) attempted to do an up_ref on it.
This is plainly impossible, especially given that, in order to reach the
method store cache, it must be placed in the method store algorithm
sparse array, which still requires taking the method store write
lock. But for some reason tsan fails to see the memory fence that
the lock creates.
It seems the simplest way to correct this is, if we are running
under tsan, to use __ATOMIC_ACQUIRE and __ATOMIC_RELEASE in
CRYPTO_atomic_[load|store]_ptr so that tsan sees the proper memory
ordering.
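A minimal sketch of that idea, assuming a GCC or Clang toolchain with the `__atomic` builtins. The `sketch_` names and the detection macro are hypothetical and are not the actual CRYPTO_atomic_load_ptr / CRYPTO_atomic_store_ptr implementation; only the choice of memory orders follows the description above.

```c
#include <stddef.h>

/* Detect ThreadSanitizer: Clang exposes __has_feature, GCC defines a macro. */
#if defined(__has_feature)
# if __has_feature(thread_sanitizer)
#  define SKETCH_TSAN 1
# endif
#endif
#if defined(__SANITIZE_THREAD__) && !defined(SKETCH_TSAN)
# define SKETCH_TSAN 1
#endif

/* Under tsan, publish with release and read with acquire; else stay relaxed. */
#ifdef SKETCH_TSAN
# define SKETCH_LOAD_ORDER  __ATOMIC_ACQUIRE
# define SKETCH_STORE_ORDER __ATOMIC_RELEASE
#else
# define SKETCH_LOAD_ORDER  __ATOMIC_RELAXED
# define SKETCH_STORE_ORDER __ATOMIC_RELAXED
#endif

/* Hypothetical stand-in for an atomic pointer load. */
static void *sketch_atomic_load_ptr(void **p)
{
    return __atomic_load_n(p, SKETCH_LOAD_ORDER);
}

/* Hypothetical stand-in for an atomic pointer store. */
static void sketch_atomic_store_ptr(void **p, void *v)
{
    __atomic_store_n(p, v, SKETCH_STORE_ORDER);
}
```

With the release/acquire pair, a reader that observes the stored pointer also observes every write made before the store (such as the memset in evp_md_new), which gives tsan the happens-before edge it was missing.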
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Bob Beck <beck@openssl.org>
Reviewed-by: Nikola Pajkovsky <nikolap@openssl.org>
MergeDate: Tue Jun 9 18:17:19 2026
(Merged from openssl#31018)