Confine SIMD code to runtime-dispatched tiers (fixes #628)#630
Confine SIMD code to runtime-dispatched tiers (fixes #628)#630andrewkern wants to merge 6 commits into
Conversation
SLiM was built with -mavx2 -mfma applied to the whole project, which let the compiler emit AVX2/FMA instructions throughout the entire binary, not only in the explicit SIMD kernels. The resulting build crashed with SIGILL on x86_64 CPUs without AVX2 (pre-Haswell, ~2012 and earlier). The kernels formerly in eidos_simd.h are moved to a tier-parameterized body, eidos_simd_impl.h, compiled once per instruction-set tier: scalar, SSE4.2, and AVX2+FMA on x86_64, and NEON on ARM64. Only the per-tier translation units receive instruction-set flags; every other translation unit, including the dispatcher, is compiled at the baseline x86_64 ABI. At startup Eidos_SIMD_Init() probes the CPU with __builtin_cpu_supports() and points the Eidos_SIMD function pointers at the fastest supported tier. A single binary is therefore correct on any x86_64 CPU: the baseline of the executable contains no AVX2/SSE4.2 instructions, and tier code runs only after the CPU has been confirmed to support it. USE_SIMD is now a simple ON/OFF switch; OFF (and MSVC) builds the scalar tier only. The qmake build sets EIDOS_SUPPRESS_SIMD_DISPATCH, keeping its prior scalar-only behavior since it applies no per-file SIMD flags.
These files were created from copies of eidos_simd.h and carried its original creation date; set it to their actual creation date.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #630 +/- ##
==========================================
- Coverage 75.67% 75.63% -0.04%
==========================================
Files 114 118 +4
Lines 72808 73048 +240
Branches 12873 12916 +43
==========================================
+ Hits 55095 55252 +157
- Misses 17713 17796 +83
🚀 New features to boost your workflow:
|
|
grrr... some of the windows tests are failing... working on it |
The SIMD kernels are compiled once per instruction-set tier, but only the tier the CPU selects at startup ever runs, so on a modern CI machine the scalar and SSE4.2 kernels were never executed or covered. Add Eidos_SIMD_SelectTier(), which forces a specific tier, and rewrite Eidos_SIMD_Init() in terms of it. The SIMD math tests now cycle through every tier the CPU supports -- running the full battery against scalar, SSE4.2, and AVX2+FMA -- then restore the best tier. The battery also gains direct tests for the kernels that previously had no per-tier coverage: sqrt, abs, the rounding family, the reductions, the convolution helpers, and the single-precision spatial-interaction kernels. This exercises the scalar and SSE4.2 code paths that nothing tested before.
…lags On Windows the WIN32 target blocks run set_source_files_properties() over every source file to add "-include config.h". COMPILE_FLAGS is a single string property, so that overwrote the "-mavx2 -mfma" / "-msse4.2" set earlier on the tier files, and eidos_simd_avx2.cpp then failed to compile its AVX2 intrinsics. Apply the per-tier ISA flags at the end of the file instead, after the WIN32 blocks, using set_property(... APPEND_STRING ...) so they extend rather than replace COMPILE_FLAGS.
|
okay @bhaller -- this is ready for review |
bhaller
left a comment
There was a problem hiding this comment.
Hi @andrewkern! Sorry for the slow review, things got busy. :-O A couple concerns here, some trivial, some bigger. See what you think.
| // Eidos | ||
| // | ||
| // Created by Andrew Kern on 5/21/2026. | ||
| // Copyright (c) 2024-2025 Benjamin C. Haller. All rights reserved. |
There was a problem hiding this comment.
and likewise in other files; I notice this is a repeating problem, due to copy-paste...
There was a problem hiding this comment.
ack sorry about this. will fix that.
There was a problem hiding this comment.
wait just to be clear -- do you want the copyright date switched or my date just to be Created by Andrew Kern in 2026?
There was a problem hiding this comment.
The "Created by" was fine as it was. The copyright was the problem – it can't say that it was copyright in a year before it even existed. :-> So the copyrights on these SIMD files should say "Copyright (c) 2026 Benjamin C. Haller. All rights reserved." (In giving this code to SLiM, you implicitly give the copyright to me, I hope you don't mind that. Somebody needs to hold the copyright, and since I need control over the repo, that ought to be me, since there's no foundation/trust/whatever. :->)
| // since the AVX2 tier and SLEEF both use FMA instructions. | ||
| if (std::strcmp(tier_name, "AVX2+FMA") == 0) | ||
| { | ||
| if (!(__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))) |
There was a problem hiding this comment.
I'm worried about compilers that don't support __builtin_cpu_supports. I guess this is on GCC 4.8+ and Clang 9+. Clang 9 was only released in 2019, so there might be older versions floating around pretty commonly. GCC 4.8+ is fine, though; that's the first GCC version that supported C++11, so it is required anyway. (Clang supported C++11 in version 3.3, so suddenly requiring Clang 9 is a BIG jump forward in requirements.) Older GCC/Clang, and other compilers like MSVC and Intel, will not have __builtin_cpu_supports, though. So I'm worried that we're substituting a common problem (compilers not supporting __builtin_cpu_supports) in place of a rare problem (hardware so old that this solution is needed in the first place).
I see that Google has a more general solution to this sort of problem: https://github.com/google/cpu_features. It is not simple, and would complexify SLiM's build significantly; it is not a single-header solution, unfortunately, it would require setting up a CMake subdirectory and stuff. Doubtless that would be a PITA to get working. But maybe it is what we have to do? Or we could detect compilers that don't have __builtin_cpu_supports, based on not-GCC, not-Clang, or old-GCC-or-Clang, and fall back to scalar while emitting a warning, or some such. Ugh. Thoughts?
There was a problem hiding this comment.
Hmmm. A few thoughts.
- I think the google thing is overkill
- MSVC and Intel cc are already handled. EIDOS_SIMD_DISPATCH_X86 is gated on
(defined(__GNUC__) || defined(__clang__)), so on MSVC/Intel the entire dispatcher is#if'd out and the scalar path is the only one - Maybe old CLANG versions (or apple's version) would be an issue?
I think this can all be handled with a CMake check like
include(CheckCXXSourceCompiles)
check_cxx_source_compiles(
"int main() { return __builtin_cpu_supports(\"avx2\") ? 1 : 0; }"
EIDOS_SIMD_HAS_BUILTIN_CPU_SUPPORTS)
if(NOT EIDOS_SIMD_HAS_BUILTIN_CPU_SUPPORTS)
add_compile_definitions(EIDOS_SUPPRESS_SIMD_DISPATCH=1)
message(STATUS "SIMD: compiler does not support __builtin_cpu_supports; falling back to scalar")
endif()Want me to add that?
There was a problem hiding this comment.
Yes, I think so. Of course then we have to worry about what version of CMake added CheckCXXSourceCompiles. This sort of thing sometimes feels like an infinite regress. I have never figured out how to find out the first version of CMake that added a given feature; they don't make it easy to tell and even Google seems to have no idea. But I'd say add this, let's not worry about the CMake version. If it causes problems for someone, they can always turn off SIMD altogether for their build, and this check would be inside the SIMD stuff, so that would fix the problem if their CMake is that old. Anyway, updating to a newer CMake version is easier than updating to new hardware, so this is certainly a step forward overall. :-> Thanks!
| { | ||
| if (!Eidos_SIMD_SelectTier(tier_name)) | ||
| continue; | ||
| std::cout << "Testing SIMD kernels: " << Eidos_SIMD_ActiveTierName() << " tier"; |
There was a problem hiding this comment.
It looks like this output is generated whenever the self-tests are run? Normally the self-tests only print the "SUCCESS" message, unless something goes wrong; they're not intended to be verbose in the success case. But maybe it makes sense to do this output, so the user can see what is getting tested. I'm not sure, hmm.
There was a problem hiding this comment.
I suppose I'll try it and see how it feels, and I can always disable this output later on if I decide against it. But feel free to comment if you have an opinion.
There was a problem hiding this comment.
We could collapse this to a one line summary -- I think we want something here though?
There was a problem hiding this comment.
OK, a one-line summary seems like a good idea...?
Problem. SLiM was built with -mavx2 -mfma applied project-wide, so the compiler emitted AVX2/FMA throughout the whole binary — not just the SIMD kernels. Pre-Haswell x86_64 CPUs (no AVX2) crashed with SIGILL (see issue #628).
Fix. The kernels formerly in
eidos_simd.hmove toeidos_simd_impl.h, a tier-parameterized body compiled once per instruction-set tier — scalar, SSE4.2, AVX2+FMA (x86_64), NEON (ARM64).Only the per-tier
.cppfiles get ISA flags; everything else, including the dispatcher, builds at the baseline x86_64 ABI.Eidos_SIMD_Init()checks the CPU with__builtin_cpu_supports()and points the Eidos_SIMD function pointers at the fastest supported tier.One binary should now correct on any x86_64 CPU — baseline contains no AVX2/SSE4.2, tier code runs only after the CPU is confirmed to support it. USE_SIMD is now ON/OFF (OFF and MSVC build scalar only).
Verification.
Things still needed!