Skip to content

Confine SIMD code to runtime-dispatched tiers (fixes #628)#630

Open
andrewkern wants to merge 6 commits into
MesserLab:masterfrom
andrewkern:fix/simd-runtime-dispatch
Open

Confine SIMD code to runtime-dispatched tiers (fixes #628)#630
andrewkern wants to merge 6 commits into
MesserLab:masterfrom
andrewkern:fix/simd-runtime-dispatch

Conversation

@andrewkern
Copy link
Copy Markdown
Collaborator

Problem. SLiM was built with -mavx2 -mfma applied project-wide, so the compiler emitted AVX2/FMA throughout the whole binary — not just the SIMD kernels. Pre-Haswell x86_64 CPUs (no AVX2) crashed with SIGILL (see issue #628).

Fix. The kernels formerly in eidos_simd.h move to eidos_simd_impl.h, a tier-parameterized body compiled once per instruction-set tier — scalar, SSE4.2, AVX2+FMA (x86_64), NEON (ARM64).

Only the per-tier .cpp files get ISA flags; everything else, including the dispatcher, builds at the baseline x86_64 ABI. Eidos_SIMD_Init() checks the CPU with __builtin_cpu_supports() and points the Eidos_SIMD function pointers at the fastest supported tier.

One binary should now correct on any x86_64 CPU — baseline contains no AVX2/SSE4.2, tier code runs only after the CPU is confirmed to support it. USE_SIMD is now ON/OFF (OFF and MSVC build scalar only).

Verification.

  • -testEidos (7464) and -testSLiM (36853) pass.
  • Object-file inspection: AVX2/FMA only in eidos_simd_avx2.cpp.o, SSE4.2-only in eidos_simd_sse42.cpp.o, every other .o baseline-clean.
  • All three x86 tiers exercised at runtime (AVX2 / SSE4.2 / scalar) — all pass.
  • USE_SIMD=OFF build: zero AVX2/FMA, tests pass.

Things still needed!

  • Xcode project. SLiM.xcodeproj needs updating before the macOS build will work:
  • Add the 6 new files — eidos_simd.cpp, eidos_simd_scalar.cpp, eidos_simd_sse42.cpp, eidos_simd_avx2.cpp, eidos_simd_neon.cpp, and the header eidos_simd_impl.h — and add the 5 .cpp files to the Compile Sources phase of each Eidos/SLiM target.
  • Set per-file flags in the Compile Sources build phase: -mavx2 -mfma on eidos_simd_avx2.cpp, -msse4.2 on eidos_simd_sse42.cpp. Without the flags that target I think it will fail to compile (AVX2 intrinsics with no -mavx2)
  • The qmake build is scalar-only (EIDOS_SUPPRESS_SIMD_DISPATCH) i think?

SLiM was built with -mavx2 -mfma applied to the whole project, which let
the compiler emit AVX2/FMA instructions throughout the entire binary, not
only in the explicit SIMD kernels. The resulting build crashed with SIGILL
on x86_64 CPUs without AVX2 (pre-Haswell, ~2012 and earlier).

The kernels formerly in eidos_simd.h are moved to a tier-parameterized body,
eidos_simd_impl.h, compiled once per instruction-set tier: scalar, SSE4.2,
and AVX2+FMA on x86_64, and NEON on ARM64. Only the per-tier translation
units receive instruction-set flags; every other translation unit, including
the dispatcher, is compiled at the baseline x86_64 ABI. At startup
Eidos_SIMD_Init() probes the CPU with __builtin_cpu_supports() and points the
Eidos_SIMD function pointers at the fastest supported tier.

A single binary is therefore correct on any x86_64 CPU: the baseline of the
executable contains no AVX2/SSE4.2 instructions, and tier code runs only
after the CPU has been confirmed to support it.

USE_SIMD is now a simple ON/OFF switch; OFF (and MSVC) builds the scalar tier
only. The qmake build sets EIDOS_SUPPRESS_SIMD_DISPATCH, keeping its prior
scalar-only behavior since it applies no per-file SIMD flags.
These files were created from copies of eidos_simd.h and carried its
original creation date; set it to their actual creation date.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 97.46377% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.63%. Comparing base (b7435e2) to head (ae3ef67).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
eidos/eidos_test_functions_math.cpp 90.38% 10 Missing ⚠️
eidos/eidos_simd.cpp 85.18% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #630      +/-   ##
==========================================
- Coverage   75.67%   75.63%   -0.04%     
==========================================
  Files         114      118       +4     
  Lines       72808    73048     +240     
  Branches    12873    12916      +43     
==========================================
+ Hits        55095    55252     +157     
- Misses      17713    17796      +83     
Files with missing lines Coverage Δ
eidos/eidos_globals.cpp 72.06% <100.00%> (+0.02%) ⬆️
eidos/eidos_simd_avx2.cpp 100.00% <100.00%> (ø)
eidos/eidos_simd_impl.h 100.00% <100.00%> (ø)
eidos/eidos_simd_scalar.cpp 100.00% <100.00%> (ø)
eidos/eidos_simd_sse42.cpp 100.00% <100.00%> (ø)
eidos/eidos_simd.cpp 85.18% <85.18%> (ø)
eidos/eidos_test_functions_math.cpp 97.01% <90.38%> (-0.51%) ⬇️

... and 10 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrewkern
Copy link
Copy Markdown
Collaborator Author

grrr... some of the windows tests are failing... working on it

The SIMD kernels are compiled once per instruction-set tier, but only the
tier the CPU selects at startup ever runs, so on a modern CI machine the
scalar and SSE4.2 kernels were never executed or covered.

Add Eidos_SIMD_SelectTier(), which forces a specific tier, and rewrite
Eidos_SIMD_Init() in terms of it. The SIMD math tests now cycle through
every tier the CPU supports -- running the full battery against scalar,
SSE4.2, and AVX2+FMA -- then restore the best tier. The battery also gains
direct tests for the kernels that previously had no per-tier coverage:
sqrt, abs, the rounding family, the reductions, the convolution helpers,
and the single-precision spatial-interaction kernels.

This exercises the scalar and SSE4.2 code paths that nothing tested before.
…lags

On Windows the WIN32 target blocks run set_source_files_properties() over
every source file to add "-include config.h".  COMPILE_FLAGS is a single
string property, so that overwrote the "-mavx2 -mfma" / "-msse4.2" set
earlier on the tier files, and eidos_simd_avx2.cpp then failed to compile
its AVX2 intrinsics.

Apply the per-tier ISA flags at the end of the file instead, after the
WIN32 blocks, using set_property(... APPEND_STRING ...) so they extend
rather than replace COMPILE_FLAGS.
@andrewkern
Copy link
Copy Markdown
Collaborator Author

okay @bhaller -- this is ready for review

Copy link
Copy Markdown
Contributor

@bhaller bhaller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andrewkern! Sorry for the slow review, things got busy. :-O A couple concerns here, some trivial, some bigger. See what you think.

Comment thread eidos/eidos_simd.cpp Outdated
// Eidos
//
// Created by Andrew Kern on 5/21/2026.
// Copyright (c) 2024-2025 Benjamin C. Haller. All rights reserved.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix to 2026 only, please

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and likewise in other files; I notice this is a repeating problem, due to copy-paste...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack sorry about this. will fix that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, a nit. :->

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait just to be clear -- do you want the copyright date switched or my date just to be Created by Andrew Kern in 2026?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Created by" was fine as it was. The copyright was the problem – it can't say that it was copyright in a year before it even existed. :-> So the copyrights on these SIMD files should say "Copyright (c) 2026 Benjamin C. Haller. All rights reserved." (In giving this code to SLiM, you implicitly give the copyright to me, I hope you don't mind that. Somebody needs to hold the copyright, and since I need control over the repo, that ought to be me, since there's no foundation/trust/whatever. :->)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course!

Comment thread eidos/eidos_simd.cpp
// since the AVX2 tier and SLEEF both use FMA instructions.
if (std::strcmp(tier_name, "AVX2+FMA") == 0)
{
if (!(__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about compilers that don't support __builtin_cpu_supports. I guess this is on GCC 4.8+ and Clang 9+. Clang 9 was only released in 2019, so there might be older versions floating around pretty commonly. GCC 4.8+ is fine, though; that's the first GCC version that supported C++11, so it is required anyway. (Clang supported C++11 in version 3.3, so suddenly requiring Clang 9 is a BIG jump forward in requirements.) Older GCC/Clang, and other compilers like MSVC and Intel, will not have __builtin_cpu_supports, though. So I'm worried that we're substituting a common problem (compilers not supporting __builtin_cpu_supports) in place of a rare problem (hardware so old that this solution is needed in the first place).

I see that Google has a more general solution to this sort of problem: https://github.com/google/cpu_features. It is not simple, and would complexify SLiM's build significantly; it is not a single-header solution, unfortunately, it would require setting up a CMake subdirectory and stuff. Doubtless that would be a PITA to get working. But maybe it is what we have to do? Or we could detect compilers that don't have __builtin_cpu_supports, based on not-GCC, not-Clang, or old-GCC-or-Clang, and fall back to scalar while emitting a warning, or some such. Ugh. Thoughts?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. A few thoughts.

  1. I think the google thing is overkill
  2. MSVC and Intel cc are already handled. EIDOS_SIMD_DISPATCH_X86 is gated on (defined(__GNUC__) || defined(__clang__)), so on MSVC/Intel the entire dispatcher is #if'd out and the scalar path is the only one
  3. Maybe old CLANG versions (or apple's version) would be an issue?

I think this can all be handled with a CMake check like

  include(CheckCXXSourceCompiles)
  check_cxx_source_compiles(
      "int main() { return __builtin_cpu_supports(\"avx2\") ? 1 : 0; }"
      EIDOS_SIMD_HAS_BUILTIN_CPU_SUPPORTS)
  if(NOT EIDOS_SIMD_HAS_BUILTIN_CPU_SUPPORTS)
      add_compile_definitions(EIDOS_SUPPRESS_SIMD_DISPATCH=1)
      message(STATUS "SIMD: compiler does not support __builtin_cpu_supports; falling back to scalar")
  endif()

Want me to add that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. Of course then we have to worry about what version of CMake added CheckCXXSourceCompiles. This sort of thing sometimes feels like an infinite regress. I have never figured out how to find out the first version of CMake that added a given feature; they don't make it easy to tell and even Google seems to have no idea. But I'd say add this, let's not worry about the CMake version. If it causes problems for someone, they can always turn off SIMD altogether for their build, and this check would be inside the SIMD stuff, so that would fix the problem if their CMake is that old. Anyway, updating to a newer CMake version is easier than updating to new hardware, so this is certainly a step forward overall. :-> Thanks!

{
if (!Eidos_SIMD_SelectTier(tier_name))
continue;
std::cout << "Testing SIMD kernels: " << Eidos_SIMD_ActiveTierName() << " tier";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this output is generated whenever the self-tests are run? Normally the self-tests only print the "SUCCESS" message, unless something goes wrong; they're not intended to be verbose in the success case. But maybe it makes sense to do this output, so the user can see what is getting tested. I'm not sure, hmm.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I'll try it and see how it feels, and I can always disable this output later on if I decide against it. But feel free to comment if you have an opinion.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could collapse this to a one line summary -- I think we want something here though?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, a one-line summary seems like a good idea...?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants