FIX: Adding build-patch to disable Scalar FMA on SLEEF #62

SwayamInSync · 2026-01-21T13:30:54Z

closes #56

This PR adds a patch that disables compilation and dispatch of the quad-precision scalar FMA code path in SLEEF. This was the only behaviour that was enabled by default and could not be controlled by SIMD feature detection. For enabling/disabling AVX, the detection already works as expected, so the scalar FMA path was the only issue.

In terms of performance on x86_64-v3+ CPUs, scalar computation will use the pure-C implementations instead of FMA. This might affect the performance of the current numpy_quaddtype codebase (i.e. ufuncs and other slots). QBLAS performance should not be hurt much, as it uses the vectorized implementations on supported machines.

As a note for the future: I was already planning to extend QBLAS and replace all the quaddtype loops with the vectorized loops from QBLAS.

NOTE: On non-x86-64 machines, the scalar path still dispatches the FMA instruction, so there is no performance penalty there.

UPDATE: With the new workflow, the build detects FMA support and sets the corresponding SIMD build flags, so modern x86_64-v3 CPUs no longer take a performance hit. This also introduces a new quaddtype build option, disable_fma, that lets the user enable or disable the FMA build.

#59 must be merged before this (I validated that it works on my fork).

SwayamInSync · 2026-01-21T13:36:37Z

We cannot forward this patch to conda-forge, because there we use the conda-vendored SLEEF. Forcing the patch there would make everyone who uses that SLEEF package accept it, which is bad. See the last messages of the conversation in conda-forge/numpy_quaddtype-feedstock#11.

So maybe once they get native macOS ARM runners, we can port this fix there. (This can be documented.)

SwayamInSync · 2026-01-21T14:01:00Z

@mhvk you might want to test this on your old machine?

maybe directly installing from my fork's branch?

https://github.com/SwayamInSync/numpy-quaddtype/tree/simd-patch

mhvk · 2026-01-21T14:16:07Z

Yes, that works! Thanks!!!

subprojects/sleef.wrap

ngoldbaum · 2026-01-21T15:22:41Z

subprojects/packagefiles/sleef/fix-purecfma-scalar-x86.patch

+ if(SLEEF_TARGET_PROCESSOR MATCHES "(x86|AMD64|amd64|^i.86$)")
+   set(SLEEF_ARCH_X86 ON CACHE INTERNAL "True for x86 architecture.")
+
+-  set(CLANG_FLAGS_ENABLE_PURECFMA_SCALAR "-mavx2;-mfma")


Unfortunately my cmake is pretty terrible: is there a way to set these flags for x86_64-v3 or newer?

From what I have read, querying CPUID seems to be a way to set these conditionally, and not just the flags but the entire dispatch mechanism.
I'll see whether I can get this right, as the unconditional SIMD dispatch is very tightly integrated with SLEEF's FMA code generation.

OK, thanks! It'd be nice to enable better optimizations when people build for themselves but not critical.

SwayamInSync · 2026-01-21T15:26:22Z

Also, should we merge #59 to get that CI working here for this PR?

ngoldbaum · 2026-01-21T15:31:21Z

Also, should we merge #59?

Probably this first, then that PR, otherwise the tests will be failing temporarily.

SwayamInSync · 2026-01-21T15:45:35Z

Unfortunately my cmake is pretty terrible: is there a way to set these flags for x86_64-v3 or newer?

Maybe one more option: I am not sure it is possible, but if Meson allows applying a patch conditionally, we could detect x86_64-v2 CPUs ourselves and apply the patch only there :)
This would have been pretty simple if Meson allows it (I didn't find this information online).

SwayamInSync · 2026-01-21T16:36:41Z

@ngoldbaum I tested this on my fork and it worked. Before I push a potentially overwhelming set of updates here, here is the workflow; if it makes sense, I will push the changes.

For native builds on systems where FMA is not supported: inside the SLEEF subproject's meson.build, we compile a small C snippet. If compilation fails, a new custom flag SLEEF_DISABLE_PURECFMA_SCALAR=ON is set (this flag does not exist in upstream SLEEF; our patch adds it), and the FMA compile flags and FMA code dispatch are disabled accordingly. If the snippet compiles, the flag is set to OFF; the patch is still applied, but it changes nothing.
For CI (emulation/cross-compilation): in these cases the build machine usually supports FMA while the emulated target does not, so the user can install numpy_quaddtype with pip install . --no-build-isolation -v -Csetup-args=-Ddisable_fma=true. This new option takes the same path as the failed check described above.
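
The choice between the default build and the disable_fma build can be illustrated with a minimal Python sketch. This is not the Meson logic from this PR: it only inspects CPU flags in the format Linux exposes in /proc/cpuinfo and picks matching pip arguments. The helper names cpu_has_fma and pip_install_args are hypothetical, and the sample flag lines are abbreviated for illustration.

```python
def cpu_has_fma(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line lists the 'fma' feature.

    Expects text in the format of /proc/cpuinfo on Linux x86 systems.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Compare whole tokens so that e.g. 'fma4' does not count as 'fma'.
            return "fma" in value.split()
    return False


def pip_install_args(has_fma: bool) -> list[str]:
    """Build the pip command, adding the disable_fma option when needed."""
    args = ["pip", "install", ".", "-v"]
    if not has_fma:
        args.append("-Csetup-args=-Ddisable_fma=true")
    return args


# Abbreviated, illustrative flag lines (real ones are much longer).
OLD_CPU = "model name : example old CPU\nflags : fpu sse4_2 avx\n"
NEW_CPU = "model name : example new CPU\nflags : fpu sse4_2 avx avx2 fma\n"

print(pip_install_args(cpu_has_fma(OLD_CPU)))
print(pip_install_args(cpu_has_fma(NEW_CPU)))
```

On a real Linux machine the text would come from reading /proc/cpuinfo; under emulation that file may describe the host rather than the emulated CPU, which is the situation the explicit option is meant for.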

SwayamInSync · 2026-01-21T16:37:24Z

Let me know if it sounds good then I'll cherry-pick the commits here

ngoldbaum · 2026-01-21T16:39:07Z

Sounds reasonable to me. The new build argument will need a mention in the readme, probably with a link to the SLEEF issue you opened and a note that it's a workaround for a SLEEF issue.

SwayamInSync · 2026-01-21T18:20:31Z

tests/test_quaddtype.py

+    assert result == True
+
+
+def test_sleef_purecfma_symbols():


It's a test that just prints the purecfma functions available in the quaddtype build (if supported), and otherwise prints nothing.

This seems to work: in the logs from the SandyBridge processor (FMA disabled), the output is empty:

tests/test_quaddtype.py::test_sleef_purecfma_symbols PASSED

but on Haswell (FMA enabled) the output shows some:

tests/test_quaddtype.py::test_logical_reduce_on_non_quad_arrays PASSED
tests/test_quaddtype.py::test_sleef_purecfma_symbols ✓ Found 67 PURECFMA symbols (FMA optimizations enabled)
  Sample symbols:
    000000000014cd90 t sleef_acoshq1_u10purecfma
    000000000013eab0 t sleef_acosq1_u10purecfma
    0000000000129540 t sleef_addq1_u05purecfma
    000000000014a150 t sleef_asinhq1_u10purecfma
    000000000013d3b0 t sleef_asinq1_u10purecfma
    ... and 62 more
PASSED
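
For readers who want to reproduce this kind of check outside the test suite, here is a minimal sketch of the filtering step. It is not the test added in this PR; it only assumes symbol listings in the "address type name" format shown in the log above. The function name find_purecfma_symbols is hypothetical, and the last sample line is an illustrative non-FMA entry added here.

```python
def find_purecfma_symbols(nm_output: str) -> list[str]:
    """Return symbol names containing 'purecfma' from nm-style output.

    Each non-empty line is expected to end with the symbol name,
    e.g. '000000000014cd90 t sleef_acoshq1_u10purecfma'.
    """
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        name = parts[-1]
        if "purecfma" in name.lower():
            symbols.append(name)
    return symbols


# The first three lines are copied from the Haswell log above;
# the last one is an illustrative entry that must not be reported.
SAMPLE = """\
000000000014cd90 t sleef_acoshq1_u10purecfma
000000000013eab0 t sleef_acosq1_u10purecfma
0000000000129540 t sleef_addq1_u05purecfma
0000000000100000 t example_non_fma_symbol
"""

found = find_purecfma_symbols(SAMPLE)
print(f"Found {len(found)} PURECFMA symbols")
```

An empty result corresponds to the SandyBridge case above, where the FMA code path was not built.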

SwayamInSync · 2026-01-21T18:23:36Z

@mhvk you might want to test this on your old machine?

maybe directly installing from my fork's branch?
https://github.com/SwayamInSync/numpy-quaddtype/tree/simd-patch

@mhvk just to double-check, can you give it another shot?

ngoldbaum · 2026-01-21T18:34:28Z

The new build argument will need a mention in the readme, probably with a link to the SLEEF issue you opened and a note that it's a workaround for a SLEEF issue.

Can you update the readme and/or docs too?

SwayamInSync · 2026-01-21T18:35:31Z

Can you update the readme and/or docs too?

Yup on it

SwayamInSync · 2026-01-21T18:40:40Z

The docs include the sections of README so they'll automatically get updated

ngoldbaum

Awesome! I didn't spot any issues in the new meson configuration.

SwayamInSync · 2026-01-21T19:15:49Z

Thanks, I updated the PR's description and will merge once @mhvk gives a thumbs up after installing with this new workflow.

mhvk · 2026-01-21T19:17:54Z

@SwayamInSync - unfortunately, the branch no longer works! Just to be sure I didn't make a mistake before, I checked out the second commit (efba625) and that does work, so something must have gone wrong in the third commit (f574019)...

Both statements hold after creating a fresh virtual environment, running git clean -fxd on the repository, and then running reinstall.sh.

I also tried what is suggested in the readme now, but that errors:

pip install . -Csetup-args=-Ddisable_fma=true
Processing /data/mhvk/packages/numpy-quaddtype
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [42 lines of output]
      + meson setup /data/mhvk/packages/numpy-quaddtype /data/mhvk/packages/numpy-quaddtype/.mesonpy-3pzucj5o -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md -Ddisable_fma=true --native-file=/data/mhvk/packages/numpy-quaddtype/.mesonpy-3pzucj5o/meson-python-native-file.ini
      The Meson build system
      Version: 1.10.1
      Source dir: /data/mhvk/packages/numpy-quaddtype
      Build dir: /data/mhvk/packages/numpy-quaddtype/.mesonpy-3pzucj5o
      Build type: native build
      WARNING: Project does not target a minimum version but uses feature introduced in '1.1': meson.options file. Use meson_options.txt instead
      Project name: numpy_quaddtype
      Project version: undefined
      C compiler for the host machine: /usr/bin/ccache cc (gcc 15.2.0 "cc (Debian 15.2.0-12) 15.2.0")
      C linker for the host machine: cc ld.bfd 2.45.50.20251209
      C++ compiler for the host machine: /usr/bin/ccache c++ (gcc 15.2.0 "c++ (Debian 15.2.0-12) 15.2.0")
      C++ linker for the host machine: c++ ld.bfd 2.45.50.20251209
      Host machine cpu family: x86_64
      Host machine cpu: x86_64
      Program python found: YES (/data/mhvk/packages/numpy-quaddtype/temp/bin/python3)
      Did not find pkg-config by name 'pkg-config'
      Found pkg-config: NO
      Run-time dependency python found: YES 3.13
      Did not find pkg-config by name 'pkg-config'
      Found pkg-config: NO
      Found CMake: /tmp/user/2500/pip-build-env-6sbgqy4r/overlay/bin/cmake (4.2.1)
      Run-time dependency qblas found: NO (tried pkgconfig and cmake)
      Looking for a fallback subproject for the dependency qblas
      
      Executing subproject qblas
      
      qblas| Project name: qblas
      qblas| Project version: 1.0.0
      qblas| Build targets in project: 0
      qblas| Subproject qblas finished.
      
      Dependency qblas from subproject subprojects/qblas found: YES 1.0.0
      Run-time dependency sleef found: NO (tried pkgconfig and cmake)
      Message: SLEEF FMA disable option: true
      
      Executing subproject sleef
      
      
      ../subprojects/sleef/meson.build:1:0: ERROR: Unknown option: "sleef:disable_fma".
      
      A full log can be found at /data/mhvk/packages/numpy-quaddtype/.mesonpy-3pzucj5o/meson-logs/meson-log.txt
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> from file:///data/mhvk/packages/numpy-quaddtype

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

In case it helps, inside subprojects/sleef,

git status
HEAD detached at 3.9.0
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   Configure.cmake
        modified:   src/quad/CMakeLists.txt
        modified:   src/quad/qdispscalar.c.org

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .meson-subproject-wrap-hash.txt
        fix-purecfma-scalar-x86.patch
        meson.build

no changes added to commit (use "git add" and/or "git commit -a")

SwayamInSync · 2026-01-21T19:19:54Z

Can you send the installation command you tried?

SwayamInSync · 2026-01-21T19:32:25Z

@mhvk I guess you might have an old cached subproject folder from a previous install, because only then does Meson skip the new build, try to use the older directory, and fail.

Can you please try the following steps:

git clone -b simd-patch https://github.com/SwayamInSync/numpy-quaddtype.git
cd numpy-quaddtype

# all the env steps if needed
bash reinstall.sh

This will remove the old cached files and do a fresh build.

ngoldbaum · 2026-01-21T19:33:45Z

you might need git clean -ffxd to remove the subprojects

ngoldbaum · 2026-01-21T19:34:55Z

https://git-scm.com/docs/git-clean#Documentation/git-clean.txt---force

Git: it has the best UI

mhvk · 2026-01-21T19:42:10Z

I did the above git clone, etc., in a clean directory -- no luck. Doing git clean -ffxd beforehand also did not help. See
https://www.astro.utoronto.ca/~mhvk/build_log_failure.txt for a build after doing that deep clean.

What does help is

git checkout HEAD~2
reinstall.sh

See https://www.astro.utoronto.ca/~mhvk/build_log_success.txt

SwayamInSync · 2026-01-21T20:02:31Z

@mhvk both logs show

Successfully installed numpy_quaddtype-1.0.0

What was the issue here?

mhvk · 2026-01-21T20:08:30Z

Installation indeed works in both cases, but for the PR as it stands I get a failure on import:

python -c "import numpy_quaddtype"
Illegal instruction        (core dumped) python -c "import numpy_quaddtype"

while installing at the second commit allows the import to succeed.
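
This failure mode (a successful install followed by a crash on import) can be checked from a script by importing the module in a child process and inspecting its exit status. The sketch below is a hedged illustration, not part of this PR: classify_exit and check_import are hypothetical helper names, and the negative-return-code convention applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def classify_exit(returncode: int) -> str:
    """Map a subprocess return code to a short status string.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGILL:
        return "illegal-instruction"
    return "failed"


def check_import(module: str) -> str:
    """Import `module` in a fresh interpreter so a crash cannot kill us."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
    )
    return classify_exit(proc.returncode)


# 'json' is a standard-library stand-in; replace it with
# 'numpy_quaddtype' to test an actual installation.
print(check_import("json"))
```

A result of "illegal-instruction" would indicate that the build emitted instructions the running CPU does not support, even though the build itself succeeded.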

SwayamInSync · 2026-01-21T20:10:49Z

I see that the failure log says

  sleef| Checking if "FMA instruction support" compiles: YES
  sleef| Message: FMA supported - enabling PURECFMA scalar code path

Maybe the implicit FMA check is not working? Can you try passing the flag now?

pip install . -v -Csetup-args=-Ddisable_fma=true 2>&1 | tee build_log.txt

Make sure all the old subprojects are cleaned.

SwayamInSync · 2026-01-21T20:25:13Z

I changed the implicit FMA check to compile and run the code, rather than only compile it. Since @mhvk is getting the error at runtime, this should now catch it correctly.

SwayamInSync · 2026-01-21T21:06:06Z

Just checked on CI emulating SandyBridge, and the detection is now working correctly (although the emulated build was pretty slow, ~40 mins, so we will only be using emulation at runtime)

sleef| Checking if "FMA instruction runtime support" runs: NO (1)
sleef| Message: FMA not supported at runtime - disabling PURECFMA scalar code path

and this is on haswell

sleef| Checking if "FMA instruction runtime support" runs: YES
sleef| Message: FMA supported - enabling PURECFMA scalar code path

mhvk · 2026-01-21T21:19:58Z

Super! I now checked that on the latest branch, and just a plain pip install . works. Thanks so much!!

SwayamInSync · 2026-01-21T21:22:20Z

Thanks @mhvk for confirming it, your verification helped a lot today 😄

mhvk · 2026-01-21T21:42:04Z

That definitely was a tricky one...

juntyr · 2026-01-22T04:10:48Z

reinstall.sh


 python -m pip uninstall -y numpy_quaddtype
 python -m pip install . -vv 2>&1 | tee build_log.txt
+# pip install . --no-build-isolation -v -Csetup-args=-Ddisable_fma=true 2>&1 | tee build_log.txt


Do we still need this?

Yeah, it will make it easy to test the option on x86 machines.

disbale fma

a33bb6f

SwayamInSync requested a review from ngoldbaum January 21, 2026 13:42

ngoldbaum reviewed Jan 21, 2026

ngoldbaum reviewed Jan 21, 2026

SwayamInSync mentioned this pull request Jan 21, 2026

CI: Adding CI to test old CPU for SIMD compatibility via SDE Emulation #59

Merged

SwayamInSync added 2 commits January 21, 2026 18:08

Merge branch 'main' into simd-patch

efba625

updating to new workflow

f574019

SwayamInSync commented Jan 21, 2026

updated readme

bb4dfc0

ngoldbaum approved these changes Jan 21, 2026

compie and run fma test

01e2006

removed unnecessary comment

1281abf

juntyr reviewed Jan 22, 2026

SwayamInSync merged commit b396148 into numpy:main Jan 22, 2026
13 checks passed

SwayamInSync deleted the simd-patch branch January 22, 2026 06:17

SwayamInSync mentioned this pull request Jan 22, 2026

BUILD: Add Pyodide CI and build recipes #66

Open
