ggml : fix arm build #10890

slaren · 2024-12-18T18:42:11Z

Uses -mcpu=native with GGML_NATIVE for arm, but additionally check for dotprod and i8mm because not every compiler enables them
Adds option GGML_CPU_ARM_ARCH that can be used to specify the architecture when GGML_NATIVE is disabled
Removes GGML_SVE, use -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.6-a+sve for the same effect
Adds flags for i8mm and dotprod to ggml_backend_cpu_get_features
Removes MSVC support for ARM, use clang instead

Supersedes #10752

Signed-off-by: Adrien Gallouët <[email protected]>

ggerganov

Looks good:

M1 Pro

-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD - Success
-- Performing Test GGML_COMPILER_SUPPORT_I8MM
-- Performing Test GGML_COMPILER_SUPPORT_I8MM - Failed
-- ARM feature DOTPROD enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -march=native+dotprod __ARM_FEATURE_DOTPROD

M2 Ultra

-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD - Success
-- Performing Test GGML_COMPILER_SUPPORT_I8MM
-- Performing Test GGML_COMPILER_SUPPORT_I8MM - Success
-- ARM feature DOTPROD enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -march=native+dotprod+i8mm __ARM_FEATURE_DOTPROD;__ARM_FEATURE_MATMUL_INT8

ggml/src/ggml-cpu/CMakeLists.txt

slaren · 2024-12-18T19:27:26Z

Note: llamafile sgemm uses vld1q_f16 and vld1_f16 without checking for __ARM_FEATURE_FP16_VECTOR_ARITHMETIC, which causes the build to fail when this feature is not enabled. Previously, there was a hack to enable this feature always when building for Android armv7, which has now been removed. I have disabled llamafile in the android example until this is fixed.

ggml-ci

angt · 2024-12-18T19:56:59Z

Just for information, that way we need to check all required features as -march is really generic.
As an example on my M3, -mcpu=native enable:

$ clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '\-target-feature [^ ]+'
-target-feature +v8.5a
-target-feature +aes
-target-feature +crc
-target-feature +dotprod
-target-feature +fp-armv8
-target-feature +fp16fml
-target-feature +lse
-target-feature +ras
-target-feature +rcpc
-target-feature +rdm
-target-feature +sha2
-target-feature +sha3
-target-feature +neon
-target-feature +zcm
-target-feature +zcz
-target-feature +fullfp16

while -march=native only enable:

$ clang -march=native -dM -E -v - </dev/null 2>&1 | grep -oE '\-target-feature [^ ]+'
-target-feature +neon
-target-feature +v8.5a
-target-feature +zcm
-target-feature +zcz

slaren · 2024-12-18T20:52:16Z

Ok, it is using -mcpu now. I think it should be good as is for modern platforms at least, but we might need to add another option to set -mfpu as well.

angt · 2024-12-18T21:06:47Z

Ok, it is using -mcpu now. I think it should be good as is for modern platforms at least, but we might need to add another option to set -mfpu as well.

And many other flags too, especially for cross-compilation, that's why I think we should have a generic flag setting when GGML_NATIVE=OFF and let llama.cpp, whisper.ccp set it for ggml.

slaren · 2024-12-18T21:14:27Z

Which flags are you thinking about?

angt · 2024-12-18T23:02:22Z

Which flags are you thinking about?

Sometimes we need to force -mfloat-abi if it's not forced by the toolchain (like gnuabihf). The -mno-unaligned-access was also used in the removed code.

Cross-compiling is always complicated anyway, some iterations will be needed so having a generic flag could help.

slaren · 2024-12-18T23:22:10Z

It might make more sense to pass these kind of flags with CMAKE_C_FLAGS and CMAKE_CXX_FLAGS, since you would probably want to apply them to the entire program, not just to the CPU backend. The main reason of having GGML_CPU_ARM_ARCH is to allow in the future building multiple versions of the CPU backend with different arch flags, so that we can distribute a single binary that can automatically choose which version of the CPU backend to load at runtime, in a similar way that it is already done for x86. When these flags were added to the CMakeLists.txt, the CPU backend was not built as a separate target, it was a monolithic build with a single target for the entire ggml library, so the flags would apply to everything.

ajiekc905 · 2024-12-19T12:27:14Z

On Android termux Snapdragon 8 gen 1 gives
clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '-target-feature [^ ]+'
-target-feature +v9a
-target-feature +am
-target-feature +bf16
-target-feature +ccidx
-target-feature +complxnum
-target-feature +crc
-target-feature +dotprod
-target-feature +ete
-target-feature +flagm
-target-feature +fp-armv8
-target-feature +fp16fml
-target-feature +fullfp16
-target-feature +i8mm
-target-feature +jsconv
-target-feature +lse
-target-feature +mte
-target-feature +neon
-target-feature +pauth
-target-feature +perfmon
-target-feature +ras
-target-feature +rcpc
-target-feature +rdm
-target-feature +sb
-target-feature +ssbs
-target-feature +sve
-target-feature +sve2
-target-feature +sve2-bitperm
-target-feature +trbe
-target-feature +fix-cortex-a53-835769
-target-feature +outline-atomics

With this commit it produces unusable binaries.
Illegal instruction

cat /proc/cpuinfo
processor : 0
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 1
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 2
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 3
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 4
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 5
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 6
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 7
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd48
CPU revision : 0

Jimex · 2024-12-19T13:14:59Z

I encounterred the error "Failed to get ARM features" when I tried to compile the Android example with this build.

gustrd · 2024-12-19T15:07:46Z

#On Android termux Snapdragon 8 gen 1 gives
#clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '-target-feature [^ ]+'

With Snapdragon 8 gen 1 I can only build with -march=native+nosve . Seems like it's sve implementation is bad.

https://x.com/never_released/status/1628885404785991683?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1629432494532509697%7Ctwgr%5E73eda2d668d483f5440a37d5d4b204b45b2a1fcb%7Ctwcon%5Es3_&ref_url=https%3A%2F%2Fforums.anandtech.com%2Fthreads%2Fqualcomm-snapdragon-thread.2616013%2Fpage-4

slaren · 2024-12-19T15:27:32Z

Maybe we could add more tests for features (SVE), and disable them explicitly with the no flag if the test fails.

* ggml: GGML_NATIVE uses -mcpu=native on ARM Signed-off-by: Adrien Gallouët <[email protected]> * ggml: Show detected features with GGML_NATIVE Signed-off-by: Adrien Gallouët <[email protected]> * remove msvc support, add GGML_CPU_ARM_ARCH option * disable llamafile in android example * march -> mcpu, skip adding feature macros ggml-ci --------- Signed-off-by: Adrien Gallouët <[email protected]> Co-authored-by: Adrien Gallouët <[email protected]>

angt and others added 3 commits December 18, 2024 08:13

ggml: GGML_NATIVE uses -mcpu=native on ARM

7eb81e1

Signed-off-by: Adrien Gallouët <[email protected]>

ggml: Show detected features with GGML_NATIVE

1dae1d8

Signed-off-by: Adrien Gallouët <[email protected]>

remove msvc support, add GGML_CPU_ARM_ARCH option

6f6794f

github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Dec 18, 2024

slaren mentioned this pull request Dec 18, 2024

ggml: GGML_NATIVE uses -mcpu=native on ARM #10752

Closed

ggerganov approved these changes Dec 18, 2024

View reviewed changes

angt reviewed Dec 18, 2024

View reviewed changes

ggml/src/ggml-cpu/CMakeLists.txt Outdated Show resolved Hide resolved

disable llamafile in android example

0a4b79c

github-actions bot added android Issues specific to Android examples labels Dec 18, 2024

march -> mcpu, skip adding feature macros

d4b1259

ggml-ci

slaren linked an issue Dec 18, 2024 that may be closed by this pull request

Misc. bug: Q4_0 with runtime repacking not working as expected (TYPE_Q4_0_4_4 REMOVED) #10757

Closed

slaren merged commit 9177484 into master Dec 18, 2024
55 of 56 checks passed

slaren deleted the sl/fix-arm-build branch December 18, 2024 22:21

ekcrisp mentioned this pull request Dec 18, 2024

Add an option to enable --runtime-repack in llama.cpp abetlen/llama-cpp-python#1860

Closed

ag2s20150909 mentioned this pull request Dec 23, 2024

Compile bug: Android example compilation failure after commit b4357. #10952

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ggml : fix arm build #10890

ggml : fix arm build #10890

slaren commented Dec 18, 2024 •

edited

Loading

ggerganov left a comment

slaren commented Dec 18, 2024

angt commented Dec 18, 2024

slaren commented Dec 18, 2024

angt commented Dec 18, 2024

slaren commented Dec 18, 2024

angt commented Dec 18, 2024

slaren commented Dec 18, 2024

ajiekc905 commented Dec 19, 2024

Jimex commented Dec 19, 2024 •

edited

Loading

gustrd commented Dec 19, 2024

slaren commented Dec 19, 2024

ggml : fix arm build #10890

ggml : fix arm build #10890

Conversation

slaren commented Dec 18, 2024 • edited Loading

ggerganov left a comment

Choose a reason for hiding this comment

slaren commented Dec 18, 2024

angt commented Dec 18, 2024

slaren commented Dec 18, 2024

angt commented Dec 18, 2024

slaren commented Dec 18, 2024

angt commented Dec 18, 2024

slaren commented Dec 18, 2024

ajiekc905 commented Dec 19, 2024

Jimex commented Dec 19, 2024 • edited Loading

gustrd commented Dec 19, 2024

slaren commented Dec 19, 2024

slaren commented Dec 18, 2024 •

edited

Loading

Jimex commented Dec 19, 2024 •

edited

Loading