Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ggml : fix arm build #10890

Merged
merged 5 commits into from
Dec 18, 2024
Merged

ggml : fix arm build #10890

merged 5 commits into from
Dec 18, 2024

Conversation

slaren
Copy link
Collaborator

@slaren slaren commented Dec 18, 2024

  • Uses -mcpu=native with GGML_NATIVE for arm, but additionally check for dotprod and i8mm because not every compiler enables them
  • Adds option GGML_CPU_ARM_ARCH that can be used to specify the architecture when GGML_NATIVE is disabled
  • Removes GGML_SVE, use -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.6-a+sve for the same effect
  • Adds flags for i8mm and dotprod to ggml_backend_cpu_get_features
  • Removes MSVC support for ARM, use clang instead

Supersedes #10752

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Dec 18, 2024
Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good:

  • M1 Pro
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD - Success
-- Performing Test GGML_COMPILER_SUPPORT_I8MM
-- Performing Test GGML_COMPILER_SUPPORT_I8MM - Failed
-- ARM feature DOTPROD enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -march=native+dotprod __ARM_FEATURE_DOTPROD
  • M2 Ultra
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD
-- Performing Test GGML_COMPILER_SUPPORT_DOTPROD - Success
-- Performing Test GGML_COMPILER_SUPPORT_I8MM
-- Performing Test GGML_COMPILER_SUPPORT_I8MM - Success
-- ARM feature DOTPROD enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -march=native+dotprod+i8mm __ARM_FEATURE_DOTPROD;__ARM_FEATURE_MATMUL_INT8

@slaren
Copy link
Collaborator Author

slaren commented Dec 18, 2024

Note: llamafile sgemm uses vld1q_f16 and vld1_f16 without checking for __ARM_FEATURE_FP16_VECTOR_ARITHMETIC, which causes the build to fail when this feature is not enabled. Previously, there was a hack to enable this feature always when building for Android armv7, which has now been removed. I have disabled llamafile in the android example until this is fixed.

@github-actions github-actions bot added android Issues specific to Android examples labels Dec 18, 2024
@angt
Copy link
Contributor

angt commented Dec 18, 2024

Just for information, that way we need to check all required features as -march is really generic.
As an example on my M3, -mcpu=native enable:

$ clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '\-target-feature [^ ]+'
-target-feature +v8.5a
-target-feature +aes
-target-feature +crc
-target-feature +dotprod
-target-feature +fp-armv8
-target-feature +fp16fml
-target-feature +lse
-target-feature +ras
-target-feature +rcpc
-target-feature +rdm
-target-feature +sha2
-target-feature +sha3
-target-feature +neon
-target-feature +zcm
-target-feature +zcz
-target-feature +fullfp16

while -march=native only enable:

$ clang -march=native -dM -E -v - </dev/null 2>&1 | grep -oE '\-target-feature [^ ]+'
-target-feature +neon
-target-feature +v8.5a
-target-feature +zcm
-target-feature +zcz

@slaren
Copy link
Collaborator Author

slaren commented Dec 18, 2024

Ok, it is using -mcpu now. I think it should be good as is for modern platforms at least, but we might need to add another option to set -mfpu as well.

@angt
Copy link
Contributor

angt commented Dec 18, 2024

Ok, it is using -mcpu now. I think it should be good as is for modern platforms at least, but we might need to add another option to set -mfpu as well.

And many other flags too, especially for cross-compilation, that's why I think we should have a generic flag setting when GGML_NATIVE=OFF and let llama.cpp, whisper.ccp set it for ggml.

@slaren
Copy link
Collaborator Author

slaren commented Dec 18, 2024

Which flags are you thinking about?

@slaren slaren merged commit 9177484 into master Dec 18, 2024
55 of 56 checks passed
@slaren slaren deleted the sl/fix-arm-build branch December 18, 2024 22:21
@angt
Copy link
Contributor

angt commented Dec 18, 2024

Which flags are you thinking about?

Sometimes we need to force -mfloat-abi if it's not forced by the toolchain (like gnuabihf). The -mno-unaligned-access was also used in the removed code.

Cross-compiling is always complicated anyway, some iterations will be needed so having a generic flag could help.

@slaren
Copy link
Collaborator Author

slaren commented Dec 18, 2024

It might make more sense to pass these kind of flags with CMAKE_C_FLAGS and CMAKE_CXX_FLAGS, since you would probably want to apply them to the entire program, not just to the CPU backend. The main reason of having GGML_CPU_ARM_ARCH is to allow in the future building multiple versions of the CPU backend with different arch flags, so that we can distribute a single binary that can automatically choose which version of the CPU backend to load at runtime, in a similar way that it is already done for x86. When these flags were added to the CMakeLists.txt, the CPU backend was not built as a separate target, it was a monolithic build with a single target for the entire ggml library, so the flags would apply to everything.

@ajiekc905
Copy link

On Android termux Snapdragon 8 gen 1 gives
clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '-target-feature [^ ]+'
-target-feature +v9a
-target-feature +am
-target-feature +bf16
-target-feature +ccidx
-target-feature +complxnum
-target-feature +crc
-target-feature +dotprod
-target-feature +ete
-target-feature +flagm
-target-feature +fp-armv8
-target-feature +fp16fml
-target-feature +fullfp16
-target-feature +i8mm
-target-feature +jsconv
-target-feature +lse
-target-feature +mte
-target-feature +neon
-target-feature +pauth
-target-feature +perfmon
-target-feature +ras
-target-feature +rcpc
-target-feature +rdm
-target-feature +sb
-target-feature +ssbs
-target-feature +sve
-target-feature +sve2
-target-feature +sve2-bitperm
-target-feature +trbe
-target-feature +fix-cortex-a53-835769
-target-feature +outline-atomics

With this commit it produces unusable binaries.
Illegal instruction

cat /proc/cpuinfo
processor : 0
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 1
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 2
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 3
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd46
CPU revision : 2

processor : 4
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 5
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 6
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd47
CPU revision : 0

processor : 7
BogoMIPS : 38.40
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd48
CPU revision : 0

@Jimex
Copy link

Jimex commented Dec 19, 2024

I encounterred the error "Failed to get ARM features" when I tried to compile the Android example with this build.

@gustrd
Copy link
Contributor

gustrd commented Dec 19, 2024

#On Android termux Snapdragon 8 gen 1 gives
#clang -mcpu=native -dM -E -v - </dev/null 2>&1 | grep -oE '-target-feature [^ ]+'

With Snapdragon 8 gen 1 I can only build with -march=native+nosve . Seems like it's sve implementation is bad.

https://x.com/never_released/status/1628885404785991683?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1629432494532509697%7Ctwgr%5E73eda2d668d483f5440a37d5d4b204b45b2a1fcb%7Ctwcon%5Es3_&ref_url=https%3A%2F%2Fforums.anandtech.com%2Fthreads%2Fqualcomm-snapdragon-thread.2616013%2Fpage-4

@slaren
Copy link
Collaborator Author

slaren commented Dec 19, 2024

Maybe we could add more tests for features (SVE), and disable them explicitly with the no flag if the test fails.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024
* ggml: GGML_NATIVE uses -mcpu=native on ARM

Signed-off-by: Adrien Gallouët <[email protected]>

* ggml: Show detected features with GGML_NATIVE

Signed-off-by: Adrien Gallouët <[email protected]>

* remove msvc support, add GGML_CPU_ARM_ARCH option

* disable llamafile in android example

* march -> mcpu, skip adding feature macros

ggml-ci

---------

Signed-off-by: Adrien Gallouët <[email protected]>
Co-authored-by: Adrien Gallouët <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
android Issues specific to Android examples ggml changes relating to the ggml tensor library for machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Misc. bug: Q4_0 with runtime repacking not working as expected (TYPE_Q4_0_4_4 REMOVED)
6 participants