
ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() #10874

Conversation

angt
Contributor

@angt angt commented Dec 17, 2024

Same output and the same performance:

| Model         |   Threads | Test   |   t/s master |   t/s PR branch |   Speedup |
|:--------------|----------:|:-------|-------------:|----------------:|----------:|
| llama 1B Q4_0 |         2 | pp512  |       155.22 |                                                                     156.98 |      1.01 |
| llama 1B Q4_0 |         2 | tg128  |        33.96 |                                                                      33.65 |      0.99 |
| llama 1B Q4_0 |         4 | pp512  |       312.66 |                                                                     313.25 |      1.00 |
| llama 1B Q4_0 |         4 | tg128  |        60.70 |                                                                      60.31 |      0.99 |
| llama 1B Q4_0 |         8 | pp512  |       585.92 |                                                                     578.52 |      0.99 |
| llama 1B Q4_0 |         8 | tg128  |        93.77 |                                                                      94.56 |      1.01 |
| llama 1B Q4_0 |        16 | pp512  |       966.15 |                                                                     952.94 |      0.99 |
| llama 1B Q4_0 |        16 | tg128  |        67.38 |                                                                      68.80 |      1.02 |
| llama 3B Q4_0 |         2 | pp512  |        58.62 |                                                                      58.69 |      1.00 |
| llama 3B Q4_0 |         2 | tg128  |        15.16 |                                                                      14.72 |      0.97 |
| llama 3B Q4_0 |         4 | pp512  |       118.17 |                                                                     117.87 |      1.00 |
| llama 3B Q4_0 |         4 | tg128  |        26.93 |                                                                      27.22 |      1.01 |
| llama 3B Q4_0 |         8 | pp512  |       221.48 |                                                                     221.54 |      1.00 |
| llama 3B Q4_0 |         8 | tg128  |        42.32 |                                                                      44.05 |      1.04 |
| llama 3B Q4_0 |        16 | pp512  |       360.97 |                                                                     363.18 |      1.01 |
| llama 3B Q4_0 |        16 | tg128  |        32.99 |                                                                      32.78 |      0.99 |

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Dec 17, 2024
@angt angt force-pushed the ggml-cpu-replace-neon-asm-with-intrinsics-in-ggml_gemv_q4_0_4x8_q8_0 branch from b283fc3 to 2cfbccd Compare December 19, 2024 09:44
@slaren slaren merged commit e34c5af into ggerganov:master Dec 20, 2024
48 checks passed