
ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() #10874

Conversation

angt
Contributor

@angt angt commented Dec 17, 2024

Same output and the same performance:

| Model         |   Threads | Test   |   t/s master |   t/s PR branch |   Speedup |
|:--------------|----------:|:-------|-------------:|----------------:|----------:|
| llama 1B Q4_0 |         2 | pp512  |       155.22 |                                                                     156.98 |      1.01 |
| llama 1B Q4_0 |         2 | tg128  |        33.96 |                                                                      33.65 |      0.99 |
| llama 1B Q4_0 |         4 | pp512  |       312.66 |                                                                     313.25 |      1.00 |
| llama 1B Q4_0 |         4 | tg128  |        60.70 |                                                                      60.31 |      0.99 |
| llama 1B Q4_0 |         8 | pp512  |       585.92 |                                                                     578.52 |      0.99 |
| llama 1B Q4_0 |         8 | tg128  |        93.77 |                                                                      94.56 |      1.01 |
| llama 1B Q4_0 |        16 | pp512  |       966.15 |                                                                     952.94 |      0.99 |
| llama 1B Q4_0 |        16 | tg128  |        67.38 |                                                                      68.80 |      1.02 |
| llama 3B Q4_0 |         2 | pp512  |        58.62 |                                                                      58.69 |      1.00 |
| llama 3B Q4_0 |         2 | tg128  |        15.16 |                                                                      14.72 |      0.97 |
| llama 3B Q4_0 |         4 | pp512  |       118.17 |                                                                     117.87 |      1.00 |
| llama 3B Q4_0 |         4 | tg128  |        26.93 |                                                                      27.22 |      1.01 |
| llama 3B Q4_0 |         8 | pp512  |       221.48 |                                                                     221.54 |      1.00 |
| llama 3B Q4_0 |         8 | tg128  |        42.32 |                                                                      44.05 |      1.04 |
| llama 3B Q4_0 |        16 | pp512  |       360.97 |                                                                     363.18 |      1.01 |
| llama 3B Q4_0 |        16 | tg128  |        32.99 |                                                                      32.78 |      0.99 |

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Dec 17, 2024
@angt angt force-pushed the ggml-cpu-replace-neon-asm-with-intrinsics-in-ggml_gemv_q4_0_4x8_q8_0 branch from b283fc3 to 2cfbccd Compare December 19, 2024 09:44
@slaren slaren merged commit e34c5af into ggerganov:master Dec 20, 2024
48 checks passed