[edit: 2024/12/09]: OK, after some corrections and more tuning on a Ryzen 5950X (Zen 3), I get (from the llama.cpp code):
Not the best we can get with this CPU, but for the best we may need a true BLIS kernel (I think we can reach ~80 t/s) on AMD Ryzen™ 9 7940HS (Zen 4).
-
@jart, do you want me to try it on llamafile?
-
ikawrakow/ik_llama.cpp#71 has a good idea.
I figured out how to add it to tinyBLAS, and it works great. (I also quantize B to FP16/BF16 in all cases to reduce memory bandwidth; this works nicely for the AVX512/AVX2 kernels.)
https://github.com/Djip007/llama.cpp/blob/perfo/tinyblas/ggml/src/ggml-cpu/llamafile/sgemm.cpp#L297