Replies: 29 comments 34 replies
-
Some more on Ryzen 7940HS (on a Framework 16) + 64GB of RAM / Linux (fc40)
-
Wow, thank you for posting these numbers. Your AMD CPUs are a lot cheaper than the Intel Core i9-14900K yet have much better Q6_K and F16 performance. On your znver4 CPU you may want to try using BF16 weights and see how those go, since it should have special opcodes that make those weights go fast.
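If you do give BF16 a try, the run should look just like the Q6_K/F16 ones; the file name below is only a placeholder, since you'd first need a BF16 GGUF of whichever model you're testing:

./llamafile-bench -p "256,512,1024" -m model-bf16.gguf

Zen 4 (znver4) exposes AVX512_BF16, so BF16 matmuls should be able to use the native bf16 dot-product instruction instead of widening everything to F32 first.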
-
Wait a minute. Hold on a second. Am I correct in understanding you ran Mixtral 8x22b on a $362 CPU? And it processed your prompt at 20 tokens per second?! That's nuts. Here's what I get with that model.
-
Does 't/s' refer to the total computation time for inference?
-
I'd really like to know what we can get with this CPU.
I expect more than half the performance of the 7995WX on pp, and about the same on tg.
-
What memory speed were these benchmarks run at for Mixtral 8x22B Q6_K?
I have the same machine, a Ryzen 9 5950X with 128GB of 3600MHz CL16 memory, and got the following results for the same tests:
-
Can you guys explain what the test column means?
-
[fbongiovanni@mel0429 llamafile]$ ./0.8.6/llamafile-bench-0.8.6 -p "256,512,1024" -m "mistral-7b-instruct-v0.2.Q6_K.llamafile"
-
#> mem stock 32GB DDR4@2666
#> mem XMP 32GB DDR4@3200
#> mem XMP 32GB DDR4@3600
-
#> mem stock 32GB DDR4@2400
-
32GB DDR4@3000
-
@Djip007 any substantial performance changes with current llamafile for this model? Edit: found #367
-
Last bench is with v0.8.11/12 + CPU...
-
llamafile 0.8.12+
-
Last one for now...
-
@Djip007 maybe adjust the title of this discussion to reflect the contents?
-
@jart I find it bonkers that we can now run these big models, which are kinda out of reach even with 48GB of VRAM, and get decent (or close to it :-) performance. Very grateful for the work you and the rest of the team do. Hope you have a deep stack of ideas to try out to squeeze even more performance out of consumer hardware. :-)
Meta-Llama-3-70B-Instruct.Q6_K
The latest crop of AMD offerings (Zen 5 mobile, Strix Point) appears to top out at 120GB/s if matched with LPDDR5X-7500. But desktop Zen 5 (Granite Ridge) isn't out until mid-August (in two weeks). Interesting times ahead. It would be interesting to see to what extent LLM performance scales with memory bandwidth and number of threads (a 3D plot).
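A rough back-of-envelope for that scaling question (my own arithmetic, not measured here): batch-1 token generation has to stream essentially the whole model from RAM for every token, so tg t/s is bounded by roughly memory bandwidth divided by model size. Meta-Llama-3-70B-Instruct.Q6_K is on the order of 58GB, so 120GB/s of LPDDR5X would cap tg at about 120 / 58 ≈ 2 t/s no matter how many threads you add, while prompt processing stays compute-bound and should keep scaling with cores and AVX-512.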
-
Benchmarks of recent
-
But the best surprise is DeepSeek Coder V2 Lite, which is usable on the AMD Ryzen AI 9 HX 370 as a tab-autocompletion model even at 8-bit quantization! Now I understand why they chose MoE for this model.
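If I have the numbers right (they come from DeepSeek's model card, not from this thread), DeepSeek-Coder-V2-Lite is an MoE with roughly 16B total parameters but only about 2.4B activated per token, so batch-1 generation only reads the routed experts' weights each step, and the per-token cost lands closer to that of a ~2.4B dense model while keeping far more capacity. That's what makes it responsive enough for tab autocompletion on a laptop CPU.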
-
With v0.8.13 it now works!
=> Now just waiting for FP8 support for this last model!
-
Here are some TinyLLaMA benchmarks with 0.8.13 on the flagship Threadripper.
-
Also I know it isn't Ryzen, but here's TinyLLaMA on the Apple M2 Ultra with llamafile 0.8.13.
-
More benchmarks for you all with llamafile 0.8.13
-
I wonder if the new AMD 9950X CPUs will improve things with their improved AVX-512 support; they seem to have improved some of the other AI-related benchmarks: https://www.phoronix.com/review/amd-ryzen-9950x-9900x/13
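For anyone curious which AVX-512 extensions their chip actually exposes, a quick check on Linux (llamafile selects its kernels at runtime from the CPU feature flags, so no rebuild is involved):

grep -o 'avx512[a-z_0-9]*' /proc/cpuinfo | sort -u

As I understand it, Zen 4 already reports avx512_bf16 and avx512_vnni; the headline change on Zen 5 is the full-width 512-bit datapath rather than new flags, so the gain should show up mostly in prompt processing.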
-
Did anyone run these tests on any EPYC setup with 8-channel RAM?
-
Don't know if this could help with the conversation: I recently converted Qwen 2.5 Coder 14B Instruct F16 from Bartowski on Hugging Face, and I ran some benchmarks with the tool. I'm running:
antonio@giga-nomic:~/.local/bin/llamafile-0.8.17/bin$ ./llamafile-bench -p "256,512,1024" -m Qwen2.5-Coder-14B-Instruct-f16.llamafile
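In case it helps anyone reproduce this, one plausible path to that F16 file (names and paths below are illustrative, not the exact ones used here) is llama.cpp's converter; the resulting GGUF can then be run with llamafile -m directly, or packed into a self-contained .llamafile with the zipalign tool described in the llamafile README:

python convert_hf_to_gguf.py ./Qwen2.5-Coder-14B-Instruct --outtype f16 --outfile Qwen2.5-Coder-14B-Instruct-f16.gguf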
-
I've tried to run the other benchmark done by @Djip007, to compare the performance of my 5600X with the 5950X.
-
(For "history": bench with llamafile V0.8.6)
Some result with my zen3+128Go of RAM / Linux (fc40)
RAM: DDR4@3600
As you see matmul is memory limited on this CPU (DDR-4 + zne3)
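To put a rough number on "memory limited" (back-of-envelope, assuming dual-channel operation): DDR4-3600 peaks at about 3600 MT/s × 8 bytes × 2 channels ≈ 57.6 GB/s, and batch-1 token generation streams roughly the whole weight file for every generated token, so tg tops out near 57.6 GB/s divided by the model size in GB no matter how many Zen 3 cores are working.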