This isn't Raspberry Pi/ARM related, but seeing this reminded me of some RX 470s I have sitting in a bin. I bought them about 8 years ago for ETH mining but had since written them off as e-waste. Following the blog instructions, I managed to run the Llama-3.2-3B-Instruct-Q4_K_M model, and my RX 470 gets about 20 tokens/s, roughly half the speed of the RX 6500 XT from the Pi benchmarks.
I decided to pair the GPU with an even older Intel Ivy Bridge CPU. One problem I ran into: when llama.cpp is compiled on a modern system and then transferred over, it crashes with `Illegal instruction (core dumped)`, because the Ivy Bridge CPU is old enough to be missing many newer CPU extensions. It took some trial and error to find the right compile options, but I eventually got it working, so I published a Docker image that lets me deploy it with one command: https://github.com/kth8/llama-server-vulkan
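For anyone hitting the same crash: the usual cause is that llama.cpp's CMake build enables instruction sets the build machine supports (AVX2, FMA, AVX-512) that Ivy Bridge lacks, while Ivy Bridge does support plain AVX and F16C. A sketch of a build configured for that CPU generation follows; the exact set of `GGML_*` options varies between llama.cpp versions, so check `cmake -LH` for the flags your checkout actually exposes:

```shell
# Build llama.cpp with the Vulkan backend, targeting a pre-AVX2 CPU.
# GGML_NATIVE=OFF stops CMake from tuning for the *build* machine's CPU;
# the remaining flags disable extensions Ivy Bridge does not have.
cmake -B build \
    -DGGML_VULKAN=ON \
    -DGGML_NATIVE=OFF \
    -DGGML_AVX=ON \
    -DGGML_AVX2=OFF \
    -DGGML_FMA=OFF \
    -DGGML_AVX512=OFF
cmake --build build --config Release -j
```

The key point is `GGML_NATIVE=OFF`: with it left on (the default), the compiler emits whatever instructions the build host supports, which is exactly what produces `Illegal instruction` on an older deployment target.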
Here are the benchmarks using llama-bench:
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | pp512 | 353.81 ± 0.19 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | pp4096 | 527.48 ± 0.22 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | tg128 | 60.83 ± 0.07 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | pp4096+tg128 | 375.29 ± 0.24 |

| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | pp512 | 203.06 ± 0.63 |
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | pp4096 | 179.68 ± 0.28 |
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | tg128 | 25.65 ± 0.15 |
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | pp4096+tg128 | 123.08 ± 0.11 |
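For reference, tables in this shape come straight out of `llama-bench`; an invocation along these lines should reproduce the test columns above (the model path is a placeholder):

```shell
# pp512/pp4096 = prompt processing, tg128 = token generation,
# pp4096+tg128 = combined test; -ngl 99 offloads all layers to the GPU.
llama-bench -m Llama-3.2-3B-Instruct-Q4_K_M.gguf \
    -ngl 99 -p 512,4096 -n 128 -pg 4096,128
```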