
RX 470 Vulkan Benchmarks #3

@kth8


This isn't Raspberry Pi/ARM related, but seeing this reminded me of some RX 470s I have sitting in a bin. I bought them about 8 years ago for ETH mining and had since written them off as e-waste. Following the blog instructions, I managed to run the Llama-3.2-3B-Instruct-Q4_K_M model, and my RX 470 gets about 20 tokens/s, roughly half of what the RX 6500 XT managed in the Pi benchmarks.

I decided to pair the GPU with an even older Intel Ivy Bridge CPU. One problem I ran into is that when llama.cpp is compiled on a modern system and then transferred over, it crashes with Illegal instruction (core dumped) because the Ivy Bridge CPU is old enough to be missing many newer CPU extensions. It took some trial and error to find the right compile options, but I eventually got it working, so I published a Docker image that lets me deploy it with one command: https://github.com/kth8/llama-server-vulkan
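The gist is to turn off optimizations for instruction sets Ivy Bridge doesn't have (AVX2, FMA, AVX-512) while keeping the Vulkan backend enabled. A sketch of such a build using llama.cpp's standard CMake options (the exact flags I settled on may differ, so adjust for your own CPU):

```sh
# Sketch of a llama.cpp build that avoids instructions Ivy Bridge lacks
# (AVX2, FMA, AVX-512) while keeping the Vulkan backend for the RX 470.
# Treat this as illustrative, not the exact command from my image.
cmake -B build \
  -DGGML_VULKAN=ON \
  -DGGML_NATIVE=OFF \
  -DGGML_AVX2=OFF \
  -DGGML_FMA=OFF \
  -DGGML_AVX512=OFF
cmake --build build --config Release -j
```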

Here are the benchmarks using llama-bench:

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | pp512 | 353.81 ± 0.19 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | pp4096 | 527.48 ± 0.22 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | tg128 | 60.83 ± 0.07 |
| llama 1B Q4_K - Medium | 762.81 MiB | 1.24 B | Vulkan | 99 | pp4096+tg128 | 375.29 ± 0.24 |

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | pp512 | 203.06 ± 0.63 |
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | pp4096 | 179.68 ± 0.28 |
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | tg128 | 25.65 ± 0.15 |
| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | Vulkan | 99 | pp4096+tg128 | 123.08 ± 0.11 |
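If anyone wants to reproduce these rows, the test names map onto llama-bench's standard prompt-processing and text-generation options. Something along these lines should produce the same set of tests (the model path is just a placeholder):

```sh
# Reproduce the test mix above with llama-bench (model path is a placeholder).
# -p runs prompt processing at 512 and 4096 tokens, -n runs 128-token generation,
# -pg combines a 4096-token prompt with 128 generated tokens, and -ngl 99
# offloads all layers to the GPU via the Vulkan backend.
./build/bin/llama-bench \
  -m models/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
  -ngl 99 \
  -p 512,4096 \
  -n 128 \
  -pg 4096,128
```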
