
[Feature]: Support AVX2 for CPU (drop AVX-512 requirement) #6178

Closed

kozuch opened this issue Jul 6, 2024 · 1 comment

kozuch commented Jul 6, 2024

🚀 The feature, motivation and pitch

Why is the AVX-512 instruction set required for CPU inference? This limits support to relatively recent CPUs (Intel since 2016, AMD since 2022). In particular, the first generations of AMD Epyc server CPUs (Zen 1-3 architectures), which are now the most affordable, only have AVX2. These older Epyc processors are nicely cheap and still offer 128 PCIe lanes for networking.

So it would be nice to expand CPU support to AVX2, the previous generation of vector instructions. Is the implementation difficult? I think llama.cpp supports AVX2, so maybe it could be adapted from their code.
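
For reference, one can check at runtime whether a machine reports AVX2 or AVX-512 support; below is a minimal sketch using the GCC/Clang `__builtin_cpu_supports` builtin (illustrative only, not vLLM code):

```cpp
// Minimal sketch: report which vector extensions the host CPU supports,
// via the GCC/Clang __builtin_cpu_supports builtin (queries CPUID).
// Build with: g++ -O2 cpu_flags.cpp
#include <cstdio>

int main() {
    std::printf("AVX2:    %s\n", __builtin_cpu_supports("avx2")    ? "yes" : "no");
    std::printf("AVX512F: %s\n", __builtin_cpu_supports("avx512f") ? "yes" : "no");
    return 0;
}
```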

Alternatives

No response

Additional context

No response

mgoin (Collaborator) commented Jul 8, 2024

AVX2 machines can build vLLM, and AVX2 is supported for CPU inference as of #5452.

However, it isn't particularly performant, so contributions are welcome!
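
For context on the performance gap: an AVX-512 register holds 16 floats where an AVX2 register holds 8, so the same kernel does half the work per instruction on AVX2, and AVX2 also lacks AVX-512's dedicated mask registers for tail handling. Below is a minimal sketch of the same elementwise add at both widths, with runtime dispatch; it is illustrative only and not taken from vLLM's actual kernels:

```cpp
// Minimal sketch: one elementwise add written for AVX2 (256-bit) and
// AVX-512 (512-bit), dispatched at runtime. Not vLLM's actual kernels.
// Build with: g++ -O2 avx_add.cpp
#include <immintrin.h>
#include <cstddef>

// AVX2: 8 floats per register, scalar loop for the tail.
__attribute__((target("avx2")))
void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}

// AVX-512: 16 floats per register; mask registers handle the tail.
__attribute__((target("avx512f")))
void add_avx512(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    if (i < n) {  // masked tail, no scalar loop needed
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1);
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        _mm512_mask_storeu_ps(out + i, m, _mm512_add_ps(va, vb));
    }
}

int main() {
    float a[20], b[20], out[20];
    for (int i = 0; i < 20; ++i) { a[i] = float(i); b[i] = 2.0f * i; }
    // Dispatch on what the host actually supports (assumes at least AVX2).
    if (__builtin_cpu_supports("avx512f")) add_avx512(a, b, out, 20);
    else                                   add_avx2(a, b, out, 20);
    return out[19] == 57.0f ? 0 : 1;
}
```

Note that vector width is only part of the story; matching AVX-512 throughput on AVX2 also means retuning blocking and unrolling, which is where contributions would help.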

mgoin closed this as completed Jul 8, 2024