
[Feature]: Support AVX2 for CPU (drop AVX-512 requirement) #6178

Closed

kozuch opened this issue Jul 6, 2024 · 1 comment

kozuch commented Jul 6, 2024

🚀 The feature, motivation and pitch

Why is the AVX-512 instruction set required for CPU inference? This limits support to relatively recent CPUs (Intel since 2016, AMD since 2022). In particular, the first generations of AMD Epyc server CPUs (Zen 1-3 architectures), which are now the most affordable, only have AVX2. These older Epyc processors are nicely cheap and still offer 128 PCIe lanes for networking.

So it would be nice to expand CPU support to AVX2, the previous generation of vector instructions. Is the implementation difficult? I think llama.cpp supports AVX2, so maybe it could be adapted from their code.
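
For reference, one can check at runtime whether a machine reports AVX2 or AVX-512 support; below is a minimal sketch using the GCC/Clang `__builtin_cpu_supports` builtin (illustrative only, not vLLM code):

```cpp
// Minimal sketch: report which vector extensions the host CPU supports,
// via the GCC/Clang __builtin_cpu_supports builtin (queries CPUID).
// Build with: g++ -O2 cpu_flags.cpp
#include <cstdio>

int main() {
    std::printf("AVX2:    %s\n", __builtin_cpu_supports("avx2")    ? "yes" : "no");
    std::printf("AVX512F: %s\n", __builtin_cpu_supports("avx512f") ? "yes" : "no");
    return 0;
}
```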

Alternatives

No response

Additional context

No response

mgoin (Collaborator) commented Jul 8, 2024

AVX2 machines can build vLLM, and AVX2 is supported for CPU inference as of #5452.

However, it isn't particularly performant, so contributions are welcome!
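
For context on the performance gap: an AVX-512 register holds 16 floats where an AVX2 register holds 8, so the same kernel does half the work per instruction on AVX2, and AVX2 also lacks AVX-512's dedicated mask registers for tail handling. Below is a minimal sketch of the same elementwise add at both widths, with runtime dispatch; it is illustrative only and not taken from vLLM's actual kernels:

```cpp
// Minimal sketch: one elementwise add written for AVX2 (256-bit) and
// AVX-512 (512-bit), dispatched at runtime. Not vLLM's actual kernels.
// Build with: g++ -O2 avx_add.cpp
#include <immintrin.h>
#include <cstddef>

// AVX2: 8 floats per register, scalar loop for the tail.
__attribute__((target("avx2")))
void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}

// AVX-512: 16 floats per register; mask registers handle the tail.
__attribute__((target("avx512f")))
void add_avx512(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    if (i < n) {  // masked tail, no scalar loop needed
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1);
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        _mm512_mask_storeu_ps(out + i, m, _mm512_add_ps(va, vb));
    }
}

int main() {
    float a[20], b[20], out[20];
    for (int i = 0; i < 20; ++i) { a[i] = float(i); b[i] = 2.0f * i; }
    // Dispatch on what the host actually supports (assumes at least AVX2).
    if (__builtin_cpu_supports("avx512f")) add_avx512(a, b, out, 20);
    else                                   add_avx2(a, b, out, 20);
    return out[19] == 57.0f ? 0 : 1;
}
```

Note that vector width is only part of the story; matching AVX-512 throughput on AVX2 also means retuning blocking and unrolling, which is where contributions would help.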

mgoin closed this as completed Jul 8, 2024