Use Intel VNNI for int dot product #3512
That requires access to test machines which support those instructions. So far I don't have any server with AVX512.
This will be available soon. Maybe we could ask users to help test this feature a few months after the launch.
@stweil, can I add …
Sure, but who has such hardware to test it?
There are tens of millions of people with Intel's Alder Lake, and some of them are Tesseract users. We can ask in the forum to test the detection (and later the intdotproductvnni). Hopefully, we will find at least one person who has this CPU and is willing to help.
@amitdo, I just noticed that the notebook which I used for AVX512F also has AVX512VNNI. :-)
Go ahead! Could you do the … Since most of our files already have:
I think we can look at other Google projects with the same license as ours, and use parts of their code if we need to.
Detection is now implemented by commit 0daf18c.
I see that you check that AVX/AVX2 is supported by the OS. Do you also check somewhere that AVX512 is supported by the OS?
No, currently only the hardware capabilities are checked for AVX512. Up to now nobody has complained, so maybe AVX512F was only used on operating systems which support it. I'll add a check for OS support. Thank you for the hint!
I will try to implement …
Great. Maybe you can use https://github.com/stweil/tesseract/tree/avx512-vnni (which adds the framework, but simply copied the existing AVX2 code) as a starting point.
Yes, thank you. Please open a draft PR with that code. I'll push the needed changes to your PR.
See PR #3894.
Stefan, there are two ways to implement …
I want to implement the second way in PR #3894. We can still implement the first way later. What do you think about my suggestion?
Fixed :-)
There are two variants: AVX512-VNNI and AVX-VNNI.
VNNI replaces three SIMD instructions with one instruction.
It seems that we can use it inside MultiplyGroup().
https://software.intel.com/content/www/us/en/develop/articles/intel-advanced-vector-extensions-512-intel-avx-512-new-vector-neural-network-instruction.html