The performance curve of parallel GEMM with many cores shows significant up-down #5172
Comments
Thanks. You are probably aware of yamazakimitsufumi's previous work in #4655?
Yes, I know about yamazakimitsufumi's work #4655.
Thanks for letting me know. I am aware of the fixes in #5133 and #4920 that came out of this work. I was surprised to hear that #4920 had an impact at IBM. Once my modification is ready, it may need to be reviewed by IBM as well.
The data size involved is 12MB, so some cache level is likely being exceeded.
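As a rough check of the 12MB figure, the working set of a square SGEMM can be computed directly. This is only a sketch: it assumes single precision (4 bytes per element) and counts all three matrices A, B and C once each; the function name is made up for illustration, and the comment above does not say which problem size it refers to.

```python
def sgemm_working_set_bytes(m, n, k, bytes_per_element=4):
    """Bytes occupied by A (m x k), B (k x n) and C (m x n) together."""
    return (m * k + k * n + m * n) * bytes_per_element


if __name__ == "__main__":
    for size in (1000, 1100):
        megabytes = sgemm_working_set_bytes(size, size, size) / 1e6
        print(f"m=n=k={size}: {megabytes:.2f} MB")
    # m=n=k=1000: 12.00 MB
    # m=n=k=1100: 14.52 MB
```

Under these assumptions, 12MB corresponds to m=n=k=1000, while the m=n=k=1100 case comes to about 14.5MB.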
When we measured SGEMM performance on an ARM Neoverse V1 with 64 cores,
the performance curve was not smooth and showed significant ups and downs, as shown in the following graph.
For example, performance drops by about 35% at m=n=k=1100 compared to the neighboring points.
These fluctuations depend on the number of parallel threads and the data size,
which leads me to believe that the issue is caused by the thread partitioning control. I am planning to modify it.
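To illustrate why thread partitioning alone could produce this kind of sawtooth, here is a toy model. It is not OpenBLAS's actual partitioning logic: it simply assumes one dimension is split evenly across threads and each thread's share is rounded up to a multiple of a hypothetical block granule (8 here, chosen arbitrarily). The function name and both parameters are assumptions for illustration only.

```python
import math


def partition_efficiency(n, threads, granule):
    """Ideal per-thread work divided by the busiest thread's work.

    Toy model: each thread gets n / threads columns, rounded up to a
    multiple of `granule`. The rounded-up width sets the run time, so
    values below 1.0 indicate load imbalance.
    """
    ideal_width = n / threads
    rounded_width = math.ceil(ideal_width / granule) * granule
    return ideal_width / rounded_width


if __name__ == "__main__":
    for n in (1024, 1056, 1088, 1100, 1152):
        eff = partition_efficiency(n, threads=64, granule=8)
        print(f"n={n}: efficiency={eff:.2f}")
```

In this model the efficiency is 1.0 when n divides evenly into granule-sized pieces per thread (for example n=1024 with 64 threads) and falls sharply just past such a point, which gives a curve that depends on both thread count and problem size. Whether this matches the real cause of the drop at m=n=k=1100 would need to be confirmed against the actual partitioning code.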