Added support for QMX kernels in MLAS #26849
Conversation
Supported operations: SGEMM, QGEMM, Convolution
@microsoft-github-policy-service agree company="Qualcomm"
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
* Corrected logs for QMX kernel
* Simplified logic to check whether the SME2 or QMX kernel is available for dynamic QGEMM
* Restored preprocessor guard check for MSVC compiler
* QGEMM - enable QMX kernel on Qualcomm SME1 devices only
Force-pushed from d7045d6 to b5bb952
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Pull request overview
This PR adds support for Qualcomm QMX kernels to MLAS (Microsoft Linear Algebra Subprograms), enabling QMX-optimized implementations of SGEMM, QGEMM, and Convolution operations to coexist with ARM KleidiAI kernels. The implementation selects the appropriate kernel at runtime based on CPU vendor detection, falling back to standard SME kernels when QMX support is not enabled or when running on non-Qualcomm platforms.
Changes:
- Added build configuration option --use_qmx to enable QMX kernel integration
- Extended KleidiAI integration to support runtime selection between QMX, SME2, and SME kernels based on vendor and hardware capabilities (see the sketch after this list)
- Updated dynamic quantized GEMM availability check to include SME support alongside SME2
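As a rough illustration of the runtime selection described above, here is a minimal sketch. All names (SelectSgemmKernel, GemmKernel, ORT_USE_QMX) are hypothetical and chosen for this example; the actual MLAS/KleidiAI code uses its own platform-detection helpers and ukernel interfaces.

```cpp
#include <string>

// Hypothetical kernel variants for illustration only.
enum class GemmKernel { Qmx, Sme2, Sme, Neon };

// Sketch of vendor-based kernel selection with fallback to standard SME
// kernels, assuming the caller supplies the CPU vendor string and the
// SME/SME2 feature flags it has already detected.
GemmKernel SelectSgemmKernel([[maybe_unused]] const std::string& vendor,
                             bool has_sme, bool has_sme2) {
#if defined(ORT_USE_QMX)  // assumed build-time guard (illustrative name)
    if (vendor == "Qualcomm" && has_sme) {
        return GemmKernel::Qmx;  // prefer the QMX kernel on Qualcomm SME hardware
    }
#endif
    if (has_sme2) {
        return GemmKernel::Sme2;  // standard KleidiAI SME2 path
    }
    if (has_sme) {
        return GemmKernel::Sme;   // fall back to plain SME
    }
    return GemmKernel::Neon;      // generic fallback on everything else
}
```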
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tools/ci_build/build_args.py | Adds --use_qmx command-line argument to enable Qualcomm QMX kernel support |
| tools/ci_build/build.py | Passes the QMX flag to CMake when enabled and KleidiAI is not disabled |
| onnxruntime/core/mlas/lib/qgemm.cpp | Updates dynamic QGEMM availability check to support both SME2 and SME architectures (see the sketch after this table) |
| onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp | Includes QMX kernel header for SGEMM operations when enabled |
| onnxruntime/core/mlas/lib/kleidiai/qgemm_kleidiai.cpp | Adds runtime vendor detection and QMX kernel selection for quantized GEMM operations |
| onnxruntime/core/mlas/lib/kleidiai/mlasi_kleidiai.h | Adds CPU vendor name detection and SME capability flags |
| onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp | Implements QMX kernel support for convolution operations with vendor-based selection |
| onnxruntime/core/mlas/lib/kai_ukernel_interface.cpp | Defines QMX kernel interface and adds vendor-based kernel selection logic for SGEMM |
| onnxruntime/core/mlas/inc/mlas.h | Removes deprecated function declaration that's no longer used |
| cmake/onnxruntime_mlas.cmake | Configures linking and installation of kleidiai-qmx library when QMX support is enabled |
| cmake/external/onnxruntime_external_deps.cmake | Fetches the kleidiai-qmx library from Qualcomm's repository when enabled |
| cmake/deps.txt | Adds kleidiai-qmx dependency with SHA hash for version pinning |
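For the qgemm.cpp change noted in the table, the sketch below shows how an availability check might move from SME2-only to SME2-or-SME. The struct, field, and function names are assumptions for illustration, not the actual MLAS symbols.

```cpp
// Assumed capability flags; MLAS keeps equivalent state in its platform struct.
struct PlatformCaps {
    bool HasSME = false;
    bool HasSME2 = false;
    bool IsQualcomm = false;
};

// Dynamic QGEMM is available on SME2, and additionally (when the QMX build
// option is enabled) on Qualcomm SME1-class devices via the QMX kernel.
bool DynamicQGemmAvailable(const PlatformCaps& caps) {
    if (caps.HasSME2) {
        return true;                      // previous behaviour: SME2 only
    }
#if defined(ORT_USE_QMX)                  // assumed build-time guard (illustrative)
    if (caps.HasSME && caps.IsQualcomm) {
        return true;                      // new path: Qualcomm SME devices
    }
#endif
    return false;
}
```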
I have addressed the review comments. Thanks!
config option onnxruntime_USE_QMX_KLEIDIAI_COEXIST
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Overall LGTM - thanks. There are some suggestions from Copilot and one minor formatting comment from me. Can you please see if they are relevant? Thanks.
* Addressed Copilot review comments
Hi @hariharans29 - Thanks for helping with the review. I have addressed the code indentation issues and a few of the Copilot comments as well. The unaddressed Copilot comments don't seem particularly relevant to me :)
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Hello @qti-vaiskv - Can you please resolve the conflicts? I'll merge this PR next. Thanks.
Force-pushed from 8caa289 to a4610b4
Force-pushed from a4610b4 to b79dad4
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Supported operations with QMX: SGEMM, QGEMM, Convolution