[MLAS/CPU EP]: Introduce a backend kernel selector config in MLAS #27136
Pull request overview
This PR introduces a backend-kernel selection config that can be plumbed through MLAS/ORT call sites to optionally disable KleidiAI-backed kernels via a new session option.
Changes:
- Add `MLAS_BACKEND_KERNEL_SELECTOR_CONFIG` and thread it through MLAS GEMM/QGEMM/DynamicQGEMM APIs (and call sites).
- Add session option key `mlas.disable_kleidiai` and propagate it through multiple CPU kernels (RNN/Conv/MatMul/Gemm/Softmax/Einsum/Attention/etc.).
- Update unit tests and benchmarks to use the new MLAS API signatures.
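As a rough illustration of the gating described in the changes above, the sketch below shows a config pointer passed into a backend selection step. `SelectorConfig`, `GemmBackend`, and `SelectGemmBackend` are hypothetical names, not the MLAS API; only the `use_kleidiai` field is taken from the PR. Treating a null config as "no preference" is an assumption, made because some call sites in the table below pass `nullptr`.

```cpp
// Hypothetical stand-in for MLAS_BACKEND_KERNEL_SELECTOR_CONFIG.
// Only the use_kleidiai field is taken from the PR discussion.
struct SelectorConfig {
  bool use_kleidiai = true;
};

// Hypothetical backend tags, for illustration only.
enum class GemmBackend { kDefault, kKleidiAI };

// Picks a backend for a GEMM-like call.
// Assumption: a null config means "no preference", so KleidiAI stays eligible.
inline GemmBackend SelectGemmBackend(const SelectorConfig* config,
                                     bool kleidiai_available) {
  const bool allowed = (config == nullptr) || config->use_kleidiai;
  return (kleidiai_available && allowed) ? GemmBackend::kKleidiAI
                                         : GemmBackend::kDefault;
}
```

Passing the config as a pointer keeps existing call sites cheap to migrate, at the cost of every callee having to decide what a null pointer means.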
Reviewed changes
Copilot reviewed 85 out of 85 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/test/mlas/unittest/test_qgemm.h | Update MLAS packing/GEMM calls with backend config param. |
| onnxruntime/test/mlas/unittest/test_qgemm.cpp | Update pack-size queries with backend config param. |
| onnxruntime/test/mlas/unittest/test_fgemm.h | Update batch GEMM + pack APIs with backend config param. |
| onnxruntime/test/mlas/unittest/test_dynamic_qgemm.cpp | Update dynamic QGEMM APIs + availability check signature. |
| onnxruntime/test/mlas/unittest/test_conv2d.h | Update MLAS GEMM call signature. |
| onnxruntime/test/mlas/bench/bench_sgemm.cpp | Update GEMM/pack benchmark calls for new signature. |
| onnxruntime/test/mlas/bench/bench_qgemm.cpp | Update QGEMM pack-size query signature. |
| onnxruntime/test/framework/math_test.cc | Update math::Gemm calls to pass backend config param. |
| onnxruntime/core/util/math_cpu.cc | Extend math APIs to accept backend selector config and forward to MLAS. |
| onnxruntime/core/util/math.h | Extend math API declarations and include MLAS config type. |
| onnxruntime/core/providers/cpu/rnn/uni_directional_lstm.h | Store and plumb backend selector config through LSTM implementation. |
| onnxruntime/core/providers/cpu/rnn/uni_directional_lstm.cc | Forward backend selector config into GEMM calls. |
| onnxruntime/core/providers/cpu/rnn/rnn_helpers.h | Extend helper GEMM wrappers to accept backend selector config. |
| onnxruntime/core/providers/cpu/rnn/rnn_helpers.cc | Forward backend selector config into math::Gemm/GemmEx. |
| onnxruntime/core/providers/cpu/rnn/rnn.h | Add session-option-driven config to RNN op kernel. |
| onnxruntime/core/providers/cpu/rnn/rnn.cc | Pass backend selector config into MatMul calls. |
| onnxruntime/core/providers/cpu/rnn/lstm_base.h | Add session-option-driven config to LSTM base. |
| onnxruntime/core/providers/cpu/rnn/lstm_base.cc | Pass backend selector config to UniDirectionalLstm. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h | Formatting-only constructor change. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc | Pass backend selector config into pack-size/pack calls. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h | Add session-option-driven config; plumb into GRU implementation. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc | Pass backend selector config through packing and GEMM paths. |
| onnxruntime/core/providers/cpu/reduction/reduction_ops.h | Update MatMul call signature and add TODO for config plumbing. |
| onnxruntime/core/providers/cpu/quantization/qlinearconv.cc | Add session-option-driven config and pass to pack-size call. |
| onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h | Introduce backend selector config usage in packing/dynamic quant paths. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.h | Add session-option-driven config to ConvTranspose. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.cc | Pass backend selector config into MatMul call. |
| onnxruntime/core/providers/cpu/nn/conv.h | Add session-option-driven config to Conv. |
| onnxruntime/core/providers/cpu/nn/conv.cc | Attach backend selector config into MLAS conv parameters and GEMM path. |
| onnxruntime/core/providers/cpu/ml/svmregressor.cc | Pass backend selector config into GEMM path. |
| onnxruntime/core/providers/cpu/ml/svmclassifier.h | Add session-option-driven config and pass to GEMM path. |
| onnxruntime/core/providers/cpu/ml/linearregressor.h | Store backend selector config for LinearRegressor. |
| onnxruntime/core/providers/cpu/ml/linearregressor.cc | Read session option and forward backend selector config into GEMM calls. |
| onnxruntime/core/providers/cpu/ml/linearclassifier.h | Store backend selector config for LinearClassifier. |
| onnxruntime/core/providers/cpu/ml/linearclassifier.cc | Read session option and forward backend selector config into GEMM calls. |
| onnxruntime/core/providers/cpu/math/softmax_shared.h | Extend SoftmaxCPU signature to accept backend selector config. |
| onnxruntime/core/providers/cpu/math/softmax_shared.cc | Forward config into math::Gemm where applicable; ignore in float fast path. |
| onnxruntime/core/providers/cpu/math/softmax.h | Add session-option-driven config and plumb through compute paths. |
| onnxruntime/core/providers/cpu/math/softmax.cc | Pass backend selector config into SoftmaxCPU calls. |
| onnxruntime/core/providers/cpu/math/matmul.h | Add session-option-driven config to MatMul kernels. |
| onnxruntime/core/providers/cpu/math/matmul.cc | Pass backend selector config into math::MatMul/GemmBatch/packing. |
| onnxruntime/core/providers/cpu/math/gemm_matmul_common.h | Extend GemmPackBFp32 signature to accept backend selector config. |
| onnxruntime/core/providers/cpu/math/gemm.h | Add session-option-driven config; plumb into GEMM helpers and MLFloat16 path. |
| onnxruntime/core/providers/cpu/math/gemm.cc | Forward backend selector config through packing and GEMM calls. |
| onnxruntime/core/providers/cpu/math/einsum_utils/einsum_typed_compute_processor.h | Store backend selector config pointer in processor. |
| onnxruntime/core/providers/cpu/math/einsum_utils/einsum_typed_compute_processor.cc | Pass backend selector config into Einsum MatMul execution path. |
| onnxruntime/core/providers/cpu/math/einsum_utils/einsum_auxiliary_ops.h | Extend Einsum MatMul callback signatures to accept backend selector config. |
| onnxruntime/core/providers/cpu/math/einsum_utils/einsum_auxiliary_ops.cc | Forward backend selector config into math::MatMul and device MatMul callback. |
| onnxruntime/core/providers/cpu/math/einsum.h | Add session-option-driven config to Einsum kernel. |
| onnxruntime/core/providers/cpu/math/einsum.cc | Pass backend selector config into typed compute processors. |
| onnxruntime/core/providers/cpu/llm/attention.h | Add session-option-driven config to AttentionBase. |
| onnxruntime/core/providers/cpu/llm/attention.cc | Pass backend selector config into GEMM/MatMul calls in attention. |
| onnxruntime/core/mlas/lib/softmax.h | Add session options header include. |
| onnxruntime/core/mlas/lib/sgemm.cpp | Add backend selector config parameter and gate KleidiAI overrides on it. |
| onnxruntime/core/mlas/lib/qgemm.cpp | Add backend selector config parameter for dynamic QGEMM availability/pack/batch. |
| onnxruntime/core/mlas/lib/mlasi.h | Update MLAS internal function pointer typedefs for new signatures. |
| onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp | Respect backend selector config to opt out of KleidiAI conv. |
| onnxruntime/core/mlas/lib/convolve.cpp | Forward backend selector config into GEMM calls within convolution. |
| onnxruntime/core/mlas/inc/mlas.h | Define backend selector config and extend MLAS API signatures. |
| onnxruntime/contrib_ops/cpu/word_conv_embedding.h | Add session-option-driven config and store it in kernel. |
| onnxruntime/contrib_ops/cpu/word_conv_embedding.cc | Pass backend selector config into GEMM call. |
| onnxruntime/contrib_ops/cpu/transformers/sampling_cpu_helper.h | Update SoftmaxCPU call signature (config TODO left as nullptr). |
| onnxruntime/contrib_ops/cpu/transformers/generation_device_helper.cc | Update SoftmaxCPU call signature (config TODO left as nullptr). |
| onnxruntime/contrib_ops/cpu/sparse/sparse_attention_base.h | Add session-option-driven config and pass into GEMM paths. |
| onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc | Add session-option-driven config and pass into GemmBatch. |
| onnxruntime/contrib_ops/cpu/quantization/matmul_bnb4.cc | Add session-option-driven config and pass into GemmBatch. |
| onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc | Add backend selector config usage in dynamic quant matmul path. |
| onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_lstm.cc | Pass backend selector config into pack-size call. |
| onnxruntime/contrib_ops/cpu/quantization/attention_quant.cc | Pass backend selector config into pack-size call. |
| onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.cc | Pass backend selector config into GEMM calls. |
| onnxruntime/contrib_ops/cpu/moe/moe_cpu.cc | Pass backend selector config into MLAS GEMM calls. |
| onnxruntime/contrib_ops/cpu/moe/moe_base_cpu.h | Add session-option-driven config shared by MoE CPU base. |
| onnxruntime/contrib_ops/cpu/bert/gqa_attention_base.h | Add session-option-driven config and pass into GEMM paths. |
| onnxruntime/contrib_ops/cpu/bert/attention_cpu_base.h | Pass backend selector config into GEMM/MatMul usage. |
| onnxruntime/contrib_ops/cpu/bert/attention_base.h | Store backend selector config in base attention class. |
| onnxruntime/contrib_ops/cpu/bert/attention.cc | Pass backend selector config into pack-size/pack and GEMM paths. |
| onnxruntime/contrib_ops/cpu/attnlstm/uni_dir_attn_lstm.h | Extend constructors to accept backend selector config. |
| onnxruntime/contrib_ops/cpu/attnlstm/uni_dir_attn_lstm.cc | Store and pass backend selector config into GEMM paths. |
| onnxruntime/contrib_ops/cpu/attnlstm/deep_cpu_attn_lstm.h | Add session-option-driven config. |
| onnxruntime/contrib_ops/cpu/attnlstm/deep_cpu_attn_lstm.cc | Pass backend selector config into attention/LSTM components. |
| onnxruntime/contrib_ops/cpu/attnlstm/bahdanau_attention.h | Extend constructor to accept backend selector config. |
| onnxruntime/contrib_ops/cpu/attnlstm/bahdanau_attention.cc | Store and pass backend selector config into GEMM paths. |
| onnxruntime/contrib_ops/cpu/attnlstm/attention_wrapper.h | Extend constructor to accept backend selector config. |
| onnxruntime/contrib_ops/cpu/attnlstm/attention_wrapper.cc | Store and pass backend selector config into GEMM paths. |
| include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h | Add mlas.disable_kleidiai session option key. |
Comments suppressed due to low confidence (1)
onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h:61
`MlasGemmPackBSize` is now called with a backend selector config, but the subsequent `MlasGemmPackB(...)` call in this function still uses the old signature and doesn't pass the config. This will either fail to compile (if the signature changed) or pack B using a potentially different backend selection than the size computation, which can break correctness. Update the `MlasGemmPackB` call to pass the same `mlas_backend_kernel_selector_config_`.
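To make the correctness concern concrete, here is a toy sketch in which the packed layout depends on the selected backend. `PackBSize`, `PackB`, `PaddedN`, and the alignment values are invented for illustration and do not reflect the real MLAS packing formats. The point is only that the size query and the pack call need the same config, otherwise the buffer size and the layout can disagree.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the MLAS backend selector config.
struct SelectorConfig {
  bool use_kleidiai = true;
};

// Made-up rule: each backend pads the N dimension to a different multiple.
inline std::size_t PaddedN(std::size_t n, const SelectorConfig& cfg) {
  const std::size_t align = cfg.use_kleidiai ? 16 : 4;
  return (n + align - 1) / align * align;
}

// Number of floats needed to hold a packed K x N matrix B.
inline std::size_t PackBSize(std::size_t n, std::size_t k,
                             const SelectorConfig& cfg) {
  return PaddedN(n, cfg) * k;
}

// Packs row-major B (K x N) into a zero-padded K x PaddedN buffer.
// The same cfg is used for sizing and layout, which is the invariant at stake.
inline std::vector<float> PackB(const std::vector<float>& b, std::size_t n,
                                std::size_t k, const SelectorConfig& cfg) {
  if (b.size() != n * k) throw std::invalid_argument("B must hold K*N values");
  const std::size_t padded_n = PaddedN(n, cfg);
  std::vector<float> packed(PackBSize(n, k, cfg), 0.0f);
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = 0; j < n; ++j)
      packed[i * padded_n + j] = b[i * n + j];
  return packed;
}
```

In this toy model, sizing the buffer with one config and packing with another would produce a buffer whose length does not match the row stride the pack step writes with.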
Pull request overview
Copilot reviewed 108 out of 108 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
onnxruntime/contrib_ops/cpu/bert/attention_base.h:46
`AttentionBase` stores `mlas_backend_kernel_selector_config_` and downstream code passes it into MLAS calls, but the constructor never sets `use_kleidiai` based on the new `mlas.disable_kleidiai` session option. As a result, users cannot opt out of KleidiAI for this operator. Plumb the config option into this constructor (and add the needed `onnxruntime_session_options_config_keys.h` include).
MLAS_BACKEND_KERNEL_SELECTOR_CONFIG mlas_backend_kernel_selector_config_;
AttentionBase(const OpKernelInfo& info, bool require_same_hidden_size) {
int64_t num_heads = 0;
ORT_ENFORCE(info.GetAttr("num_heads", &num_heads).IsOK() && num_heads > 0);
num_heads_ = static_cast<int>(num_heads);
is_unidirectional_ = info.GetAttrOrDefault<int64_t>("unidirectional", 0) == 1;
do_rotary_ = info.GetAttrOrDefault<int64_t>("do_rotary", 0) == 1;
rotary_embedding_ = static_cast<int>(info.GetAttrOrDefault<int64_t>("rotary_embedding_dim", 0));
mask_filter_value_ = info.GetAttrOrDefault<float>("mask_filter_value", -10000.0f);
Pull request overview
Copilot reviewed 114 out of 114 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 114 out of 114 changed files in this pull request and generated 3 comments.
@hariharans29 I've opened a new pull request, #27166, to work on those changes. Once the pull request is ready, I'll request review from you.
mlas_backend_kernel_selector_config_.use_kleidiai =
    info.GetConfigOptions().GetConfigEntry(kOrtSessionOptionsMlasDisableKleidiai) != "1";
there's a bit of boilerplate code to set up MLAS_BACKEND_KERNEL_SELECTOR_CONFIG in each kernel that needs it. as more options are added, this might become unwieldy.
would it be possible to reuse a common MLAS_BACKEND_KERNEL_SELECTOR_CONFIG instance? could we do with one per session?
or at least, the code to set up MLAS_BACKEND_KERNEL_SELECTOR_CONFIG from session options could be put into a helper function.
Sure, I will go for the latter option for now.
Another idea is to have it in the base OpKernel class as a member, let child kernels inherit it, and do the initialization in the base OpKernel's ctor (thinking out loud). But this would mean every OpKernel gets an instance of the config struct. In its current state it is pretty lightweight, but it could bloat depending on future backend additions.
It might be prudent to stick with a common place where the struct instance is set up from session options for now, though.
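The helper-function option discussed in this thread could look roughly like the sketch below. It is not the code that landed in the PR. `ConfigOptions` is a plain map standing in for the session config options, and `BackendKernelSelectorConfig` and `MakeBackendKernelSelectorConfig` are hypothetical names. The key string `mlas.disable_kleidiai` and the "enabled unless the value is `1`" rule come from the snippet quoted above.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for MLAS_BACKEND_KERNEL_SELECTOR_CONFIG.
struct BackendKernelSelectorConfig {
  bool use_kleidiai = true;
};

// Hypothetical stand-in for the session's config options (key -> value).
using ConfigOptions = std::unordered_map<std::string, std::string>;

// Session option key named in the PR.
constexpr const char* kDisableKleidiaiKey = "mlas.disable_kleidiai";

// Builds the selector config in one place so kernels do not repeat the lookup.
// KleidiAI stays enabled unless the option is present and equal to "1".
inline BackendKernelSelectorConfig MakeBackendKernelSelectorConfig(
    const ConfigOptions& options) {
  BackendKernelSelectorConfig config;
  const auto it = options.find(kDisableKleidiaiKey);
  const bool disabled = (it != options.end()) && (it->second == "1");
  config.use_kleidiai = !disabled;
  return config;
}
```

A kernel constructor would then call the helper once and store the result, so adding a future backend option would touch the helper rather than every kernel.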


Description
Introduces a backend kernel selector config struct in MLAS that lets users configure which backend kernels are selected at runtime. The immediate use case is to let users opt out of KleidiAI kernels on ARM platforms. This solution should scale to other kernel implementation backends in the future.
Motivation and Context
Allow users to opt out of KleidiAI kernels on ARM platforms if they choose to.