Merged
33 commits
8c3c07e
Add type definitions, registration, utilities for INT2/UINT2 support …
vraspar Jan 15, 2026
fc8e803
[QNN EP] Add BFloat16 dtype support in QNN EP (#26987)
tirupath-qti Jan 15, 2026
e1355db
Implement new experimental lookup-based matrix multiplication method(…
vraspar Jan 15, 2026
ad48e89
[MLAS/CPU EP] Improve performance of Silu activation path within the …
hariharans29 Jan 15, 2026
843d519
[QNN EP] Add support for handling 0-dimension for Concat Op (#27000)
qti-ashwshan Jan 15, 2026
dc16751
Fix ClipQuantFusion crash when Clip has multiple input edges (#27016)
edgchen1 Jan 16, 2026
bb8a44a
[QNN EP] Support quantized BatchNorm with per-channel DQ params on QN…
qti-yuduo Jan 16, 2026
f303d8e
Add API to get ep graph partitioning info (#26781)
adrianlizarraga Jan 16, 2026
55bc4f7
[OVEP] OpenVINO EP Features and bug-fixes for ORT-1.24 - Follow up (#…
preetha-intel Jan 16, 2026
4cea074
[QNN-EP] Add MatMulNBits translation for GPU (#26340)
quic-tirupath Jan 16, 2026
4e2c62b
[MLAS/NEON] Add dedicated kernel for depthwise convolution for ARM64 …
hariharans29 Jan 16, 2026
fb53090
[QNN-EP] Support alternate Layernorm fusion pattern in QNN preprocess…
qti-mattsinc Jan 16, 2026
a53cad7
Implement multithreading in qgemm_kleidi (#26301)
melkap01-Arm Jan 16, 2026
796f711
[CXX] Enable users to specify custom OrtSyncStream via RunOptions (#2…
yuslepukhin Jan 17, 2026
b048ae8
Added support for QMX kernels in MLAS (#26849)
qti-vaiskv Jan 17, 2026
aa76598
Tweak external resource importer handle structs (#27040)
skottmckay Jan 17, 2026
bf28643
[QNN EP] Add QuickGELU operator support for QNN provider (#27034)
tirupath-qti Jan 17, 2026
9e3066a
Add INT2 and UINT2 support for QDQ, transpose and cast ops (#27022)
vraspar Jan 17, 2026
2433bba
Introducing BF16 Pointwise NCHWc Convolution for Arm64 (#26838)
Rohanjames1997 Jan 18, 2026
b0382cf
[EP ABI] Add CreateCustomOpDomains() API for plugin EP to register cu…
chilo-ms Jan 19, 2026
c29baeb
Add a new pipeline for CUDA 13 nuget builds (#27023)
eserscor Jan 20, 2026
4f02d8c
[EP ABI] Update Graph_GetGraphView() implementation (#26711)
chilo-ms Jan 20, 2026
288e177
[webgpu] Fix a bug for im2col (#27069)
wenqinI Jan 20, 2026
a105e81
[QNN EP] Add FusedMatMul operator support (#27044)
tirupath-qti Jan 20, 2026
60bd5f0
Disable Float32_2Bits_Asymmetric_256x256 test (#27046)
vraspar Jan 21, 2026
3e90277
Fix Doxygen documentation build error in onnxruntime_c_api.h (#27083)
nieubank Jan 21, 2026
dac7ecc
Print tensor for new packed type of 2 bits (#27064)
tianleiwu Jan 21, 2026
33d872e
Fix GPU JAR testing on Linux (#27011)
eserscor Jan 21, 2026
3fd6875
Fix warning around unused code in QNN Android Emulator builds by cla…
hariharans29 Jan 16, 2026
ffb5437
Raise the timeout for the ios simulator job (#27045)
hariharans29 Jan 17, 2026
44d4421
upgrade emsdk to 4.0.23 (#27029)
fs-eire Jan 19, 2026
7555efb
Add dedicated API to support extracting compatibility string from mod…
adrastogi Jan 21, 2026
f1eb3b0
Fix failing mainline build on Arm64 linux (#27101)
Rohanjames1997 Jan 21, 2026
2 changes: 1 addition & 1 deletion .github/workflows/mac.yml
@@ -72,7 +72,7 @@ jobs:
matrix:
target_arch: [x86_64, arm64]

-timeout-minutes: 90
+timeout-minutes: 120

steps:
- name: Checkout code
2 changes: 1 addition & 1 deletion .gitmodules
@@ -7,4 +7,4 @@
[submodule "cmake/external/emsdk"]
path = cmake/external/emsdk
url = https://github.com/emscripten-core/emsdk.git
-branch = 4.0.21
+branch = 4.0.23
1 change: 1 addition & 0 deletions cmake/CMakeLists.txt
@@ -91,6 +91,7 @@ option(onnxruntime_USE_SVE "Build with SVE support in MLAS" OFF)
option(onnxruntime_USE_ARM_NEON_NCHWC "Build with ARM Neon NCHWc kernels in MLAS" OFF)

option(onnxruntime_USE_KLEIDIAI "Build with KleidiAI integration in MLAS" OFF)
+option(onnxruntime_USE_QMX_KLEIDIAI_COEXIST "Build with QMX and Arm KLEIDIAI libraries" OFF)
option(onnxruntime_BUILD_UNIT_TESTS "Build ONNXRuntime unit tests" ON)
option(onnxruntime_BUILD_CSHARP "Build C# library" OFF)
option(onnxruntime_BUILD_OBJC "Build Objective-C library" OFF)
5 changes: 4 additions & 1 deletion cmake/deps.txt
@@ -56,5 +56,8 @@ extensions;https://github.com/microsoft/onnxruntime-extensions/archive/c24b7bab0
directx_headers;https://github.com/microsoft/DirectX-Headers/archive/refs/tags/v1.613.1.zip;47653509a3371eabb156360f42faf582f314bf2e
cudnn_frontend;https://github.com/NVIDIA/cudnn-frontend/archive/refs/tags/v1.12.0.zip;7e733cfdc410d777b76122d64232499205589a96
dawn;https://github.com/google/dawn/archive/13c1635a14574ebb7116b56a69f5519301417fda.zip;0aadd28fc385cf7d657d5fc70a352372d2d3c76a
-kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.15.0.tar.gz;62ccd24ab60bcef68766440fb42d79071ac2a5d2
+kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.20.0.tar.gz;6895e72b3d5cf1173358164cb3d64c9d7d33cc84
+# kleidiai-qmx is pinned to a specific commit as there are no tagged releases. When an appropriate tagged release becomes available,
+# this entry will be updated to use refs/tags/<version> instead of the raw commit hash.
+kleidiai-qmx;https://github.com/qualcomm/kleidiai/archive/2f10c9a8d32f81ffeeb6d4885a29cc35d2b0da87.zip;5e855730a2d69057a569f43dd7532db3b2d2a05c
duktape;https://github.com/svaarala/duktape/releases/download/v2.7.0/duktape-2.7.0.tar.xz;8200c8e417dbab7adcc12c4dbdef7651cfc55794
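Reviewer note: the deps.txt entries in this hunk appear to follow a `name;url;sha1` layout, and the new comment distinguishes a tag pin (`refs/tags/<version>`) from a raw-commit pin. Below is a minimal Python sketch, assuming that three-field layout, that splits an entry and classifies how it is pinned. The names `DepEntry`, `parse_dep_line`, and `pin_kind` are hypothetical and are not part of the ONNX Runtime build; the actual parsing is done by the project's CMake scripts.

```python
import re
from typing import NamedTuple, Optional


class DepEntry(NamedTuple):
    name: str
    url: str
    sha1: str


def parse_dep_line(line: str) -> Optional[DepEntry]:
    """Parse one `name;url;sha1` line; return None for blank or comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    parts = stripped.split(";")
    if len(parts) != 3:
        raise ValueError(f"expected 3 ';'-separated fields, got {len(parts)}")
    name, url, sha1 = parts
    # Assumption: the third field is a lowercase 40-character hex SHA1 digest.
    if not re.fullmatch(r"[0-9a-f]{40}", sha1):
        raise ValueError(f"{name}: third field is not a 40-character hex SHA1")
    return DepEntry(name, url, sha1)


def pin_kind(entry: DepEntry) -> str:
    """Classify the archive URL as a 'tag' pin, a 'commit' pin, or 'other'."""
    if "/archive/refs/tags/" in entry.url:
        return "tag"
    if re.search(r"/archive/[0-9a-f]{40}\.(zip|tar\.gz)$", entry.url):
        return "commit"
    return "other"
```

Under these assumptions, the updated `kleidiai` entry classifies as a tag pin and the new `kleidiai-qmx` entry as a commit pin, which matches the intent stated in the added comment.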
6 changes: 6 additions & 0 deletions cmake/external/onnxruntime_external_deps.cmake
@@ -845,6 +845,12 @@ if(onnxruntime_USE_KLEIDIAI)

onnxruntime_fetchcontent_declare(kleidiai URL ${DEP_URL_kleidiai} URL_HASH SHA1=${DEP_SHA1_kleidiai} EXCLUDE_FROM_ALL)
onnxruntime_fetchcontent_makeavailable(kleidiai)
+# Fetch Qualcomm's kleidiai library
+if(onnxruntime_USE_QMX_KLEIDIAI_COEXIST)
+onnxruntime_fetchcontent_declare(kleidiai-qmx URL ${DEP_URL_kleidiai-qmx} URL_HASH SHA1=${DEP_SHA1_kleidiai-qmx}
+EXCLUDE_FROM_ALL)
+onnxruntime_fetchcontent_makeavailable(kleidiai-qmx)
+endif()
endif()

set(onnxruntime_LINK_DIRS)
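Reviewer note: as far as this hunk shows, the `kleidiai-qmx` fetch is nested inside the `if(onnxruntime_USE_KLEIDIAI)` block, so turning on `onnxruntime_USE_QMX_KLEIDIAI_COEXIST` alone would not fetch anything. The small Python model below only illustrates that nesting; `kleidiai_deps_to_fetch` is a hypothetical name, and conditions outside this hunk are not modeled.

```python
from typing import List


def kleidiai_deps_to_fetch(use_kleidiai: bool, use_qmx_coexist: bool) -> List[str]:
    """Model the nested option checks shown in the hunk above."""
    deps: List[str] = []
    if use_kleidiai:
        # Arm's KleidiAI is fetched whenever the outer option is ON.
        deps.append("kleidiai")
        if use_qmx_coexist:
            # Qualcomm's fork is fetched only in addition to the Arm library.
            deps.append("kleidiai-qmx")
    return deps
```

In this model the QMX library is never fetched on its own, which is consistent with the option's description, "Build with QMX and Arm KLEIDIAI libraries".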
29 changes: 26 additions & 3 deletions cmake/onnxruntime_mlas.cmake
@@ -45,6 +45,8 @@ onnxruntime_add_static_library(onnxruntime_mlas
${MLAS_SRC_DIR}/qdwconv_kernelsize.cpp
${MLAS_SRC_DIR}/qnbitgemm.h
${MLAS_SRC_DIR}/qnbitgemm.cpp
+${MLAS_SRC_DIR}/qlutgemm.h
+${MLAS_SRC_DIR}/qlutgemm.cpp
${MLAS_SRC_DIR}/sqnbitgemm_q8_block.h
${MLAS_SRC_DIR}/flashattn.cpp
${MLAS_SRC_DIR}/cast.cpp
@@ -113,6 +115,7 @@ function(setup_mlas_source_for_windows)
${MLAS_SRC_DIR}/eltwise_kernel_neon.cpp
${MLAS_SRC_DIR}/eltwise_kernel_neon_fp16.cpp
${MLAS_SRC_DIR}/sqnbitgemm_kernel_neon_int8_i8mm.cpp
+${MLAS_SRC_DIR}/sconv_nchw_kernel_neon.cpp
)

set(mlas_platform_preprocess_srcs
@@ -209,6 +212,8 @@ function(setup_mlas_source_for_windows)
${MLAS_SRC_DIR}/qgemm_kernel_sse.cpp
${MLAS_SRC_DIR}/qgemm_kernel_sse41.cpp
${MLAS_SRC_DIR}/intrinsics/avx512/quantize_avx512f.cpp
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.h
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.cpp
${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx2.cpp
${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx512.cpp
${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx512vnni.cpp
@@ -284,6 +289,11 @@ function(setup_kleidiai)
)
target_link_libraries(onnxruntime_mlas PRIVATE kleidiai)
list(APPEND onnxruntime_EXTERNAL_LIBRARIES kleidiai)
+if(onnxruntime_USE_QMX_KLEIDIAI_COEXIST)
+target_link_libraries(onnxruntime_mlas PRIVATE kleidiai-qmx)
+target_compile_definitions(onnxruntime_mlas PRIVATE ENABLE_QMX_KERNELS=1)
+list(APPEND onnxruntime_EXTERNAL_LIBRARIES kleidiai-qmx)
+endif()
set(onnxruntime_EXTERNAL_LIBRARIES ${onnxruntime_EXTERNAL_LIBRARIES} PARENT_SCOPE)

# If KLEIDIAI_DEBUG is enabled that implies both DEBUG and KERNEL messages.
@@ -302,13 +312,21 @@ function(setup_kleidiai)
RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
FRAMEWORK DESTINATION ${CMAKE_INSTALL_BINDIR})
endif()

+if(onnxruntime_USE_QMX_KLEIDIAI_COEXIST)
+install(TARGETS kleidiai-qmx EXPORT ${PROJECT_NAME}Targets
+ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
+LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
+RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
+FRAMEWORK DESTINATION ${CMAKE_INSTALL_BINDIR})
+endif()
endfunction()

function (setup_arm_neon_nchwc)
target_sources(onnxruntime_mlas PRIVATE
-${MLAS_SRC_DIR}/sconv.h
-${MLAS_SRC_DIR}/sconv_kernel_neon.cpp
-${MLAS_SRC_DIR}/spool_kernel_neon.cpp
+${MLAS_SRC_DIR}/sconv_nchwc_kernel_neon.h
+${MLAS_SRC_DIR}/sconv_nchwc_kernel_neon.cpp
+${MLAS_SRC_DIR}/spool_nchwc_kernel_neon.cpp
)
list(APPEND mlas_private_compile_definitions MLAS_USE_ARM_NEON_NCHWC)
set(mlas_private_compile_definitions ${mlas_private_compile_definitions} PARENT_SCOPE)
@@ -460,6 +478,7 @@ else()
${MLAS_SRC_DIR}/eltwise_kernel_neon.h
${MLAS_SRC_DIR}/eltwise_kernel_neon.cpp
${MLAS_SRC_DIR}/sqnbitgemm_kernel_neon_int8_i8mm.cpp
+${MLAS_SRC_DIR}/sconv_nchw_kernel_neon.cpp
)

# Conditionally add the SVE implementation if compiler supports it
@@ -496,6 +515,7 @@ else()
${MLAS_SRC_DIR}/qgemm_kernel_smmla.cpp
${MLAS_SRC_DIR}/qgemm_kernel_ummla.cpp
${MLAS_SRC_DIR}/sbgemm_kernel_neon.cpp
+${MLAS_SRC_DIR}/sbconv_kernel_neon.cpp
${MLAS_SRC_DIR}/cast_kernel_neon.cpp
${MLAS_SRC_DIR}/hqnbitgemm_kernel_neon_fp16.cpp
${MLAS_SRC_DIR}/rotary_embedding_kernel_neon_fp16.cpp
@@ -511,6 +531,7 @@ else()
set_source_files_properties(${MLAS_SRC_DIR}/dwconv.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
set_source_files_properties(${MLAS_SRC_DIR}/pooling_fp16.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
set_source_files_properties(${MLAS_SRC_DIR}/sbgemm_kernel_neon.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+bf16 ")
+set_source_files_properties(${MLAS_SRC_DIR}/sbconv_kernel_neon.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+bf16 ")
set_source_files_properties(${MLAS_SRC_DIR}/cast_kernel_neon.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
set_source_files_properties(${MLAS_SRC_DIR}/hqnbitgemm_kernel_neon_fp16.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
set_source_files_properties(${MLAS_SRC_DIR}/rotary_embedding_kernel_neon_fp16.cpp PROPERTIES COMPILE_FLAGS " -march=armv8.2-a+fp16 ")
@@ -693,6 +714,8 @@ else()
${MLAS_SRC_DIR}/intrinsics/avx2/qdwconv_avx2.cpp
${MLAS_SRC_DIR}/intrinsics/avx2/saturation_check_avx2.cpp
${MLAS_SRC_DIR}/sqnbitgemm_kernel_avx2.cpp
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.h
+${MLAS_SRC_DIR}/sqnbitgemm_lut_kernel_avx2.cpp
${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.h
${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.cpp
${MLAS_SRC_DIR}/rotary_embedding_kernel_avx2.cpp