forked from microsoft/onnxruntime
Sync with Microsoft ONNX Runtime - [18/08/2025] #780
Merged
### Description
This PR applies templates to flash attention and simplifies the `is_unidirectional` check in the shader.
### Motivation and Context
See above.
### Description
Disable two tests that broke on X Elite after the upgrade to QNN 2.37.0.
### Description
This PR fixes the load_config handling logic by delegating the filtering to the OV toolkit going forward (this enables cache_dir for the CPU device via load_config), and includes fixes for the redundant upsample Op.

---------

Co-authored-by: jatinwadhwa921 <[email protected]>
…rosoft#25702)
### Description
Enhance the unique name generator for node and tensor names.
### Motivation and Context
QNN requires node names to be unique. We've seen many instances of QNN node name conflicts that result in failures during QNN graph finalization. Currently, names are hard-coded and thus error-prone; this change adds a utility to generate unique names for QNN nodes and intermediate I/O tensors.
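A minimal sketch of what such a utility could look like, assuming a generator that remembers every name it has issued and appends a numeric suffix on collision. `UniqueNameGenerator`, `Generate`, `RunUniqueNameExample`, and the `_N` suffix scheme are hypothetical illustrations, not the actual QNN EP code; the mutex is likewise an assumption about how such a generator might be shared between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper (not the onnxruntime implementation): hands out names
// that are guaranteed to differ from every name it has returned before.
class UniqueNameGenerator {
 public:
  std::string Generate(const std::string& base) {
    std::lock_guard<std::mutex> lock(mutex_);  // assumption: shared across threads
    std::string candidate = base;
    std::size_t& suffix = next_suffix_[base];
    // Keep appending an increasing suffix until the candidate is unused.
    while (!used_.insert(candidate).second) {
      candidate = base + "_" + std::to_string(++suffix);
    }
    return candidate;
  }

 private:
  std::mutex mutex_;
  std::unordered_set<std::string> used_;                      // every name issued so far
  std::unordered_map<std::string, std::size_t> next_suffix_;  // last suffix tried per base
};

// Small usage check for the sketch above.
inline void RunUniqueNameExample() {
  UniqueNameGenerator gen;
  assert(gen.Generate("conv") == "conv");
  assert(gen.Generate("conv") == "conv_1");
  // A base that collides with an already generated name is still made unique.
  assert(gen.Generate("conv_1") == "conv_1_1");
}
```

Tracking the full set of issued names, rather than only a per-base counter, is what prevents a later base such as `conv_1` from colliding with a name generated earlier.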
…icrosoft#25706)
### Description
Fix swapped value and count arguments to the `std::vector` constructor.
The `std::vector` constructor signature is:
`vector( size_type count, const T& value, const Allocator& alloc = Allocator() );`
https://en.cppreference.com/w/cpp/container/vector/vector.html
### Motivation and Context
Fix issue discovered after enabling warning.
```
Error: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(688,34): error C2220: the following warning is treated as an error [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_provider_test.vcxproj]
Warning: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(688,34): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_provider_test.vcxproj]
```
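To illustrate why swapping the two arguments matters, here is a small sketch. The helper names `FillCorrect`, `FillSwapped`, and `RunVectorExample` are hypothetical and do not come from the test file; the explicit casts in `FillSwapped` spell out the implicit conversions that the compiler warning was flagging.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Correct order: element count first, fill value second.
std::vector<float> FillCorrect(std::size_t count, float value) {
  return std::vector<float>(count, value);
}

// Hypothetical reproduction of the swapped-argument bug. The float is
// truncated to an element count and the intended count becomes the value.
std::vector<float> FillSwapped(std::size_t count, float value) {
  return std::vector<float>(static_cast<std::size_t>(value),
                            static_cast<float>(count));
}

// Small usage check for the sketch above.
inline void RunVectorExample() {
  const std::vector<float> good = FillCorrect(4, 1.5f);
  assert(good.size() == 4 && good[0] == 1.5f);

  const std::vector<float> bad = FillSwapped(4, 1.5f);
  // 1.5f truncates to a count of 1, and the single element holds 4.0f.
  assert(bad.size() == 1 && bad[0] == 4.0f);
}
```

Without a warning such as C4244 enabled, the swapped form compiles and silently yields a vector of the wrong size.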
### Description
Upgrade wgsl-template to 0.1.14. Includes the following changes:
- show original file/line if different
- allow duplicated params
- [bugfix] show source lines correctly for generation errors
### Description
Fixes microsoft#25710 for bugs: unused parameters ‘node_domain’, ‘node_op_type’ and ‘target_data_layout’.
### Motivation and Context
microsoft#25710

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Add alpha/beta and int/f16 GEMM tests and fix some errors in the GEMM shader.
Follow up for microsoft#25702. Improve thread safety.
### Description
This PR introduces precompiled header (PCH) support for the ONNX Runtime targets that exhibited the longest build times when built with the MSVC toolset. By analyzing build performance, I identified a subset of targets with significant compilation overhead due to repeated header processing. Enabling PCH for these targets reduces redundant parsing, improving incremental and full build performance.
Changes include:
- Added PCH configuration to selected CMake targets with the highest build cost in MSVC builds.
- Ensured PCH setup is compatible with the existing build configurations.
- Verified successful compilation and linkage with PCH enabled under MSVC.

Impact: ~30% reduction in build time
…crosoft#25673)
### Description
Relax the WeightBiasQuantization constraint for larger QDQ node groups.
### Motivation and Context
The transformer `WeightBiasQuantization` quantizes float weights on the `Q -> DQ -> Conv/ConvTranspose/Gemm's Weights -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is a problem because it skips quantization for many common patterns, such as an unfused activation following a `Conv` (`DQ -> Conv -> ReLU -> Q`).
Checking for the ending Q is actually unnecessary here (the fold can happen anyway without changing model semantics). However, to minimize changes to current behavior, this PR simply extends the pattern to include a single (unbranched), type-preserving path leading to `Q`, which enables more quantization support.
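The relaxed check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the `Node` struct, the function name `LeadsToQuantizeOnSinglePath`, and the particular set of type-preserving ops (`Relu`, `Clip`) are hypothetical stand-ins for the real graph API and op list.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified graph node (not the onnxruntime Node class).
struct Node {
  std::string op_type;
  std::vector<const Node*> children;
};

// Returns true if following the single, unbranched chain of consumers from
// `start` passes only through type-preserving ops and ends at QuantizeLinear.
bool LeadsToQuantizeOnSinglePath(const Node& start) {
  // Assumption: an illustrative set of ops that keep the tensor type.
  static const std::unordered_set<std::string> kTypePreserving = {"Relu", "Clip"};
  const Node* current = &start;
  while (current->children.size() == 1) {  // stop at any branch or graph end
    const Node* child = current->children[0];
    if (child->op_type == "QuantizeLinear") return true;
    if (kTypePreserving.count(child->op_type) == 0) return false;
    current = child;
  }
  return false;
}

// Small usage check for the sketch above.
inline void RunPatternExample() {
  Node q{"QuantizeLinear", {}};
  Node relu{"Relu", {&q}};
  Node conv{"Conv", {&relu}};
  assert(LeadsToQuantizeOnSinglePath(conv));  // Conv -> ReLU -> Q matches

  Node other{"Add", {}};
  Node branching{"Conv", {&relu, &other}};
  assert(!LeadsToQuantizeOnSinglePath(branching));  // a branch is rejected
}
```

The original, stricter condition corresponds to accepting only the case where the first consumer is itself the `Q` node.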
…soft#25752)
### Description
Update mac.yml iphone_simulator job to use Xcode version 16.4.
### Motivation and Context
Fix CI build failure. Following the recommendation here: actions/runner-images#12758 (comment)
…microsoft#25730)
It seems that when multiple threads in one subgroup access the same shared memory location, performance is poor on Qualcomm devices (bank conflicts?). Limiting the number of threads that access the same memory location greatly improves performance on these devices. Phi4 drops from ~13s to ~10s on QC Adreno X1-85 (31.0.112.0).
### Description
Clearing shared_allocators_ invalidates all entries in shared_ort_allocators_. Remove the unused shared_arena_allocators_, which became unnecessary once EPs were given an example implementation of an OrtAllocator-based stream-aware arena that they can use directly.
### Motivation and Context
Fix an access violation in the dtor (swallowed because it happens during shutdown).
### Description
Moves DP4A shaders into templates.
### Motivation and Context
Preparation for upcoming changes to add 2-bit quantization and MOE. Moving to templates will improve code readability.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
ankitm3k approved these changes Aug 18, 2025
Description
Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.