Sync with Microsoft ONNX Runtime - [18/08/2025] #780

Jaswanth51 · 2025-08-18T01:51:51Z

Description

Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.

### Description
This PR applies a template to flash attention and simplifies the `is_unidirectional` check in the shader.

### Motivation and Context
See above.

### Description
Disable two tests that were broken on X Elite by upgrading to QNN 2.37.0.

### Description
This PR fixes the load_config handling logic by delegating the filtering to the OV toolkit going forward (this enables cache_dir for the CPU device via load_config). It also includes redundant upsample Op fixes.

---------

Co-authored-by: jatinwadhwa921 <[email protected]>

…rosoft#25702)

### Description
Enhance the unique name generator for node and tensor names.

### Motivation and Context
QNN requires node names to be unique. We have seen many instances of QNN node name conflicts resulting in failures during QNN graph finalization. Names are currently hard-coded and thus error-prone; this change adds a utility to generate unique names for QNN nodes and intermediate I/O tensors.
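To illustrate the idea, here is a minimal sketch of a unique name generator. It is not the utility added by this change: the class name `UniqueNameGenerator`, the `Generate` method, and the `_N` suffix scheme are all assumptions made for the example. A mutex is included because a later commit in this sync mentions thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper for illustration; not the actual QNN EP utility.
class UniqueNameGenerator {
 public:
  // Returns `prefix` if it is still free, otherwise `prefix_1`, `prefix_2`, ...
  // skipping any name that has already been handed out.
  std::string Generate(const std::string& prefix) {
    assert(!prefix.empty() && "prefix must be non-empty");
    std::lock_guard<std::mutex> lock(mutex_);
    std::string candidate = prefix;
    std::size_t& next = next_suffix_[prefix];  // starts at 0 for a new prefix
    // insert() reports whether the name was new; retry with the next suffix if not.
    while (!used_.insert(candidate).second) {
      candidate = prefix + "_" + std::to_string(++next);
    }
    return candidate;
  }

 private:
  std::mutex mutex_;
  std::unordered_set<std::string> used_;                      // every name returned so far
  std::unordered_map<std::string, std::size_t> next_suffix_;  // last suffix tried per prefix
};
```

Tracking the full set of returned names, rather than only a per-prefix counter, avoids a collision when a caller passes a prefix that already looks like a generated name (for example `conv_1`).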

…icrosoft#25706)

### Description
Fix swapped value and count arguments to the `std::vector` constructor.

The `std::vector` constructor signature is:
`vector( size_type count, const T& value, const Allocator& alloc = Allocator() );`

https://en.cppreference.com/w/cpp/container/vector/vector.html

### Motivation and Context
Fix an issue discovered after enabling the warning.

```
Error: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(688,34): error C2220: the following warning is treated as an error [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_provider_test.vcxproj]
Warning: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(688,34): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_provider_test.vcxproj]
```
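For reference, a small sketch of the correct argument order. The helper name `MakeFilled` is hypothetical and does not come from the test file; it only shows the `(count, value)` form of the constructor quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a vector holding `count` copies of `value`.
std::vector<float> MakeFilled(std::size_t count, float value) {
  // Correct order is (count, value). Swapping the two, e.g.
  // std::vector<float>(value, count), would convert the float to size_type,
  // which is the conversion MSVC reports as C4244.
  std::vector<float> result(count, value);
  assert(result.size() == count);
  return result;
}
```

With the arguments swapped the code still compiles when warnings are not treated as errors, but the float is silently truncated to an element count, so the resulting vector has the wrong size and contents.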

### Description
Upgrade wgsl-template to 0.1.14. Includes the following changes:
- show original file/line if different
- allow duplicated params
- [bugfix] show source lines correctly for generation errors

### Description
Fixes microsoft#25710: unused parameters ‘node_domain’, ‘node_op_type’, and ‘target_data_layout’.

### Motivation and Context
microsoft#25710

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Add alpha/beta and int/f16 GEMM tests and fix some errors in the GEMM shader.

Follow up for microsoft#25702. Improve thread safety.

### Description
This PR introduces precompiled header (PCH) support for ONNX Runtime targets that exhibited the longest build times when built with the MSVC toolset. By analyzing build performance, I identified a subset of targets with significant compilation overhead due to repeated header processing. Enabling PCH for these targets reduces redundant parsing, improving incremental and full build performance.

Changes include:
- Added PCH configuration to selected CMake targets with the highest build cost in MSVC builds.
- Ensured PCH setup is compatible with the existing build configurations.
- Verified successful compilation and linkage with PCH enabled under MSVC.

Impact: ~30% reduction in build time

…crosoft#25673)

### Description
Relax the WeightBiasQuantization constraint for larger QDQ node groups.

### Motivation and Context
The transformer `WeightBiasQuantization` quantizes float weights on the `Q -> DQ -> Conv/ConvTranspose/Gemm's Weights -> Q-> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is a problem because it skips quantization for many common patterns, such as `Conv` followed by an unfused activation (`DQ - Conv -> ReLU -> Q`).

Checking for the ending Q is actually unnecessary here (the fold can happen anyway without changing model semantics). However, to minimize the change to current behavior, this PR simply extends the pattern to also accept a single (non-branching), type-preserving path that leads to `Q`, which enables quantization in more cases.
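The relaxed check can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `ToyNode` and `LeadsToQ` are hypothetical names, and the set of ops treated as type-preserving is an assumed placeholder.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy node type for illustration only; not the ONNX Runtime Node class.
struct ToyNode {
  std::string op_type;
  std::vector<const ToyNode*> children;
};

// Returns true if a single, non-branching chain of type-preserving ops
// starting at `start` (e.g. a Conv) ends in a QuantizeLinear node.
bool LeadsToQ(const ToyNode& start) {
  // Assumed list of type-preserving ops; the real list may differ.
  static const std::unordered_set<std::string> kTypePreserving = {
      "Relu", "Clip", "Reshape"};
  const ToyNode* node = &start;
  while (node->children.size() == 1) {  // stop at a branch or a graph output
    const ToyNode* child = node->children[0];
    assert(child != nullptr);
    if (child->op_type == "QuantizeLinear") return true;
    if (kTypePreserving.count(child->op_type) == 0) return false;
    node = child;
  }
  return false;
}
```

Under the old check only a direct `Conv -> Q` edge would pass; in this sketch `Conv -> Relu -> Q` also passes, while a branching output or an op outside the assumed list (such as `Cast`) still fails.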

…soft#25752)

### Description
Update mac.yml iphone_simulator job to use Xcode version 16.4.

### Motivation and Context
Fix CI build failure. Following the recommendation here: actions/runner-images#12758 (comment)

…microsoft#25730)

It seems that when multiple threads in one subgroup access the same shared memory location, performance is poor on Qualcomm devices (possibly due to bank conflicts). Limiting the number of threads that access the same memory location greatly improves performance on these devices: Phi4 drops from ~13s to ~10s on QC Adreno X1-85 (31.0.112.0).

### Description
Clearing `shared_allocators_` invalidates all entries in `shared_ort_allocators_`. Remove the unused `shared_arena_allocators_`, which became unnecessary once EPs were given an example implementation of an `OrtAllocator`-based stream-aware arena that they can use directly.

### Motivation and Context
Fix an access violation in the destructor (swallowed because it happens during shutdown).

### Description
Moves DP4A shaders into templates.

### Motivation and Context
Preparation for upcoming changes to add 2 bit quantization and MOE. Moving to templates will improve code readability.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

daijh and others added 17 commits August 12, 2025 14:22

[webgpu] Apply template to flash attention (microsoft#25722)

0ccc9b0

### Description
This PR applies a template to flash attention and simplifies the `is_unidirectional` check in the shader.

### Motivation and Context
See above.

[QNN EP] Disable tests broken by QNN 2.37 (microsoft#25729)

03301cc

### Description
Disable two tests that were broken on X Elite by upgrading to QNN 2.37.0.

Bump actions/download-artifact from 4 to 5 (microsoft#25712)

1b75411

[WebGPU] upgrade wgsl-template to 0.1.14 (microsoft#25731)

cab752b

### Description
Upgrade wgsl-template to 0.1.14. Includes the following changes:
- show original file/line if different
- allow duplicated params
- [bugfix] show source lines correctly for generation errors

[webgpu] Add more GEMM test (microsoft#25556)

3a94a67

Add alpha/beta and int/f16 GEMM tests and fix some errors in the GEMM shader.

[QNN EP] Thread safe unique name generator (microsoft#25738)

b8821e9

Follow up for microsoft#25702. Improve thread safety.

Merge branch 'master' into sync_msft_18082025

f05d669

Jaswanth51 requested a review from ankitm3k August 18, 2025 01:51

ankitm3k approved these changes Aug 18, 2025

ankitm3k merged commit 78e46e2 into ovep-develop Aug 18, 2025
6 of 8 checks passed

ankitm3k deleted the sync_msft_18082025 branch August 18, 2025 05:15
