Jaswanth51
Description

Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.

daijh and others added 17 commits August 12, 2025 14:22
### Description
This PR applies templates to flash attention and simplifies the
`is_unidirectional` check in the shader.



### Motivation and Context
See above.
### Description

Disable two tests that broke on X Elite after the upgrade to QNN 2.37.0.
### Description
This PR fixes the load_config handling logic by delegating the filtering to the OV toolkit going forward (this enables cache_dir for the CPU device via load_config), and includes redundant upsample Op fixes.

---------

Co-authored-by: jatinwadhwa921 <[email protected]>
…rosoft#25702)

### Description
Enhance unique name generator for node and tensor names

### Motivation and Context
QNN requires node names to be unique. We have seen many instances where QNN node name conflicts cause failures during QNN graph finalization.
Name generation is currently hard-coded and thus error-prone, so this change adds a utility to generate unique names for QNN nodes and intermediate
I/O tensors.
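
A minimal sketch of what such a unique-name utility could look like. This is not the code from the PR: the class name `UniqueNameGenerator`, the `Generate` method, and the `_<n>` suffix scheme are all hypothetical and only illustrate how to guarantee uniqueness instead of relying on hard-coded names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper (not the utility added by the PR): returns a name that
// has never been returned before by this generator instance.
class UniqueNameGenerator {
 public:
  std::string Generate(const std::string& base) {
    std::string candidate = base;
    // Per-base counter so repeated requests do not rescan from 1 each time.
    std::size_t& suffix = next_suffix_[base];
    // insert() reports whether the name was new; retry with the next suffix
    // until an unused name is found.
    while (!used_.insert(candidate).second) {
      candidate = base + "_" + std::to_string(++suffix);
    }
    return candidate;
  }

 private:
  std::unordered_set<std::string> used_;                      // every name handed out so far
  std::unordered_map<std::string, std::size_t> next_suffix_;  // last suffix tried per base
};
```

Tracking every returned name in a set, rather than only a counter per base, also covers the case where a caller passes a base such as `conv_1` that collides with a previously generated name.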
…icrosoft#25706)

### Description

Fix swapped value and count arguments to `std::vector` constructor.

The `std::vector` constructor signature is:
`vector( size_type count, const T& value, const Allocator& alloc = Allocator() );`

https://en.cppreference.com/w/cpp/container/vector/vector.html
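
To make the effect of the swap concrete, here is a small sketch. The helper names `MakeFilled` and `MakeFilledSwapped` are hypothetical and do not come from the test file; the explicit `static_cast`s spell out the implicit conversions a compiler would apply to swapped arguments, which is what warning C4244 flags.

```cpp
#include <cstddef>
#include <vector>

// Correct order: (count, value) yields `count` elements, each equal to `value`.
std::vector<float> MakeFilled(std::size_t count, float value) {
  return std::vector<float>(count, value);
}

// Swapped order, with the implicit conversions written out explicitly:
// the float is truncated to an element count and the count becomes the value.
std::vector<float> MakeFilledSwapped(std::size_t count, float value) {
  return std::vector<float>(static_cast<std::size_t>(value),
                            static_cast<float>(count));
}
```

With `count = 4` and `value = 1.5f`, the correct call produces four elements equal to `1.5f`, while the swapped call produces a single element equal to `4.0f`.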

### Motivation and Context

Fix an issue discovered after enabling the warning.

```
Error: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(688,34): error C2220: the following warning is treated as an error [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_provider_test.vcxproj]
Warning: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\tensorrt\tensorrt_basic_test.cc(688,34): warning C4244: 'argument': conversion from 'float' to 'const unsigned __int64', possible loss of data [E:\_work\_temp\build\RelWithDebInfo\onnxruntime_provider_test.vcxproj]
```
### Description

Upgrade wgsl-template to 0.1.14.

Includes the following changes:
- show original file/line if different
- allow duplicated params
- [bugfix] show source lines correctly for generation errors
### Description
Fixes microsoft#25710: unused parameters ‘node_domain’, ‘node_op_type’,
and ‘target_data_layout’.
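
For reference, two common standard C++ ways to silence an unused-parameter diagnostic are sketched below. The description does not say which approach the fix uses, and the function names and the `opset >= 7` check are hypothetical, chosen only so the example compiles.

```cpp
#include <string>

// Option 1: leave the unused parameters unnamed (names kept as comments).
bool IsOpsetSupportedUnnamed(const std::string& /*node_domain*/,
                             const std::string& /*node_op_type*/,
                             int opset) {
  return opset >= 7;  // hypothetical check; only `opset` is used
}

// Option 2 (C++17): keep the names and mark them [[maybe_unused]].
bool IsOpsetSupportedAnnotated([[maybe_unused]] const std::string& node_domain,
                               [[maybe_unused]] const std::string& node_op_type,
                               int opset) {
  return opset >= 7;  // hypothetical check; only `opset` is used
}
```

Option 1 works in every C++ standard; option 2 keeps the names available in case a later change or a conditional build starts using them.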



### Motivation and Context
microsoft#25710

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Add alpha/beta and int/f16 GEMM tests and fix some errors in the GEMM shader.
### Description

This PR introduces precompiled header (PCH) support for ONNX Runtime
targets that exhibited the longest build times when built with the MSVC
toolset. By analyzing build performance, I identified a subset of
targets with significant compilation overhead due to repeated header
processing. Enabling PCH for these targets reduces redundant parsing,
improving incremental and full build performance.

Changes include:

- Added PCH configuration to selected CMake targets with the highest build
  cost in MSVC builds.
- Ensured PCH setup is compatible with the existing build configurations.
- Verified successful compilation and linkage with PCH enabled under MSVC.

Impact:

- ~30% reduction in build time
…crosoft#25673)

### Description
Relax WeightBiasQuantization constraint for larger QDQ node group

### Motivation and Context
The `WeightBiasQuantization` transformer quantizes float weights in the `Q -> DQ -> Conv/ConvTranspose/Gemm's Weights -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is a problem because it skips quantization for many common patterns, such as an unfused activation following `Conv` (`DQ -> Conv -> ReLU -> Q`).

Checking for the ending Q is actually unnecessary here (the fold can happen anyway without changing model semantics). However, to minimize changes to current behavior, this PR simply extends the pattern to include a single (non-branching), type-preserving path that leads to `Q`, enabling more quantization support.
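
The relaxed check can be sketched on a toy graph as follows. This is an illustration under stated assumptions, not the transformer's code: `ToyNode`, `IsTypePreserving`, and `SinglePathLeadsToQ` are hypothetical names, the set of type-preserving ops (`Relu`, `Clip`) is an assumed example, and `QuantizeLinear` is assumed to be the op type that `QDQ::QOpName` refers to.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a graph node: an op type plus its consumer nodes.
struct ToyNode {
  std::string op_type;
  std::vector<const ToyNode*> consumers;
};

// Assumed example set of ops whose output type equals their input type.
bool IsTypePreserving(const std::string& op_type) {
  return op_type == "Relu" || op_type == "Clip";
}

// Returns true if `node` reaches a Q node through a single path with no
// branches, passing only through type-preserving ops on the way.
bool SinglePathLeadsToQ(const ToyNode& node) {
  const ToyNode* current = &node;
  while (current->consumers.size() == 1) {  // any branch ends the walk
    const ToyNode* next = current->consumers[0];
    if (next->op_type == "QuantizeLinear") return true;
    if (!IsTypePreserving(next->op_type)) return false;
    current = next;
  }
  return false;  // zero or multiple consumers before reaching Q
}
```

Under these assumptions, `Conv -> Relu -> Q` is accepted, while a `Conv` with two consumers, or one followed by an op outside the type-preserving set, is still rejected, which matches the conservative behavior described above.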
…soft#25752)

### Description

Update mac.yml iphone_simulator job to use Xcode version 16.4.

### Motivation and Context

Fix CI build failure.

Following the recommendation here:
actions/runner-images#12758 (comment)
…microsoft#25730)

It seems that when multiple threads in one subgroup access the same
shared memory location, performance is poor on Qualcomm devices
(bank conflicts?). Limiting the number of threads that access the same
memory location greatly improves performance on these devices.

Phi4 drops from ~13s to ~10s on QC Adreno X1-85 (31.0.112.0).
### Description
Clearing shared_allocators_ invalidates all entries in
shared_ort_allocators_.

Remove the unused shared_arena_allocators_. It became unnecessary once
EPs were provided with an example implementation of an OrtAllocator-based
stream-aware arena that they can use directly.
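
The general hazard can be illustrated with a generic sketch. It does not reproduce the actual data layout in ONNX Runtime: `ToyAllocator`, `ToyRegistry`, and the owner/view split are hypothetical and only show why a container of non-owning pointers must be emptied together with the container that owns the objects.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy allocator that tracks how many instances are alive via an external counter.
struct ToyAllocator {
  explicit ToyAllocator(int* live_count) : live_count_(live_count) { ++*live_count_; }
  ~ToyAllocator() { --*live_count_; }
  ToyAllocator(const ToyAllocator&) = delete;
  ToyAllocator& operator=(const ToyAllocator&) = delete;

 private:
  int* live_count_;
};

// Hypothetical registry: `owners_` owns the allocators, `views_` holds raw,
// non-owning pointers to the same objects.
class ToyRegistry {
 public:
  ToyAllocator* Register(int* live_count) {
    owners_.push_back(std::make_unique<ToyAllocator>(live_count));
    views_.push_back(owners_.back().get());
    return views_.back();
  }

  void Clear() {
    views_.clear();   // drop the non-owning pointers first
    owners_.clear();  // destroying owners alone would leave views_ dangling
  }

  std::size_t ViewCount() const { return views_.size(); }

 private:
  std::vector<std::unique_ptr<ToyAllocator>> owners_;
  std::vector<ToyAllocator*> views_;
};
```

If `Clear` emptied only `owners_`, any later use of the pointers in `views_`, for example in a destructor during shutdown, would touch freed memory.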


### Motivation and Context
Fix an access violation in the destructor (swallowed because it happens during shutdown).
### Description
Moves DP4A shaders into templates

### Motivation and Context
Preparation for upcoming changes to add 2 bit quantization and MOE.
Moving to templates will improve code readability.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@Jaswanth51 Jaswanth51 requested a review from ankitm3k August 18, 2025 01:51
@ankitm3k ankitm3k merged commit 78e46e2 into ovep-develop Aug 18, 2025
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_18082025 branch August 18, 2025 05:15