forked from microsoft/onnxruntime
Backmerging with Msft commits #643
Merged
Conversation
### Description Fix the bug where the QNN EP generates an ONNX model with EP Context and then fails to run it. ### Motivation and Context When generating an ONNX model with a QNN EP context node whose input is a scalar, the shape was not set, leaving a null pointer and causing the subsequent run to fail.
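For context, a minimal sketch (using the standard onnx Python helpers, not the QNN EP code) of the difference between a scalar input whose empty shape is set and one whose shape field is left unset, which is the missing-shape situation described above; the tensor name is a placeholder.

```python
# Minimal sketch: a rank-0 (scalar) input should still carry a shape proto.
# The tensor name "x" is a placeholder, not taken from the PR.
from onnx import TensorProto, helper

# Passing an empty dims list creates a scalar value_info with the shape field
# present (zero dimensions).
scalar_input = helper.make_tensor_value_info("x", TensorProto.FLOAT, [])
assert scalar_input.type.tensor_type.HasField("shape")

# Passing shape=None leaves the shape field unset entirely; consumers that
# assume a shape is always present end up dereferencing a missing message.
unset_input = helper.make_tensor_value_info("x", TensorProto.FLOAT, None)
assert not unset_input.type.tensor_type.HasField("shape")
```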
### Description Fix cache key of Pad operator
…ite-default (microsoft#24312) Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.2.4 to 6.2.5. Release notes and changelog: see the [v6.2.5 CHANGELOG](https://github.com/vitejs/vite/blob/v6.2.5/packages/vite/CHANGELOG.md) (2025-04-03), which backports [#19782](https://redirect.github.com/vitejs/vite/issues/19782), an fs check fix for svg and relative paths. Commits: [`c176acf`](https://github.com/vitejs/vite/commit/c176acf70a113c33c33cb24b63ab7260e713d4b2) release: v6.2.5; [`fdb196e`](https://github.com/vitejs/vite/commit/fdb196e9f8672dba32cf5156c81665c7e82ac581) fix: backport #19782. Full diff: [compare view](https://github.com/vitejs/vite/commits/v6.2.5/packages/vite). Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself; the standard Dependabot commands apply. Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…24309) ### Description Fix the cache inconsistency of the AttentionProbs/VxAttentionScore programs: `n_reps` is already passed in uniforms, so do not use a hardcoded value.
### Description Essentially, the vision model is traced differently (this time without an attention mask), and the input indices of op.Add and op.MatMul can differ. fp16 and fp32 also need different tracing patterns (op.Cast).
1. Add another traced pattern to CLIP attention to cover the no-attention_mask case.
2. Accept different input indices on op.Add and op.MatMul (be more general).
3. fp16 and fp32 show different patterns (op.Cast after op.Softmax).
4. Refactor test_fastgelu.py to cover torch.onnx.export(..., dynamo=True).
5. Add a Gemma3 vision attention (SigLip) test covering both fp16 and fp32.
### Motivation and Context These changes are needed to optimize the Gemma3 multi-modal model: https://huggingface.co/google/gemma-3-4b-it NOTE: some related follow-ups (upstream optimizations to the onnxscript optimizer): microsoft/onnxscript#2158 microsoft/onnxscript#2156
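For reference, a minimal sketch of the dynamo export path mentioned in item 4, assuming a recent PyTorch where torch.onnx.export accepts dynamo=True; the tiny module and output file name are placeholders, not the models used in the actual tests.

```python
# Minimal sketch of exporting through the dynamo-based exporter path that the
# refactored tests exercise. The module and file name are placeholders.
import torch


class TinyGelu(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.gelu(x)


model = TinyGelu().eval()
example = torch.randn(2, 8)

# dynamo=True routes the export through torch.export/dynamo tracing instead of
# the legacy TorchScript-based exporter.
torch.onnx.export(model, (example,), "tiny_gelu.onnx", dynamo=True)
```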
### Description Update the packaging pipeline for the Node.js binding. This change updates the pipeline to perform all Node.js binding builds, including:
- Windows x64 (CPU, DML, WebGPU)
- Windows arm64 (CPU, DML, WebGPU)
- Linux x64 (CPU, CUDA, TensorRT, WebGPU)
- Linux arm64 (CPU)
- macOS x64 (CPU, CoreML, WebGPU)
- macOS arm64 (CPU, CoreML, WebGPU)
#### Dependencies The Node.js binding depends on the NuGet package from the same build. Because NPM has a package size limit, we cannot fit libonnxruntime_provider_cuda.so into it. The Node.js binding therefore works by having an installation script download the NuGet package of the corresponding version.
…oft#24239) ### Description This change adds support for GatherBlockQuantized to use uint8_t as the data type, with the same semantics as MatMulNBits. Zero points and gather axes other than 0 are not yet supported, in order to keep the change scoped. ### Motivation and Context With newer Llama-style models like Phi4 trained with shared embeddings, the weights of the lm_head matrix and the embeddings table are exactly the same. These embeddings are huge: unquantized they are 1.2 GB in Phi4 mini instruct, and at int4 quantization the weights are still 300 MB. We can go a step further and have the two ops, the lm_head MatMulNBits and GatherBlockQuantized, share the same weights, which would save 300 MB of model size. The two things that hinder that are the shape expectations of GatherBlockQuantized and the data type it supports for data. The shape can be solved via a simple Reshape op, but the data type needs code changes, and that is what this change does. Here is Phi4 modified with shared weights between lm_head and MatMulNBits; this model is just 2.1 GB on disk. <img width="164" alt="image" src="https://github.com/user-attachments/assets/8bdddbb9-5b44-4839-ab48-605bee53d66b" /> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
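A rough, graph-level sketch of the weight sharing this enables, assuming the com.microsoft GatherBlockQuantized and MatMulNBits contrib-op signatures; all tensor names, dimensions, and the block size are illustrative, not taken from the Phi4 model.

```python
# Rough sketch: one uint8 quantized weight initializer feeds both the embedding
# lookup (GatherBlockQuantized) and, via a Reshape, the lm_head (MatMulNBits).
# Names, dimensions, and block_size are illustrative only.
from onnx import helper

hidden_size = 4096   # illustrative
vocab_size = 32000   # illustrative
block_size = 32      # illustrative

# Embedding lookup over the shared quantized table (gather_axis 0 only, no
# zero points, matching the scope of this change).
gather = helper.make_node(
    "GatherBlockQuantized",
    inputs=["shared_qweight", "input_ids", "shared_scales"],
    outputs=["embeddings"],
    domain="com.microsoft",
    gather_axis=0,
    quantize_axis=1,
    block_size=block_size,
)

# Reshape the same table into the packed layout MatMulNBits expects for B.
reshape = helper.make_node(
    "Reshape",
    inputs=["shared_qweight", "lm_head_weight_shape"],
    outputs=["lm_head_qweight"],
)

# lm_head projection reusing the reshaped shared weight.
matmul = helper.make_node(
    "MatMulNBits",
    inputs=["hidden_states", "lm_head_qweight", "shared_scales"],
    outputs=["logits"],
    domain="com.microsoft",
    K=hidden_size,
    N=vocab_size,
    bits=4,
    block_size=block_size,
)
```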
### Description Add Conv, ConvTranspose, and FusedConv to the WebGPU execution provider. ### Motivation and Context Required for operator coverage.
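A minimal usage sketch from the Python API, assuming an onnxruntime build with the WebGPU EP enabled; the model path is a placeholder.

```python
# Minimal sketch: run a Conv-containing model on the WebGPU EP, with CPU as a
# fallback for anything it does not cover. Assumes a build that includes the
# WebGPU EP; "conv_model.onnx" is a placeholder path.
import onnxruntime as ort

sess = ort.InferenceSession(
    "conv_model.onnx",
    providers=["WebGpuExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # shows which providers were actually registered
```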
…etQueue (microsoft#24313) ### Description This PR is one of a series of changes optimizing Dawn API usage; see microsoft#24281. It optimizes the usage of wgpuDeviceGetQueue.
…tion (perf test) (microsoft#24303) In the perf test for the DirectML EP, the "minimum_power" value for the "performance_preference" runtime key could not be selected due to a small bug. This PR fixes it so that "minimum_power" can be used. I will also link the respective issue to this PR. I made the change, built onnxruntime, and tested perf_test.exe plus the DLLs on a system with Intel integrated graphics and an Nvidia dGPU. Switching between 'minimum_power' and 'high_performance', the two options select the Intel integrated GPU and the Nvidia dGPU as device runtimes respectively (I checked Task Manager utilization for both devices). Both inferences complete with no problems. I am attaching a reproducer here with the built perf_test and the commands I tried: [DLL_Build_DML_Reproducer.zip](https://github.com/user-attachments/files/19596463/DLL_Build_DML_Reproducer.zip) Issue microsoft#24182 @fdwr Hi, I fixed the issue; if you could please review, thank you.
### Description This PR uses a 1D dispatch group size and uses workgroup_idx instead of workgroup.x|workgroup.y in case they are normalized.
…microsoft#24315) ### Description This PR is one of a series of changes optimizing Dawn API usage; see microsoft#24281. It reduces the calls to wgpuBufferAddRef and wgpuBufferRelease (part 1).
### Description SessionOptions now has a new property: load_cancelation_flag. When set to true, this flag causes an in-progress model load and initialization to abort, which is useful for huge models. ### Motivation and Context Some users request the ability to abandon model loading and initialization if it exceeds certain time limits.
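A hypothetical sketch of how this could be used from Python, assuming the new flag is exposed on SessionOptions under the same name as described above; the model path and the 30-second limit are placeholders.

```python
# Hypothetical sketch: abort a long-running load of a huge model by flipping
# the cancellation flag from another thread after a time limit. Assumes the
# flag is exposed on SessionOptions as "load_cancelation_flag" (as named above);
# "huge_model.onnx" and the 30-second limit are placeholders.
import threading
import onnxruntime as ort

so = ort.SessionOptions()

# After 30 seconds, request cancellation of the in-progress load/initialization.
threading.Timer(30.0, lambda: setattr(so, "load_cancelation_flag", True)).start()

try:
    sess = ort.InferenceSession("huge_model.onnx", sess_options=so)
except Exception as exc:
    print("Model load was cancelled or failed:", exc)
```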
) Co-authored-by: Yulong Wang <[email protected]>
…24327) ### Description Exclude WebGPU from Conv3D tests ### Motivation and Context Fix failing tests in packaging pipelines.
### Description [VitisAI EP] export InferShapes to VitisAIEP --------- Co-authored-by: Wang Chunye <[email protected]> Co-authored-by: Zhenze <[email protected]>
ankitm3k approved these changes on Apr 8, 2025.