jatinwadhwa921
Backmerging with Msft commits

fs-eire and others added 19 commits April 4, 2025 00:24
### Description
Fix a bug where the QNN EP generates an ONNX model with EP context that then fails to run.

### Motivation and Context
When generating an ONNX model with QNN EP context, if an input is a scalar its shape is not set, resulting in a null pointer that causes the subsequent run to fail.
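A minimal sketch of the kind of fix this implies, using the ONNX protobuf C++ API (the actual QNN EP code differs): a scalar should carry an explicitly set, zero-dimension shape rather than an unset one.

```cpp
#include "onnx/onnx_pb.h"

// Sketch: ensure a scalar value_info has a non-null (empty) shape so that
// downstream code dereferencing the shape does not hit a null pointer.
// A scalar is a TensorShapeProto with zero dimensions, which is different
// from having no shape set at all.
void EnsureScalarShapeIsSet(onnx::ValueInfoProto& value_info) {
  auto* tensor_type = value_info.mutable_type()->mutable_tensor_type();
  if (!tensor_type->has_shape()) {
    tensor_type->mutable_shape();  // creates an empty (rank-0) shape
  }
}
```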
### Description

Fix cache key of Pad operator
…ite-default (microsoft#24312)

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.2.4 to 6.2.5.

**Release notes** (from [vite's releases](https://github.com/vitejs/vite/releases)): for v6.2.5, refer to [CHANGELOG.md](https://github.com/vitejs/vite/blob/v6.2.5/packages/vite/CHANGELOG.md) for details.

**Changelog** (from [vite's changelog](https://github.com/vitejs/vite/blob/v6.2.5/packages/vite/CHANGELOG.md), 6.2.5, 2025-04-03):
- fix: backport [#19782](https://redirect.github.com/vitejs/vite/issues/19782), fs check with svg and relative paths ([fdb196e](https://github.com/vitejs/vite/commit/fdb196e9f8672dba32cf5156c81665c7e82ac581))

**Commits:**
- [`c176acf`](https://github.com/vitejs/vite/commit/c176acf70a113c33c33cb24b63ab7260e713d4b2) release: v6.2.5
- [`fdb196e`](https://github.com/vitejs/vite/commit/fdb196e9f8672dba32cf5156c81665c7e82ac581) fix: backport [#19782](https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19782), fs check with svg and relative paths
- See [full diff in compare view](https://github.com/vitejs/vite/commits/v6.2.5/packages/vite)

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=vite&package-manager=npm_and_yarn&previous-version=6.2.4&new-version=6.2.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…24309)

### Description

Fix the cache inconsistency of the AttentionProbs/VxAttentionScore programs.

`n_reps` is already in uniforms, so do not use a hardcoded value for it.
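To illustrate the pitfall (a hedged sketch with made-up names, not the actual WebGPU EP code): a value baked into the generated shader source must be part of the program cache key, or two programs that differ only in that value will collide; reading it from a uniform avoids the issue entirely.

```cpp
#include <cstdint>
#include <string>

// BAD: interpolating n_reps into the shader source means the cache key
// must also include n_reps, or two shaders with different n_reps collide.
std::string MakeShaderSourceHardcoded(uint32_t n_reps) {
  return "const N_REPS: u32 = " + std::to_string(n_reps) + "u;\n";
}

// GOOD: read n_reps from a uniform; the generated source is identical for
// all n_reps values, so the cached program can be reused safely.
std::string MakeShaderSourceUniform() {
  return "let n_reps = uniforms.n_reps;\n";
}
```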
### Description

Essentially, the vision model is traced differently (this time, without a mask), and the input indices of op.Add and op.MatMul can differ. Also, fp16 and fp32 need different tracing patterns (op.Cast).

1. Add another traced pattern to CLIP attention to cover the no-attention_mask case.
2. Accept different input indices on op.Add and op.MatMul to be more general (see the sketch after this list).
3. fp16 and fp32 show different patterns (op.Cast after op.Softmax).
4. Refactor test_fastgelu.py to cover torch.onnx.export(..., dynamo=True).
5. Add a gemma3 vision attention (SigLip) test to cover both fp16 and fp32.
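As a hedged illustration of item 2 (generic C++, not the fusion code itself): for a commutative op such as Add, a matcher should accept the expected operand at either input index instead of hardcoding index 0.

```cpp
#include <array>
#include <string>

// Sketch: return the index at which `expected` appears among the two
// inputs of a commutative node (e.g. op.Add), or -1 if it is absent.
int FindCommutativeInput(const std::array<std::string, 2>& inputs,
                         const std::string& expected) {
  if (inputs[0] == expected) return 0;
  if (inputs[1] == expected) return 1;
  return -1;  // pattern does not match this node
}
```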

### Motivation and Context

These changes are needed to optimize the Gemma3 multi-modal model:
https://huggingface.co/google/gemma-3-4b-it

NOTE: some related follow-ups (upstream optimizations to
onnxscript-optimizer):
microsoft/onnxscript#2158
microsoft/onnxscript#2156
### Description

Update packaging pipeline for Nodejs binding.

This change updates the pipeline to perform all Node.js binding builds,
including:
- Windows x64 ( CPU, DML, WebGPU )
- Windows arm64 ( CPU, DML, WebGPU )
- Linux x64 ( CPU, CUDA, TensorRT, WebGPU )
- Linux arm64 ( CPU )
- macOS x64 ( CPU, CoreML, WebGPU )
- macOS arm64 ( CPU, CoreML, WebGPU )

#### Dependencies

The Node.js binding depends on the NuGet package from the same build.

Because NPM has a package size limit, libonnxruntime_provider_cuda.so cannot fit into the NPM package. To work around this, the Node.js binding's installation script downloads the NuGet package of the corresponding version.
…oft#24239)

### Description
This change adds support for GatherBlockQuantized to use uint8_t as the data type, with the same semantics as MatMulNBits. Zero points and gather axes other than 0 are not yet supported, in order to keep the change scoped.
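A rough standalone sketch of the block-dequantizing gather this enables (hypothetical code; the operator's exact layout and defaults follow MatMulNBits, and the zero point of 128 is an assumption for illustration since zero points are out of scope):

```cpp
#include <cstdint>
#include <vector>

// Sketch: gather rows from block-quantized uint8 data along axis 0 and
// dequantize with one scale per block of `block_size` elements.
std::vector<float> GatherBlockDequant(const std::vector<uint8_t>& data,
                                      const std::vector<float>& scales,
                                      const std::vector<int64_t>& indices,
                                      size_t cols, size_t block_size) {
  const size_t blocks_per_row = cols / block_size;
  std::vector<float> out;
  out.reserve(indices.size() * cols);
  for (int64_t idx : indices) {
    const size_t row = static_cast<size_t>(idx);
    for (size_t c = 0; c < cols; ++c) {
      const float scale = scales[row * blocks_per_row + c / block_size];
      // Assumed symmetric zero point of 128 for uint8 data.
      out.push_back((static_cast<float>(data[row * cols + c]) - 128.0f) * scale);
    }
  }
  return out;
}
```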

### Motivation and Context
With newer Llama-family models like Phi4 trained with shared embeddings, the weights of the lm_head matrix and the embeddings table are exactly the same. These embeddings are huge: unquantized, they are 1.2 GB in Phi4 mini instruct, and even at int4 quantization the weights are still 300 MB. We can go a step further and have the two ops, the lm_head MatMulNBits and GatherBlockQuantized, share the same weights; that would save 300 MB of model size.

The two things that hinder this are the shape expectations for GatherBlockQuantized and the data type it supports for data. The shape can be solved via a simple Reshape op, but the data type needs code changes, and that is what this change does.

Here is Phi4 modified with shared weights between lm_head and MatMulNBits; this model is just 2.1 GB on disk.
<img width="164" alt="image"
src="https://github.com/user-attachments/assets/8bdddbb9-5b44-4839-ab48-605bee53d66b"
/>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Add Conv, ConvTranspose, and FusedConv to the WebGPU execution provider.



### Motivation and Context
Required for operator coverage.
…etQueue (microsoft#24313)

### Description
This PR is one of a series of changes to optimize Dawn API usage. See microsoft#24281.

It optimizes the usage of wgpuDeviceGetQueue.
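The general idea, sketched with the standard webgpu.h API (ORT's actual abstraction differs): acquire the queue once and reuse the cached handle, since every wgpuDeviceGetQueue call returns a new reference that must later be released.

```cpp
#include <webgpu/webgpu.h>

// Sketch: cache the queue handle instead of calling wgpuDeviceGetQueue()
// on every submission. Each call returns a new reference that the caller
// must eventually release with wgpuQueueRelease().
class QueueHolder {
 public:
  explicit QueueHolder(WGPUDevice device)
      : queue_(wgpuDeviceGetQueue(device)) {}
  ~QueueHolder() { wgpuQueueRelease(queue_); }
  WGPUQueue get() const { return queue_; }

 private:
  WGPUQueue queue_;
};
```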
…tion (perf test) (microsoft#24303)

In the perf test for the DirectML EP, the "minimum_power" value for the "performance_preference" runtime key could not be selected due to a small bug. This PR fixes it so that "minimum_power" can be used. I will also link the respective issue to this PR.

I made the change, built onnxruntime, and tested perf_test.exe plus the DLLs on a system with Intel integrated graphics and an Nvidia dGPU. Switching between 'minimum_power' and 'high_performance', the options choose the Intel integrated GPU and the Nvidia dGPU as device runtimes, respectively (I checked Task Manager utilization for both devices). Both inferences complete with no problems. I am attaching a reproducer here with the built perf_test and the commands I tried to test it:


[DLL_Build_DML_Reproducer.zip](https://github.com/user-attachments/files/19596463/DLL_Build_DML_Reproducer.zip)

Issue microsoft#24182 

@fdwr Hi, I fixed the issue, if you could please review, thank you
### Description
This PR uses a 1-D dispatch group size and uses workgroup_idx instead of workgroup.x / workgroup.y, in case they are normalized.
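A hedged sketch of the indexing scheme (hypothetical helper, not the EP's actual code): dispatch nx * ny workgroups along a single dimension and recover the 2-D coordinates from the linear workgroup index, so the mapping stays correct even if the runtime normalizes the dispatch dimensions.

```cpp
#include <cstdint>

struct Coords {
  uint32_t x;
  uint32_t y;
};

// Recover logical 2-D coordinates from a linear workgroup index, mirroring
// what the WGSL shader would do with workgroup_idx.
Coords FromLinearIndex(uint32_t workgroup_idx, uint32_t nx) {
  return Coords{workgroup_idx % nx, workgroup_idx / nx};
}
```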
…microsoft#24315)

### Description

This PR is one of a series of changes to optimize Dawn API usage. See microsoft#24281.

It reduces the calls to wgpuBufferAddRef and wgpuBufferRelease (part 1).
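One common way to cut such ref-count churn, sketched here under assumed ownership rules (not ORT's actual classes): transfer ownership by moving the handle instead of copying it, so no wgpuBufferAddRef / wgpuBufferRelease pair is needed.

```cpp
#include <utility>
#include <webgpu/webgpu.h>

// Sketch: an owning wrapper that moves a WGPUBuffer instead of copying it.
// A copy would require wgpuBufferAddRef now and wgpuBufferRelease later;
// a move transfers ownership with no ref-count traffic at all.
class BufferHandle {
 public:
  explicit BufferHandle(WGPUBuffer buf) : buf_(buf) {}
  BufferHandle(BufferHandle&& other) noexcept
      : buf_(std::exchange(other.buf_, nullptr)) {}
  BufferHandle(const BufferHandle&) = delete;
  BufferHandle& operator=(const BufferHandle&) = delete;
  ~BufferHandle() {
    if (buf_ != nullptr) wgpuBufferRelease(buf_);
  }
  WGPUBuffer get() const { return buf_; }

 private:
  WGPUBuffer buf_;
};
```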
### Description
SessionOptions now has a new property: load_cancelation_flag. When set to true, this flag causes loading and initialization to abort, intended for huge models.

### Motivation and Context
Some users request the ability to abandon model loading and initialization when it exceeds certain time limits.
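Conceptually (a minimal sketch with hypothetical names; the real flag lives on SessionOptions): the loader polls an atomic flag at convenient checkpoints and bails out when another thread sets it.

```cpp
#include <atomic>

// Sketch: a cancellation flag polled during long-running load/initialize.
// Another thread sets it to request that loading abort early.
struct LoadState {
  std::atomic<bool> cancel_requested{false};
};

bool LoadStep(LoadState& state) {
  if (state.cancel_requested.load(std::memory_order_relaxed)) {
    return false;  // abort load; caller surfaces an error to the user
  }
  // ... perform one unit of load/initialization work ...
  return true;
}
```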
)


Co-authored-by: Yulong Wang <[email protected]>
…24327)

### Description
Exclude WebGPU from Conv3D tests 



### Motivation and Context
Fix failing tests in packaging pipelines.
### Description
[VitisAI EP] export InferShapes to VitisAIEP

---------

Co-authored-by: Wang Chunye <[email protected]>
Co-authored-by: Zhenze <[email protected]>
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k April 8, 2025 09:48
@jatinwadhwa921 jatinwadhwa921 merged commit b999a1b into ovep-develop Apr 8, 2025
6 of 8 checks passed