forked from microsoft/onnxruntime
Backmerging with Msft commits #643
Merged
Conversation
### Description Fix the bug where the QNN EP generates an ONNX model with EP Context and then fails to run it. ### Motivation and Context When generating an ONNX model with a QNN EP context node whose input is a scalar, the shape was not set, leaving a null pointer and causing the subsequent run to fail.
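For context, a minimal sketch (using the standard onnx Python helpers, not the QNN EP code) of the difference between a scalar input whose empty shape is set and one whose shape field is left unset, which is the missing-shape situation described above; the tensor name is a placeholder.

```python
# Minimal sketch: a rank-0 (scalar) input should still carry a shape proto.
# The tensor name "x" is a placeholder, not taken from the PR.
from onnx import TensorProto, helper

# Passing an empty dims list creates a scalar value_info with the shape field
# present (zero dimensions).
scalar_input = helper.make_tensor_value_info("x", TensorProto.FLOAT, [])
assert scalar_input.type.tensor_type.HasField("shape")

# Passing shape=None leaves the shape field unset entirely; consumers that
# assume a shape is always present end up dereferencing a missing message.
unset_input = helper.make_tensor_value_info("x", TensorProto.FLOAT, None)
assert not unset_input.type.tensor_type.HasField("shape")
```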
### Description Fix cache key of Pad operator
…ite-default (microsoft#24312) Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.2.4 to 6.2.5. Release notes and changelog: see the [v6.2.5 CHANGELOG](https://github.com/vitejs/vite/blob/v6.2.5/packages/vite/CHANGELOG.md) (2025-04-03), which backports [#19782](https://redirect.github.com/vitejs/vite/issues/19782), an fs check fix for svg and relative paths. Commits: [`c176acf`](https://github.com/vitejs/vite/commit/c176acf70a113c33c33cb24b63ab7260e713d4b2) release: v6.2.5; [`fdb196e`](https://github.com/vitejs/vite/commit/fdb196e9f8672dba32cf5156c81665c7e82ac581) fix: backport #19782. Full diff: [compare view](https://github.com/vitejs/vite/commits/v6.2.5/packages/vite). Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself; the standard Dependabot commands apply. Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…24309) ### Description Fix the cache inconsistency of the AttentionProbs/VxAttentionScore programs: `n_reps` is already passed in uniforms, so do not use a hardcoded value.
### Description Essentially, the vision model is traced differently (this time without an attention mask), and the input indices of op.Add and op.MatMul can differ. fp16 and fp32 also need different tracing patterns (op.Cast).
1. Add another traced pattern to CLIP attention to cover the no-attention_mask case.
2. Accept different input indices on op.Add and op.MatMul (be more general).
3. fp16 and fp32 show different patterns (op.Cast after op.Softmax).
4. Refactor test_fastgelu.py to cover torch.onnx.export(..., dynamo=True).
5. Add a Gemma3 vision attention (SigLip) test covering both fp16 and fp32.
### Motivation and Context These changes are needed to optimize the Gemma3 multi-modal model: https://huggingface.co/google/gemma-3-4b-it NOTE: some related follow-ups (upstream optimizations to the onnxscript optimizer): microsoft/onnxscript#2158 microsoft/onnxscript#2156
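For reference, a minimal sketch of the dynamo export path mentioned in item 4, assuming a recent PyTorch where torch.onnx.export accepts dynamo=True; the tiny module and output file name are placeholders, not the models used in the actual tests.

```python
# Minimal sketch of exporting through the dynamo-based exporter path that the
# refactored tests exercise. The module and file name are placeholders.
import torch


class TinyGelu(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.gelu(x)


model = TinyGelu().eval()
example = torch.randn(2, 8)

# dynamo=True routes the export through torch.export/dynamo tracing instead of
# the legacy TorchScript-based exporter.
torch.onnx.export(model, (example,), "tiny_gelu.onnx", dynamo=True)
```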
### Description Update the packaging pipeline for the Node.js binding. This change updates the pipeline to perform all Node.js binding builds, including:
- Windows x64 (CPU, DML, WebGPU)
- Windows arm64 (CPU, DML, WebGPU)
- Linux x64 (CPU, CUDA, TensorRT, WebGPU)
- Linux arm64 (CPU)
- macOS x64 (CPU, CoreML, WebGPU)
- macOS arm64 (CPU, CoreML, WebGPU)
#### Dependencies The Node.js binding depends on the NuGet package from the same build. Because NPM has a package size limit, we cannot fit libonnxruntime_provider_cuda.so into it. The Node.js binding therefore works by having an installation script download the NuGet package of the corresponding version.
…oft#24239) ### Description This change adds support for GatherBlockQuantized to use uint8_t as the data type, with the same semantics as MatMulNBits. Zero points and gather axes other than 0 are not yet supported, in order to keep the change scoped. ### Motivation and Context With newer Llama-style models like Phi4 trained with shared embeddings, the weights of the lm_head matrix and the embeddings table are exactly the same. These embeddings are huge: unquantized they are 1.2 GB in Phi4 mini instruct, and at int4 quantization the weights are still 300 MB. We can go a step further and have the two ops, the lm_head MatMulNBits and GatherBlockQuantized, share the same weights, which would save 300 MB of model size. The two things that hinder that are the shape expectations of GatherBlockQuantized and the data type it supports for data. The shape can be solved via a simple Reshape op, but the data type needs code changes, and that is what this change does. Here is Phi4 modified with shared weights between lm_head and MatMulNBits; this model is just 2.1 GB on disk. <img width="164" alt="image" src="https://github.com/user-attachments/assets/8bdddbb9-5b44-4839-ab48-605bee53d66b" /> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
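A rough, graph-level sketch of the weight sharing this enables, assuming the com.microsoft GatherBlockQuantized and MatMulNBits contrib-op signatures; all tensor names, dimensions, and the block size are illustrative, not taken from the Phi4 model.

```python
# Rough sketch: one uint8 quantized weight initializer feeds both the embedding
# lookup (GatherBlockQuantized) and, via a Reshape, the lm_head (MatMulNBits).
# Names, dimensions, and block_size are illustrative only.
from onnx import helper

hidden_size = 4096   # illustrative
vocab_size = 32000   # illustrative
block_size = 32      # illustrative

# Embedding lookup over the shared quantized table (gather_axis 0 only, no
# zero points, matching the scope of this change).
gather = helper.make_node(
    "GatherBlockQuantized",
    inputs=["shared_qweight", "input_ids", "shared_scales"],
    outputs=["embeddings"],
    domain="com.microsoft",
    gather_axis=0,
    quantize_axis=1,
    block_size=block_size,
)

# Reshape the same table into the packed layout MatMulNBits expects for B.
reshape = helper.make_node(
    "Reshape",
    inputs=["shared_qweight", "lm_head_weight_shape"],
    outputs=["lm_head_qweight"],
)

# lm_head projection reusing the reshaped shared weight.
matmul = helper.make_node(
    "MatMulNBits",
    inputs=["hidden_states", "lm_head_qweight", "shared_scales"],
    outputs=["logits"],
    domain="com.microsoft",
    K=hidden_size,
    N=vocab_size,
    bits=4,
    block_size=block_size,
)
```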
### Description Add Conv, ConvTranspose, and FusedConv to the WebGPU execution provider. ### Motivation and Context Required for operator coverage.
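A minimal usage sketch from the Python API, assuming an onnxruntime build with the WebGPU EP enabled; the model path is a placeholder.

```python
# Minimal sketch: run a Conv-containing model on the WebGPU EP, with CPU as a
# fallback for anything it does not cover. Assumes a build that includes the
# WebGPU EP; "conv_model.onnx" is a placeholder path.
import onnxruntime as ort

sess = ort.InferenceSession(
    "conv_model.onnx",
    providers=["WebGpuExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # shows which providers were actually registered
```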
…etQueue (microsoft#24313) ### Description This PR is one of a series of changes optimizing Dawn API usage; see microsoft#24281. It optimizes the usage of wgpuDeviceGetQueue.
…tion (perf test) (microsoft#24303) In the perf test for the DirectML EP, the "minimum_power" value for the "performance_preference" runtime key could not be selected due to a small bug. This PR fixes it so that "minimum_power" can be used. I will also link the respective issue to this PR. I made the change, built onnxruntime, and tested perf_test.exe plus the DLLs on a system with Intel integrated graphics and an Nvidia dGPU. Switching between 'minimum_power' and 'high_performance', the two options select the Intel integrated GPU and the Nvidia dGPU as device runtimes respectively (I checked Task Manager utilization for both devices). Both inferences complete with no problems. I am attaching a reproducer here with the built perf_test and the commands I tried: [DLL_Build_DML_Reproducer.zip](https://github.com/user-attachments/files/19596463/DLL_Build_DML_Reproducer.zip) Issue microsoft#24182 @fdwr Hi, I fixed the issue; if you could please review, thank you.
### Description This PR uses a 1D dispatch group size and uses workgroup_idx instead of workgroup.x|workgroup.y in case they are normalized.
…microsoft#24315) ### Description This PR is one of a series of changes optimizing Dawn API usage; see microsoft#24281. It reduces the calls to wgpuBufferAddRef and wgpuBufferRelease (part 1).
### Description SessionOptions now has a new property: load_cancelation_flag. When set to true, this flag causes an in-progress model load and initialization to abort, which is useful for huge models. ### Motivation and Context Some users request the ability to abandon model loading and initialization if it exceeds certain time limits.
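A hypothetical sketch of how this could be used from Python, assuming the new flag is exposed on SessionOptions under the same name as described above; the model path and the 30-second limit are placeholders.

```python
# Hypothetical sketch: abort a long-running load of a huge model by flipping
# the cancellation flag from another thread after a time limit. Assumes the
# flag is exposed on SessionOptions as "load_cancelation_flag" (as named above);
# "huge_model.onnx" and the 30-second limit are placeholders.
import threading
import onnxruntime as ort

so = ort.SessionOptions()

# After 30 seconds, request cancellation of the in-progress load/initialization.
threading.Timer(30.0, lambda: setattr(so, "load_cancelation_flag", True)).start()

try:
    sess = ort.InferenceSession("huge_model.onnx", sess_options=so)
except Exception as exc:
    print("Model load was cancelled or failed:", exc)
```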
) Co-authored-by: Yulong Wang <[email protected]>
…24327) ### Description Exclude WebGPU from Conv3D tests ### Motivation and Context Fix failing tests in packaging pipelines.
### Description [VitisAI EP] export InferShapes to VitisAIEP --------- Co-authored-by: Wang Chunye <[email protected]> Co-authored-by: Zhenze <[email protected]>
ankitm3k approved these changes on Apr 8, 2025.