forked from microsoft/onnxruntime
Sync with Microsoft ONNX Runtime - 25/08/2025 #789
Merged
Conversation
### Description
Add QNN support for the Mod op when fmod = 0.
### Motivation and Context
QNN doesn't support the Mod op. This PR allows QNN to process the Mod op for the fmod = 0 case.

Signed-off-by: Mu-Chein Hsu <[email protected]>
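As an illustrative sketch (not ORT or QNN code), the fmod attribute in the ONNX Mod definition selects between two modulus semantics: fmod = 0 is integer-style modulus whose result takes the sign of the divisor (like Python's `%`), while fmod = 1 behaves like C's fmod, taking the sign of the dividend. The PR covers only the first case:

```python
import math

def mod_fmod0(a, b):
    # fmod = 0: integer-style modulus; the result takes the sign of
    # the divisor, matching Python's % operator.
    return a % b

def mod_fmod1(a, b):
    # fmod = 1: C-style fmod; the result takes the sign of the dividend.
    return math.fmod(a, b)

print(mod_fmod0(-4, 3))  # 2   (sign follows divisor)
print(mod_fmod1(-4, 3))  # -1.0 (sign follows dividend)
```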
### Description
In PoolOpBuilder:
- Revise the check to use ORT macros.
- Fix invoking the function for 5D cases.
### Motivation and Context
Refer to microsoft#25778. The pool builder incorrectly invoked a function that calculates a 4D shape (originally intended for 3D cases only) on 5D input. The existing check used assert to validate the shape, which does not run in Release or RelWithDebInfo builds.
### Description
Add QNN EP support for the ThresholdedRelu op.
### Motivation and Context
ThresholdedRelu wasn't previously supported.

Signed-off-by: Mu-Chein Hsu <[email protected]>
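For reference, a minimal sketch of the ONNX ThresholdedRelu definition (this is illustrative, not the QNN EP implementation): the output is x where x exceeds alpha (default 1.0), and 0 elsewhere.

```python
def thresholded_relu(x, alpha=1.0):
    # y = x when x > alpha, otherwise 0 (the ONNX default alpha is 1.0)
    return x if x > alpha else 0.0

print([thresholded_relu(v) for v in [-1.0, 0.5, 1.0, 2.5]])
# [0.0, 0.0, 0.0, 2.5]
```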
…ob (microsoft#25794)
### Description
Set the iOS simulator runtime version to 18.5 in the mac.yml iphone_simulator job. This job uses Xcode 16.4; according to this table, the corresponding simulator SDK version is 18.5: https://github.com/actions/runner-images/blob/da7977bf2699f44e70b7d3c3352dedb0da38db9c/images/macos/macos-15-arm64-Readme.md?plain=1#L181
### Motivation and Context
Address intermittent CI build timeouts.
### Description
Add a new API, `Graph_GetModelMetadata`.
### Motivation and Context
The VitisAI EP converts ONNX IR to another IR suited to AMD AI compilers. The metadata in an OrtModel contains much important information produced by other tools, e.g. Olive. This API could also be used by other execution providers that need access to the same information.
…osoft#25562)
### Description
Add a HardSwish operator, which is x * HardSigmoid(x). Add bf16 support for HardSigmoid.
### Motivation and Context
HardSwish is currently implemented as HardSigmoid + Mul in the CUDA EP. A fused HardSwish should take half the time of the HardSigmoid + Mul pair.

Co-authored-by: kaiyu <[email protected]>
Co-authored-by: Copilot <[email protected]>
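A plain-Python sketch of the math being fused (illustrative only, not the CUDA kernel): per the ONNX HardSwish definition, y = x * HardSigmoid(x) with alpha = 1/6 and beta = 0.5, where HardSigmoid clips alpha*x + beta to [0, 1].

```python
def hard_sigmoid(x, alpha, beta):
    # HardSigmoid: clip(alpha * x + beta, 0, 1)
    return max(0.0, min(1.0, alpha * x + beta))

def hard_swish(x):
    # HardSwish(x) = x * HardSigmoid(x) with alpha = 1/6, beta = 0.5,
    # per the ONNX HardSwish operator definition.
    return x * hard_sigmoid(x, 1.0 / 6.0, 0.5)

print(hard_swish(3.0))   # 3.0 (HardSigmoid saturates at 1)
print(hard_swish(-3.0))  # -0.0 (HardSigmoid saturates at 0)
```

A fused kernel evaluates this in one pass over the data instead of materializing the intermediate HardSigmoid tensor, which is where the expected ~2x speedup comes from.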
### Description
Fix a build break caused by warning C4702: unreachable code.
```
onnxruntime\contrib_ops\webgpu\quantization\matmul_nbits.cc(95,1): error C2220: the following warning is treated as an error [C:\code\o3\build_main\Debug\onnxruntime_providers_webgpu.vcxproj]
onnxruntime\contrib_ops\webgpu\quantization\matmul_nbits.cc(95,1): warning C4702: unreachable code [C:\code\o3\build_main\Debug\onnxruntime_providers_webgpu.vcxproj]
```
It seems the CI pipeline does not catch this.
### Description
Add a build flag to enable/disable the mixed-GEMM CUTLASS kernel. To disable the kernel, append the following to the build command line: `--cmake_extra_defines onnxruntime_USE_FPA_INTB_GEMM=OFF`
### Motivation and Context
The FpA IntB GEMM kernel takes a long time to compile. With this option, developers can speed up the build, especially on build machines with limited memory.
* Implements `GetEPContextNodes()`
* Enables usage of `AddExternalInitializersFromFilesInMemory` for models that have to be communicated as a byte stream but are larger than 2GB
* Adds EP context unit tests for files, byte streams, and both embed modes

NOTE: For large models > 2GB, `embed_mode=0` must be used; `embed_mode=1` fails due to protobuf limitations.

Co-authored-by: Maximilian Müller <[email protected]>
### Description
Upgrade WGSL Template to v0.1.15. Changes:
- fs-eire/wgsl-template#21
…rosoft#25800)
This reconfiguration avoids allocating tensors with an exact matching size. With that strategy, a tensor allocation almost always triggers a new allocation in the arena instead of reusing memory, since a free block is only reused when its size matches the request exactly. This became a big problem with ORT GenAI: the arena grew constantly when prompting with different prompt lengths, and no arena shrinkage was triggered to return older tensors. @skottmckay I am happy to be educated about a better usage of the allocators.

Issues with this: since the arena is no longer used for workspace allocations (using reserve), it will likely not be possible in the future to allocate on a stream and immediately free memory after an enqueue call. That could have enabled workspace sharing in a multi-model pipeline very nicely. @chilo-ms can you help merge this?
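A toy free-list model (not ORT's actual arena code) of why exact-size allocation defeats reuse: when every distinct prompt length produces a distinct block size, nothing in the free list ever matches, so the arena grows on every request; rounding requests up (here, to a power of two, as an assumed growth policy) collapses many sizes onto a few bucket sizes that do get reused.

```python
def round_up_pow2(n):
    # Round a request up to the next power of two.
    p = 1
    while p < n:
        p *= 2
    return p

def simulate(requests, round_sizes):
    # Toy arena: a freed block is reused only when its stored size
    # matches the (possibly rounded) request exactly.
    free_blocks = []
    total_allocated = 0
    for size in requests:
        want = round_up_pow2(size) if round_sizes else size
        if want in free_blocks:
            free_blocks.remove(want)   # exact match: reuse the block
        else:
            total_allocated += want    # no match: the arena grows
        free_blocks.append(want)       # tensor is freed after the step
    return total_allocated

prompts = [100, 130, 90, 200, 111]  # varying prompt lengths
print(simulate(prompts, round_sizes=False))  # every size is new: 631 total
print(simulate(prompts, round_sizes=True))   # sizes bucket to 128/256: 384 total
```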
### Description
This PR provides C++ interfaces for the following:

Env:
- CopyTensors()
- CreateSharedAllocator
- GetSharedAllocator
- ReleaseSharedAllocator
- CreateAndRegisterAllocatorV2
- RegisterAllocator
- UnregisterAllocator

EpDevice:
- EpDevice_MemoryInfo
- CreateSyncStreamForEpDevice

MemoryInfo:
- CreateMemoryInfo_V2
- MemoryInfoGetName
- MemoryInfoGetId
- MemoryInfoGetMemType
- MemoryInfoGetType
- MemoryInfoGetDeviceMemType
- MemoryInfoGetVendorId

Session:
- SessionGetInputName
- SessionGetOutputName
- SessionGetMemoryInfoForInputs
- SessionGetMemoryInfoForOutputs
- SessionGetEpDeviceForInputs

SyncStream:
- SyncStream_GetHandle
- ReleaseSyncStream

OrtArenaCfg:
- CreateArenaCfgV2

TRT:
- CreateTensorRTProviderOptions and V2
- UpdateTensorRTProviderOptions

SessionOptions:
- OrtSessionOptionsAppendExecutionProvider_CPU

Prepacked container

CUDA Options V2:
- OrtCUDAProviderOptionsV2
- CreateCUDAProviderOptions
- GetCUDAProviderOptionsByName
- UpdateCUDAProviderOptionsWithValue
- UpdateCUDAProviderOptions
- GetCUDAProviderOptionsAsString

### Motivation and Context
Provide a way to write exception-safe code.
### Description
Added the header `<cstdint>` to `semver.h`.
### Motivation and Context
Fixes compilation on Linux systems, preventing the error:
```
/xxx/onnxruntime/core/common/semver.h:18:3: error: »uint32_t« does not name a type
   18 |   uint32_t major{};
   19 |   uint32_t minor{};
   20 |   uint32_t patch{};
```
…y info (microsoft#25749)
### Description
This pull request introduces a new mechanism for validating compiled model compatibility with execution providers (EPs) in ONNX Runtime. It adds infrastructure for EPs to generate and store compatibility information in model metadata, and for the runtime to enforce compatibility checks during session initialization.
### Motivation and Context
The APIs proposed in this PR address two requirements:
1. Apps that have an already pre-compiled model on device need a way to determine whether that pre-compiled model is still valid (given the EPs, drivers, etc. on the system).
2. Apps may have many different pre-compiled versions of a model stored on a remote server, and want to figure out which of those models they should download for the device where they are running.
### Testing
Validated that the new suite of tests passes cleanly. Created a private build of this ORT and the AMD Vitis EP. I stepped through the core logic (the EP doesn't have this support wired up yet, so there is no compatibility info written out) and, for regression purposes, confirmed I could compile and run inferences through ResNet.

Co-authored-by: Aditya Rastogi <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Disable cpuinfo for ARM64EC builds. There's an error when linking against cpuinfo built for ARM64EC when using `--use_vcpkg`. This issue was exposed by a recent change (microsoft#25228), but cpuinfo was not actually being used for ARM64EC before. The macros here don't properly account for ARM64EC: https://github.com/microsoft/onnxruntime/blob/e6d3e085cb0bb96da7c3458b97316ecca234b37a/onnxruntime/core/common/cpuid_arch_definition.h#L8-L14
### Motivation and Context
Fix a packaging pipeline failure. Revert to the old behavior of not calling cpuinfo from the CPUIDInfo ctor for ARM64EC. This PR is just a workaround; the cpuinfo link issue needs more investigation.
### Description
Put the flash decoding shader into three template files.
### Motivation and Context
Moving to templates will improve code readability.
ankitm3k approved these changes on Aug 25, 2025.
Description
Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.