
Conversation

Jaswanth51

Description

Synchronizing intel/onnxruntime ovep-develop branch with latest changes from microsoft/onnxruntime master branch.

quic-muchhsu and others added 18 commits August 20, 2025 09:36
### Description
Add qnn support for mod op when fmod = 0.

### Motivation and Context
QNN doesn't natively support the Mod op. This PR allows QNN to process the Mod op for the fmod=0 case.
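As a sketch of the semantics involved (illustrative only; `onnx_mod` is a hypothetical helper, not the QNN or ORT implementation): with fmod=0 the result follows Python's `%` operator and takes the sign of the divisor, while fmod=1 follows C's `fmod` and takes the sign of the dividend.

```python
import math

# Sketch of ONNX Mod semantics (illustrative, not the QNN implementation):
# fmod=0 follows Python's "%" (result takes the sign of the divisor),
# fmod=1 follows C's fmod (result takes the sign of the dividend).
def onnx_mod(a, b, fmod=0):
    if fmod:
        return math.fmod(a, b)
    return a - b * math.floor(a / b)  # same as Python's a % b for b != 0

print(onnx_mod(-7, 3))           # 2    -> sign of the divisor
print(onnx_mod(-7, 3, fmod=1))   # -1.0 -> sign of the dividend
```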

---------

Signed-off-by: Mu-Chein Hsu <[email protected]>
### Description
In PoolOpBuilder:
- Revise the shape check to use ORT macros.
- Fix the function invocation for 5D cases.

### Motivation and Context
Refer to microsoft#25778.
The pool builder incorrectly invoked a function that calculates a 4D shape on 5D input, even though the function originally expected 3D cases only. Moreover, the check used `assert` to validate the shape, which has no effect in Release or RelWithDebInfo builds.
### Description
Add QNN EP support for the ThresholdedRelu op.

### Motivation and Context
ThresholdedRelu wasn't previously supported.

Signed-off-by: Mu-Chein Hsu <[email protected]>
…ob (microsoft#25794)

### Description
<!-- Describe your changes. -->

Set iOS simulator runtime version to 18.5 in mac.yml iphone_simulator
job.

This job uses Xcode 16.4. According to this table, the corresponding
simulator SDK version is 18.5.

https://github.com/actions/runner-images/blob/da7977bf2699f44e70b7d3c3352dedb0da38db9c/images/macos/macos-15-arm64-Readme.md?plain=1#L181

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Address intermittent CI build timeouts.
### Description

Add a new API `Graph_GetModelMetadata`

### Motivation and Context
The VitisAI EP converts ONNX IR to another IR suitable for AMD AI compilers.
The metadata in an OrtModel contains much important information produced by other tools, e.g. Olive.

This API could also be used by many other execution providers that need to access the same information.
…osoft#25562)

### Description
<!-- Describe your changes. -->
Add a HardSwish operator, defined as x * HardSigmoid(x).
Add bf16 support for HardSigmoid.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
HardSwish is currently implemented as HardSigmoid followed by Mul in the CUDA EP.
A fused HardSwish should take roughly half the time of the two-kernel version.
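The fused formula can be sketched in plain Python (a minimal illustration; `hard_sigmoid` and `hard_swish` are hypothetical helpers, not the CUDA kernels):

```python
# Sketch of the fused op (illustrative, not the CUDA kernel).
# ONNX HardSigmoid(x) = clip(alpha * x + beta, 0, 1); HardSwish uses
# alpha = 1/6, beta = 0.5 and multiplies the result by x.
def hard_sigmoid(x, alpha=0.2, beta=0.5):
    return min(max(alpha * x + beta, 0.0), 1.0)

def hard_swish(x):
    return x * hard_sigmoid(x, alpha=1.0 / 6.0, beta=0.5)

print(hard_swish(3.0))   # 3.0   (HardSigmoid saturates at 1)
print(hard_swish(1.5))   # 1.125 (1.5 * (1.5/6 + 0.5))
```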

---------

Co-authored-by: kaiyu <[email protected]>
Co-authored-by: Copilot <[email protected]>
### Description

Fix build break caused by warning C4702: unreachable code.

```
onnxruntime\contrib_ops\webgpu\quantization\matmul_nbits.cc(95,1): error C2220: the following warning is treated
 as an error [C:\code\o3\build_main\Debug\onnxruntime_providers_webgpu.vcxproj]
onnxruntime\contrib_ops\webgpu\quantization\matmul_nbits.cc(95,1): warning C4702: unreachable code [C:\code\o3\b
uild_main\Debug\onnxruntime_providers_webgpu.vcxproj]
```

It seems the CI pipeline does not catch this warning.
### Description

Add a build flag to enable/disable mixed gemm cutlass kernel.

To disable the kernel, append the following at the end of the build
command line:
`--cmake_extra_defines onnxruntime_USE_FPA_INTB_GEMM=OFF`

### Motivation and Context

The FpA IntB GEMM kernels take a long time to compile. With this option, developers
can speed up the build, especially on build machines with limited memory.
* Implements `GetEPContextNodes()`
* Enables usage of `AddExternalInitializersFromFilesInMemory` for models
that have to be communicated as byte stream but are larger than 2GB
* Adds EP context unit tests for files, byte streams, and both embed modes

NOTE: For large models (> 2GB), `embed_mode=0` must be used;
`embed_mode=1` fails due to protobuf limitations.

---------

Co-authored-by: Maximilian Müller <[email protected]>
### Description

upgrade WGSL Template to v0.1.15

Changes:
- fs-eire/wgsl-template#21
…rosoft#25800)

This reconfiguration is done so that tensors are NOT allocated with an exact
matching size. If that strategy is used, a tensor allocation will always trigger a new
allocation in the arena instead of reusing memory, since the memory size has
to match exactly.
This became a big problem with ORT GenAI, since the arena grew constantly
when prompting with different prompt lengths, and no arena shrinkage was
triggered to return older tensors. @skottmckay I am happy to be educated
on a better usage of the allocators.
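The growth problem described above can be sketched with a toy model (an illustration only, not ORT's arena implementation): if free blocks are reused only on an exact size match, varying request sizes defeat reuse entirely.

```python
# Toy model (not ORT's arena): free blocks are only reused when the
# requested size matches exactly, as described above.
class ExactMatchArena:
    def __init__(self):
        self.free = []   # sizes of freed blocks available for reuse
        self.total = 0   # total bytes ever requested from the system

    def alloc(self, size):
        if size in self.free:
            self.free.remove(size)   # exact match -> reuse
        else:
            self.total += size       # no match -> the arena grows
        return size

    def release(self, size):
        self.free.append(size)

arena = ExactMatchArena()
# Prompts of different lengths -> different tensor sizes every time.
for n in [128, 256, 384, 512, 640]:
    arena.release(arena.alloc(n))

print(arena.total)  # 1920: nothing is ever reused, the arena grows on every call
```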

Issues with this:
Since the arena is no longer used for workspace allocations (it uses
reserve), it will likely not be possible in the future to allocate on a
stream and immediately free the memory after an enqueue call. That could
have enabled workspace sharing in a multi-model pipeline very nicely.

@chilo-ms, can you help merge this?
### Description
<!-- Describe your changes. -->
This PR provides C++ interfaces for the following:

Env
===
CopyTensors()

CreateSharedAllocator
GetSharedAllocator
ReleaseSharedAllocator
CreateAndRegisterAllocatorV2

RegisterAllocator
UnregisterAllocator

EpDevice
========
EpDevice_MemoryInfo
CreateSyncStreamForEpDevice

MemoryInfo
==========
CreateMemoryInfo_V2
MemoryInfoGetName
MemoryInfoGetId
MemoryInfoGetMemType
MemoryInfoGetType
MemoryInfoGetDeviceMemType
MemoryInfoGetVendorId

Session
=======
SessionGetInputName
SessionGetOutputName

SessionGetMemoryInfoForInputs
SessionGetMemoryInfoForOutputs
SessionGetEpDeviceForInputs

SyncStream
==========
SyncStream_GetHandle
ReleaseSyncStream

OrtArenaCfg
===========
CreateArenaCfgV2

TRT
===
CreateTensorRTProviderOptions and V2
UpdateTensorRTProviderOptions

SessionOptions
==============
OrtSessionOptionsAppendExecutionProvider_CPU

Prepacked container
===================

CUDA Options V2
===============
OrtCUDAProviderOptionsV2
CreateCUDAProviderOptions

GetCUDAProviderOptionsByName
UpdateCUDAProviderOptionsWithValue
UpdateCUDAProviderOptions
GetCUDAProviderOptionsAsString

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Provide a way to write exception-safe code.
### Description
Added the header `<cstdint>` to `semver.h`.


### Motivation and Context
Fixes compilation on Linux systems, preventing the error:
```
/xxx/onnxruntime/core/common/semver.h:18:3: error: »uint32_t« does not name a type
   18 |   uint32_t major{};
   19 |   uint32_t minor{};
   20 |   uint32_t patch{};
```
…y info (microsoft#25749)

### Description
This pull request introduces a new mechanism for validating compiled
model compatibility with execution providers (EPs) in ONNX Runtime. It
adds infrastructure for EPs to generate and store compatibility
information in model metadata, and for the runtime to enforce
compatibility checks during session initialization.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
The APIs proposed in this PR address two requirements:

1. Apps that have an already pre-compiled model on device need a way to
determine if the pre-compiled model is still valid (given the EPs /
drivers / etc. on the system).
2. Apps may have many different pre-compiled versions of a model stored
on a remote server, and want to figure out which of those models they
should download for the device where they are running.

### Testing
Validated that the new suite of tests passes cleanly.
Created a private build of this ORT and the AMD Vitis EP. I stepped
through the core logic (the EP doesn't have this support wired up yet,
so there is no compatibility info written out) and, for regression
purposes, confirmed I could compile and run inferences through ResNet.

---------

Co-authored-by: Aditya Rastogi <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
<!-- Describe your changes. -->

Disable cpuinfo for ARM64EC builds. There's an error when linking to
cpuinfo built for ARM64EC when using `--use_vcpkg`.

This issue was exposed by a recent change (microsoft#25228) but cpuinfo was
actually not being used before for ARM64EC. The macros here don't
properly account for ARM64EC:

https://github.com/microsoft/onnxruntime/blob/e6d3e085cb0bb96da7c3458b97316ecca234b37a/onnxruntime/core/common/cpuid_arch_definition.h#L8-L14

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix a packaging pipeline failure. Revert to the old behavior of not
calling cpuinfo from the CPUIDInfo ctor for ARM64EC.

This PR is just a workaround. The cpuinfo link issue needs more
investigation.
### Description
Put the flash decoding shader into three template files.


### Motivation and Context
Moving to templates will improve code readability.
@Jaswanth51 Jaswanth51 requested a review from ankitm3k August 25, 2025 04:48
@ankitm3k ankitm3k merged commit e812aea into ovep-develop Aug 25, 2025
6 of 8 checks passed
@ankitm3k ankitm3k deleted the sync_msft_25082025 branch August 25, 2025 05:39