Release ONNX Runtime v1.19.0 · microsoft/onnxruntime

Announcements

Note that the wrong commit was initially tagged with v1.19.0. The final commit has since been correctly tagged: 26250ae. This shouldn't affect much, but sorry for the inconvenience!

Build System & Packages

NumPy 2.x support has been added
Qualcomm SDK has been upgraded to 2.25
ONNX has been upgraded from 1.16 → 1.16.1
Default GPU packages use CUDA 12.x and cuDNN 9.x (previously CUDA 11.x/cuDNN 8.x). CUDA 11.x/cuDNN 8.x packages have moved to the aiinfra VS feed.
TensorRT 10.2 support added
Introduced Java CUDA 12 packages on Maven.
Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024)
Discontinued support for macOS 11 and increased the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
Discontinued support for iOS 12 and increased the minimum supported iOS version to 13.
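
Because the default GPU packages now target CUDA 12.x, it can be worth confirming which execution providers an installed build actually exposes before creating a session. The following is a minimal sketch, not part of the release itself: `pick_providers` is a hypothetical helper, and it assumes the list of available providers comes from `onnxruntime.get_available_providers()`.

```python
def pick_providers(available,
                   preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are present, in priority order.

    `available` is expected to be the list returned by
    onnxruntime.get_available_providers(); this helper is hypothetical.
    """
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers are available")
    return chosen


# Example with a hard-coded list standing in for a CPU-only install.
cpu_only = ["CPUExecutionProvider"]
print(pick_providers(cpu_only))
```

In a real environment the argument would be `onnxruntime.get_available_providers()`, and the result could be passed as the `providers` argument of `onnxruntime.InferenceSession`.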

Core

Implemented DeformConv
Fixed big-endian issues and added build support for AIX

Performance

Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers
Implemented FlashAttention on CPU to improve performance for GenAI prompt cases
Improved INT4 performance on CPU (X64, ARM64) and NVIDIA GPUs
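
To make the INT4 QDQ (quantize/dequantize) idea concrete, here is a small pure-Python sketch of symmetric 4-bit quantization with two values packed per byte. It is an illustration only and not the ONNX Runtime kernel code: the function names are hypothetical, the per-tensor scale is an assumption chosen for the example, and the low-nibble-first packing order is assumed to match the ONNX INT4 layout.

```python
def quantize_int4(values, scale, zero_point=0):
    """Quantize floats to signed 4-bit integers, saturating to [-8, 7]."""
    return [max(-8, min(7, round(v / scale) + zero_point)) for v in values]


def dequantize_int4(quantized, scale, zero_point=0):
    """Map signed 4-bit integers back to floats."""
    return [(q - zero_point) * scale for q in quantized]


def pack_int4(quantized):
    """Pack two signed 4-bit values per byte, first value in the low nibble."""
    padded = list(quantized) + ([0] if len(quantized) % 2 else [])
    packed = bytearray()
    for low, high in zip(padded[0::2], padded[1::2]):
        packed.append((low & 0x0F) | ((high & 0x0F) << 4))
    return bytes(packed)


def unpack_int4(data, count):
    """Unpack `count` signed 4-bit values from packed bytes."""
    values = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            # Nibbles 8..15 represent the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


weights = [0.5, -1.0, 0.26, 0.9]
scale = 0.125  # assumed per-tensor scale for this example
q = quantize_int4(weights, scale)
packed = pack_int4(q)
restored = dequantize_int4(unpack_int4(packed, len(q)), scale)
print(q, packed.hex(), restored)
```

The round trip shows the trade-off: four weights fit in two bytes, and values that do not land on a multiple of the scale (0.26 and 0.9 here) come back slightly changed.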

Execution Providers

TensorRT
- Updated to support TensorRT 10.2
- Removed calls to deprecated APIs
- Enabled refittable embedded engine when the ONNX model is provided as a byte stream
CUDA
- Upgraded cutlass to 3.5.0 for performance improvement of memory efficient attention.
- Updated MultiHeadAttention and Attention operators to be thread-safe.
- Added sdpa_kernel provider option to choose kernel for Scaled Dot-Product Attention.
- Expanded op support - Tile (bf16)
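
As a rough illustration of how the sdpa_kernel provider option might be supplied from Python, the sketch below builds a `providers` list for the CUDA Execution Provider. The helper `cuda_provider_config` is hypothetical, and the numeric value passed for `sdpa_kernel` is a placeholder: the accepted values and their meanings are not stated in these notes and should be taken from the CUDA Execution Provider documentation.

```python
def cuda_provider_config(sdpa_kernel, device_id=0):
    """Build a providers list that sets sdpa_kernel on the CUDA EP.

    Hypothetical helper. Option values are passed as strings; the meaning
    of each sdpa_kernel value is defined by the CUDA EP documentation.
    """
    cuda_options = {
        "device_id": str(device_id),
        "sdpa_kernel": str(sdpa_kernel),
    }
    # Fall back to the CPU provider for nodes the CUDA EP does not take.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


# Placeholder value; assumed here to mean "use the default kernel selection".
providers = cuda_provider_config(sdpa_kernel=0)
print(providers)
```

The resulting list would typically be passed as `providers=providers` to `onnxruntime.InferenceSession` together with a model path.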
CPU
- Expanded op support - GroupQueryAttention, SparseAttention (for Phi-3 small)
QNN
- Updated to support QNN SDK 2.25
- Expanded op support - HardSigmoid, ConvTranspose 3D, Clip (int32 data), MatMul (int4 weights), Conv (int4 weights), PRelu (fp16)
- Expanded fusion support – Conv + Clip/Relu fusion
OpenVINO
- Added support for OpenVINO 2024.3
- Support for enabling EpContext using session options
DirectML
- Updated DirectML from 1.14.1 → 1.15.1
- Updated ONNX opset from 17 → 20
- Opset 19 and Opset 20 are supported with known caveats:
  - GridSample 20: 5D not supported
  - DeformConv not supported

Mobile

Additional CoreML ML Program operators were added
- See the supported operators list
Fixed packaging issue with macOS framework in onnxruntime-c cocoapod
Removed Xamarin support
- Xamarin EOL was May 1, 2024
- Xamarin official support policy | .NET (microsoft.com)

Web

Updated JavaScript packaging to align with best practices, which introduces slight incompatibilities when apps bundle onnxruntime-web
Improved CPU operator coverage for WebNN (now supported by Chrome)

Training

No specific updates

GenAI

Support for new models: Qwen, Llama 3.1, Gemma 2, Phi-3 small
Support for building quantized models with the AWQ and GPTQ methods
Performance improvements for Intel and Arm CPUs
Packaging and language bindings
- Added Java bindings (build from source)
- Separated OnnxRuntime.dll and directml.dll from the GenAI package to improve usability
- Published packages for Windows Arm
- Added support for Android (build from source)
Bug fixes, such as the long-prompt correctness issue for Phi-3.

Extensions

Added C APIs for language, vision and audio processors including new FeatureExtractor for Whisper
Support for Phi-3 Small Tokenizer and new OpenAI tiktoken format for fast loading of BPE tokenizers
Added new CUDA custom operators such as MulSigmoid, Transpose2DCast, ReplaceZero, AddSharedInput and MulSharedInput
Enhanced Custom Op Lite API on GPU and fused kernels for DORT
Bug fixes, including null bos_token for Qwen2 tokenizer and SentencePiece converted FastTokenizer issue on non-ASCII characters, as well as necessary updates for MSVC 19.40 and numpy 2.0 release

Contributors

Changming Sun, Baiju Meswani, Scott McKay, Edward Chen, Jian Chen, Wanming Lin, Tianlei Wu, Adrian Lizarraga, Chester Liu, Yi Zhang, Yulong Wang, Hector Li, kunal-vaishnavi, pengwa, aciddelgado, Yifan Li, Xu Xing, Yufeng Li, Patrice Vignola, Yueqing Zhang, Jing Fang, Chi Lo, Dmitri Smirnov, mingyueliuh, cloudhan, Yi-Hong Lyu, Ye Wang, Ted Themistokleous, Guenther Schmuelling, George Wu, mindest, liqun Fu, Preetha Veeramalai, Justin Chu, Xiang Zhang, zz002, vraspar, kailums, guyang3532, Satya Kumar Jandhyala, Rachel Guo, Prathik Rao, Maximilian Müller, Sophie Schoenmeyer, zhijiang, maggie1059, ivberg, glen-amd, aamajumder, Xavier Dupré, Vincent Wang, Suryaprakash Shanmugam, Sheil Kumar, Ranjit Ranjan, Peishen Yan, Frank Dong, Chen Feiyue, Caroline Zhu, Adam Louly, Ștefan Talpalaru, zkep, winskuo-quic, wejoncy, vividsnow, vivianw-amd, moyo1997, mcollinswisc, jingyanwangms, Yang Gu, Tom McDonald, Sunghoon, Shubham Bhokare, RuomeiMS, Qingnan Duan, PeixuanZuo, Pavan Goyal, Nikolai Svakhin, KnightYao, Jon Campbell, Johan MEJIA, Jake Mathern, Hans, Hann Wang, Enrico Galli, Dwayne Robinson, Clément Péron, Chip Kerchner, Chen Fu, Carson M, Adam Reeve, Adam Pocock.

Big thank you to everyone who contributed to this release!

Full Changelog: v1.18.1...v1.19.0
