Skip to content

ONNX Runtime v1.19.0

Compare
Choose a tag to compare
@MaanavD MaanavD released this 19 Aug 18:44
26250ae

Announcements

  • Note that the wrong commit was initially tagged with v1.19.0. The final commit has since been correctly tagged: 26250ae. This shouldn't effect much, but sorry for the inconvenience!

Build System & Packages

  • Numpy support for 2.x has been added
  • Qualcomm SDK has been upgraded to 2.25
  • ONNX has been upgraded from 1.16 → 1.16.1
  • Default GPU packages use CUDA 12.x and Cudnn 9.x (previously CUDA 11.x/CuDNN 8.x) CUDA 11.x/CuDNN 8.x packages are moved to the aiinfra VS feed.
  • TensorRT 10.2 support added
  • Introduced Java CUDA 12 packages on Maven.
  • Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024)
  • Discontinued support for macOS 11 and increasing the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
  • Discontinued support for iOS 12 and increasing the minimum supported iOS version to 13.

Core

Performance

  • Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers
  • Implemented FlashAttention on CPU to improve performance for GenAI prompt cases
  • Improved INT4 performance on CPU (X64, ARM64) and NVIDIA GPUs

Execution Providers

  • TensorRT

    • Updated to support TensorRT 10.2
    • Remove calls to deprecated api’s
    • Enable refittable embedded engine when ONNX model provided as byte stream
  • CUDA

    • Upgraded cutlass to 3.5.0 for performance improvement of memory efficient attention.
    • Updated MultiHeadAttention and Attention operators to be thread-safe.
    • Added sdpa_kernel provider option to choose kernel for Scaled Dot-Product Attention.
    • Expanded op support - Tile (bf16)
  • CPU

    • Expanded op support - GroupQueryAttention, SparseAttention (for Phi-3 small)
  • QNN

    • Updated to support QNN SDK 2.25
    • Expanded op support - HardSigmoid, ConvTranspose 3d, Clip (int32 data), Matmul (int4 weights), Conv (int4 weights), prelu (fp16)
    • Expanded fusion support – Conv + Clip/Relu fusion
  • OpenVINO

    • Added support for OpenVINO 2024.3
    • Support for enabling EpContext using session options
  • DirectML

    • Updated DirectML from 1.14.1 → 1.15.1
    • Updated ONNX opset from 17 → 20
    • Opset 19 and Opset 20 are supported with known caveats:
      • Gridsample 20: 5d not supported
      • DeformConv not supported

Mobile

Web

  • Updated JavaScript packaging to align with best practices, including slight incompatibilities when apps bundle onnxruntime-web
  • Improved CPU operators coverage for WebNN (now supported by Chrome)

Training

  • No specific updates

GenAI

  • Support for new models Qwen, Llama 3.1, Gemma 2, phi3 small
  • Support to build quantized models with method AWQ and GPTQ
  • Performance improvements for Intel and Arm CPU
  • Packing and language binding
    • Added Java bindings (build from source)
    • Separate OnnxRuntime.dll and directml.dll out of GenAI package to improve usability
    • Publish packages for Win Arm
    • Support for Android (build from source)
  • Bug fixes, like the long prompt correctness issue for phi3.

Extensions

  • Added C APIs for language, vision and audio processors including new FeatureExtractor for Whisper
  • Support for Phi-3 Small Tokenizer and new OpenAI tiktoken format for fast loading of BPE tokenizers
  • Added new CUDA custom operators such as MulSigmoid, Transpose2DCast, ReplaceZero, AddSharedInput and MulSharedInput
  • Enhanced Custom Op Lite API on GPU and fused kernels for DORT
  • Bug fixes, including null bos_token for Qwen2 tokenizer and SentencePiece converted FastTokenizer issue on non-ASCII characters, as well as necessary updates for MSVC 19.40 and numpy 2.0 release

Contributors

Changming Sun, Baiju Meswani, Scott McKay, Edward Chen, Jian Chen, Wanming Lin, Tianlei Wu, Adrian Lizarraga, Chester Liu, Yi Zhang, Yulong Wang, Hector Li, kunal-vaishnavi, pengwa, aciddelgado, Yifan Li, Xu Xing, Yufeng Li, Patrice Vignola, Yueqing Zhang, Jing Fang, Chi Lo, Dmitri Smirnov, mingyueliuh, cloudhan, Yi-Hong Lyu, Ye Wang, Ted Themistokleous, Guenther Schmuelling, George Wu, mindest, liqun Fu, Preetha Veeramalai, Justin Chu, Xiang Zhang, zz002, vraspar, kailums, guyang3532, Satya Kumar Jandhyala, Rachel Guo, Prathik Rao, Maximilian Müller, Sophie Schoenmeyer, zhijiang, maggie1059, ivberg, glen-amd, aamajumder, Xavier Dupré, Vincent Wang, Suryaprakash Shanmugam, Sheil Kumar, Ranjit Ranjan, Peishen Yan, Frank Dong, Chen Feiyue, Caroline Zhu, Adam Louly, Ștefan Talpalaru, zkep, winskuo-quic, wejoncy, vividsnow, vivianw-amd, moyo1997, mcollinswisc, jingyanwangms, Yang Gu, Tom McDonald, Sunghoon, Shubham Bhokare, RuomeiMS, Qingnan Duan, PeixuanZuo, Pavan Goyal, Nikolai Svakhin, KnightYao, Jon Campbell, Johan MEJIA, Jake Mathern, Hans, Hann Wang, Enrico Galli, Dwayne Robinson, Clément Péron, Chip Kerchner, Chen Fu, Carson M, Adam Reeve, Adam Pocock.

Big thank you to everyone who contributed to this release!

Full Changelog: v1.18.1...v1.19.0