ONNX Runtime v1.14.0
Announcements
- Building ORT from source will require CMake version >=3.24 (previously >=3.18).
General
- ONNX 1.13 support (opset 18)
- Threading
- New custom operator APIs
- Multi-stream Execution Provider refactoring
- Improves GPU utilization by running parallel inference requests on different GPU streams. Updated for CUDA, TensorRT, and ROCm execution providers
- Improves memory efficiency by enabling GPU memory reuse across different streams
- Enables Execution Provider developers to customize their stream implementations by providing a "Stream" interface in the ExecutionProvider API
- [Preview] Rust API for ORT - not part of the release branch but available to build from main.
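The multi-stream refactoring above assigns parallel inference requests to different device streams so they can run concurrently. A minimal Python sketch of the idea, assuming a hypothetical round-robin `StreamDispatcher` (ORT's real mechanism is the C++ "Stream" interface in the ExecutionProvider API, not a Python class):

```python
from itertools import cycle

class StreamDispatcher:
    """Round-robin dispatcher: spreads inference requests across N streams.

    Illustrative only -- names and structure here are hypothetical,
    not ORT's actual API.
    """

    def __init__(self, num_streams):
        self.streams = [f"stream-{i}" for i in range(num_streams)]
        self._next = cycle(self.streams)

    def dispatch(self, request_id):
        # Each request is bound to the next stream in round-robin order,
        # so concurrent requests land on different streams.
        return (request_id, next(self._next))

dispatcher = StreamDispatcher(num_streams=2)
assignments = [dispatcher.dispatch(i) for i in range(4)]
```

Because memory can now be reused across streams, spreading requests this way improves both GPU utilization and memory efficiency.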
Performance
- Support for quantization with AMX on Sapphire Rapids processors
- CUDA EP performance improvements:
- Improved performance of transformer models and decoding methods: beam search, greedy search, and top-p sampling
- Stable Diffusion model optimizations
- Changed the cudnn_conv_use_max_workspace default value to 1
- Performance improvements to GRU and Slice operators
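For context on the decoding methods listed above, top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative mass reaches p, then samples within that set. A generic pure-Python sketch of the algorithm (ORT's actual implementation is an optimized CUDA kernel, not this code):

```python
import math
import random

def top_p_sample(logits, p=0.9, rng=None):
    """Nucleus (top-p) sampling over a list of raw logits."""
    rng = rng or random.Random()
    # Softmax over the logits (subtract max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token ids by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep tokens until cumulative probability reaches p (the "nucleus").
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= p:
            break
    # Renormalize within the nucleus and sample.
    mass = sum(probs[i] for i in nucleus)
    r = rng.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

With a sharply peaked distribution the nucleus collapses to the single most likely token, which is what makes top-p cheap relative to sampling over the full vocabulary.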
Execution Providers
- TensorRT EP
- Adds support for TensorRT 8.5 GA versions
- Bug fixes
- OpenVINO EP
- Adds support for OpenVINO 2022.3
- DirectML EP:
- Updated to DML 1.10.1
- Additional operators: NonZero, Shape, Size, Attention, EmbedLayerNorm, SkipLayerNorm, BiasGelu
- Additional data type support for existing operators: Abs, Sign, Where
- Enable SetOptimizedFilePath export/reload
- Bug fixes/extensions: allow squeeze-13 axes, EinSum with MatMul NHCW
- ROCm EP: 5.4 support and GA ready
- [Preview] Azure EP - supports AzureML-hosted models using Triton for hybrid inferencing on-device and in the cloud
Mobile
- Pre/Post processing
- Support updating mobilenet and super resolution models to move the pre- and post-processing into the model, including usage of custom ops for conversion to/from JPEG/PNG
- onnxruntime-extensions python package includes the model update script to add pre/post processing to the model
- See example model update usage
- [Coming soon] onnxruntime-extensions packages for Android and iOS with DecodeImage and EncodeImage custom ops
- Updated the onnxruntime inference examples to demonstrate end-to-end usage with onnxruntime-extensions package
- XNNPACK
- Added support for additional commonly used operators
- Added iOS build support
- XNNPACK EP is now included in the onnxruntime-c iOS package
- Added support for using the ORT allocator in XNNPACK kernels to minimize memory usage
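Moving pre/post processing into the model (as described above) removes host-side steps such as the classic scale-and-normalize. A simplified single-channel Python sketch of what that host-side step looks like before it is folded into the model graph; real pipelines use per-channel mean/std and the DecodeImage custom op to ingest raw image bytes:

```python
def preprocess(pixels, mean=0.485, std=0.229):
    """Mobilenet-style preprocessing: scale uint8 pixels to [0, 1],
    then normalize. When this step moves into the model, the app
    passes raw image data instead of doing this on the host.
    Single-channel simplification for illustration.
    """
    return [((p / 255.0) - mean) / std for p in pixels]

normalized = preprocess([0, 128, 255])
```

Folding this into the model means every platform (Android, iOS, web) gets identical pre-processing for free, rather than reimplementing it per app.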
Web
- onnxruntime-extensions included in default ort-web build (NLP centric)
- XNNPACK Gemm
- Improved exception handling
- New utility functions (experimental) to help with exchanging data between images and tensors.
Training
- Performance optimizations and bug fixes for Hugging Face models (e.g. Xlnet and Bloom)
- Stable diffusion optimizations for training, including support for Resize and InstanceNorm gradients and addition of ORT-enabled examples to the diffusers library
- FP16 optimizer exposed in torch-ort (details)
- Bug fixes for Hugging Face models
Known Issues
- The Microsoft.ML.OnnxRuntime.DirectML package name includes a -dev-* suffix. This build is functionally equivalent to the release branch build; a patch is in progress.
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
snnn, skottmckay, edgchen1, hariharans29, tianleiwu, yufenglee, guoyu-wang, yuslepukhin, fs-eire, pranavsharma, iK1D, baijumeswani, tracysh, thiagocrepaldi, askhade, RyanUnderhill, wangyems, fdwr, RandySheriffH, jywu-msft, zhanghuanrong, smk2007, pengwa, liqunfu, shahasad, mszhanyi, SherlockNoMad, xadupre, jignparm, HectorSVC, ytaous, weixingzhang, stevenlix, tiagoshibata, faxu, wschin, souptc, ashbhandare, RandyShuai, chilo-ms, PeixuanZuo, cloudhan, dependabot[bot], jeffbloo, chenfucn, linkerzhang, duli2012, codemzs, oliviajain, natke, YUNQIUGUO, Craigacp, sumitsays, orilevari, BowenBao, yangchen-MS, hanbitmyths, satyajandhyala, MaajidKhan, smkarlap, sfatimar, jchen351, georgen117, wejoncy, PatriceVignola, adrianlizarraga, justinchuby, zhangxiang1993, gineshidalgo99, tlh20, xzhu1900, jeffdaily, suryasidd, yihonglyu, liuziyue, chentaMS, jcwchen, ybrnathan, ajindal1, zhijxu-MS, gramalingam, WilBrady, garymm, kkaranasos, ashari4, martinb35, AdamLouly, zhangyaobit, vvchernov, jingyanwangms, wenbingl, daquexian, sreekanth-yalachigere, NonStatic2014, mayavijx, mindest, jstoecker, manashgoswami, Andrews548, baowenlei, kunal-vaishnavi