ONNX Runtime v1.15.0
Announcements
Starting from the next release (ONNX Runtime 1.16.0), we will drop support at the operating system level for:
- iOS 11 and below. iOS 12 will be the minimum supported version.
- CentOS 7, Ubuntu 18.04, and any Linux distro whose glibc version is older than 2.28.
At the compiler level, we will drop support for:
- GCC version <= 9
- Visual Studio 2019
We will also remove the onnxruntime_DISABLE_ABSEIL build option, since the upgraded protobuf version requires abseil.
General
- Added support for ONNX Optional type in C# API
- Added collectives to support multi-GPU inferencing
- Updated the macOS build machines to macOS 12, which comes with Xcode 14.2; Xcode 12.4 is no longer used.
- Added Python 3.11 support (deprecated 3.7; now supporting 3.8-3.11) in packages for onnxruntime CPU, onnxruntime-gpu, onnxruntime-directml, and onnxruntime-training.
- Updated to CUDA 11.8. ONNX Runtime source code is still compatible with CUDA 11.4 and 12.x.
- Dropped the support for Windows 8.1 and below
- Removed eager mode code and the onnxruntime_ENABLE_EAGER_MODE CMake option.
- Upgraded Mimalloc version from 2.0.3 to 2.1.1
- Upgraded protobuf version from 3.18.3 to 21.12
- New dependency: cutlass, which is only used in CUDA/TensorRT packages.
- Upgraded DNNL from 2.7.1 to 3.0
Build System
- On POSIX systems, building the code as the "root" user is now disallowed by default. If needed, append "--allow_running_as_root" to your build command to bypass the check.
- Added support for building the source natively on Windows ARM64 with Visual Studio 2022.
- Added a Gradle wrapper and updated the Gradle version from 6.8.3 to 8.0.1. (Gradle is the tool for building the ORT Java package.)
- When cross-compiling, the build scripts now try to download a prebuilt protoc from GitHub instead of building it from source, because protobuf now has many dependencies and setting up a build environment for it is not easy.
Performance
- Improved string marshalling and reduced GC pressure
- Added a build option to allow using a lock-free queue in the threadpool for improved CPU utilization
- Fixed a CPU memory leak caused by external weights
- Added a fused decoder multi-head attention kernel to improve GPT and decoder models (e.g., T5, Whisper)
- Added packing mode to improve performance of encoder models whose inputs have a large padding ratio
- Improved generation algorithms (BeamSearch, TopSampling, GreedySearch); a usage sketch follows this list
- Improved performance for Stable Diffusion, ViT, GPT, and Whisper models
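As an illustrative sketch (not from the release notes), a GPT-2 model exported with the BeamSearch contrib op, e.g. via the onnxruntime.transformers.convert_generation tool, can be run as shown below; the model path and exact input names are assumptions that depend on how the model was exported:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical model with the BeamSearch contrib op baked into the graph.
session = ort.InferenceSession(
    "gpt2_beam_search.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Generation parameters are passed as graph inputs to the BeamSearch op.
outputs = session.run(None, {
    "input_ids": np.array([[464, 3290, 318]], dtype=np.int32),  # tokenized prompt
    "max_length": np.array([32], dtype=np.int32),
    "min_length": np.array([1], dtype=np.int32),
    "num_beams": np.array([4], dtype=np.int32),
    "num_return_sequences": np.array([1], dtype=np.int32),
    "length_penalty": np.array([1.0], dtype=np.float32),
    "repetition_penalty": np.array([1.0], dtype=np.float32),
})
sequences = outputs[0]  # shape: (batch_size, num_return_sequences, max_length)
```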
Execution Providers
Two new execution providers: JS EP and QNN EP.
TensorRT EP
- Official support for TensorRT 8.6
- Explicit shape profile overrides
- Support for TensorRT plugins via ORT custom op
- Improved support for TensorRT options (heuristics, sparsity, optimization level, auxiliary streams, tactic source selection, etc.); a usage sketch follows this list
- Support for TensorRT timing cache
- Improvements to test coverage, specifically for opset 16-17 models and package pipeline unit tests.
- Other misc bugfixes and improvements.
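As a rough sketch of how these options surface in the Python API (the model path and shape strings below are placeholders; check the TensorRT EP documentation for the full option list):

```python
import onnxruntime as ort

trt_options = {
    "trt_timing_cache_enable": True,       # reuse kernel timing info across engine builds
    "trt_build_heuristics_enable": True,   # faster engine builds via heuristics
    "trt_sparsity_enable": True,           # allow sparse tactics where weights permit
    "trt_builder_optimization_level": 3,   # trade engine build time for runtime speed
    "trt_auxiliary_streams": 2,            # extra CUDA streams for potential parallelism
    # Explicit shape profile overrides, one "name:dims" entry per input.
    "trt_profile_min_shapes": "input_ids:1x1",
    "trt_profile_opt_shapes": "input_ids:1x128",
    "trt_profile_max_shapes": "input_ids:8x512",
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"],
)
```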
OpenVINO EP
- Support for OpenVINO 2023.0
- Dynamic shapes support for iGPU (see the sketch after this list)
- Changes to OpenVINO backend to improve first inference latency
- Deprecation of HDDL-VADM and Myriad VPU support
- Misc bug fixes.
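A minimal sketch of targeting an iGPU through the Python API; the device_type value shown is an assumption, so pick the device and precision that match your hardware:

```python
import onnxruntime as ort

# "GPU_FP16" runs on the integrated GPU in half precision; with dynamic
# shapes support, inputs no longer need a fixed shape ahead of time.
session = ort.InferenceSession(
    "model.onnx",
    providers=[("OpenVINOExecutionProvider", {"device_type": "GPU_FP16"})],
)
```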
QNN EP
DirectML EP
- Updated to DirectML 1.12; a usage sketch follows this list
- Opset 16-17 support
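A minimal sketch of selecting the DirectML EP from Python; the device_id shown is an assumption and selects the DirectX adapter:

```python
import onnxruntime as ort

# Requires the onnxruntime-directml package; device_id 0 is the default adapter.
session = ort.InferenceSession(
    "model.onnx",
    providers=[("DmlExecutionProvider", {"device_id": 0})],
)
```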
Azure EP
- Added support for OpenAI whisper model
- Available in a NuGet package in addition to Python
Mobile
New packages
- Swift Package Manager for onnxruntime
- NuGet package for onnxruntime-extensions (supports Android/iOS for MAUI/Xamarin)
- React Native package for onnxruntime can optionally include onnxruntime-extensions
Pre/Post processing
- Added support for built-in pre and post processing for NLP scenarios: classification, question-answering, text-prediction
- Added support for built-in pre and post processing for speech recognition (Whisper); see the sketch after this list
- Added support for built-in post processing for object detection (YOLO): non-max suppression, drawing bounding boxes
- Additional CoreML and NNAPI kernels to support customer scenarios
  - NNAPI: BatchNormalization, LRN
  - CoreML: Div, Flatten, LeakyRelu, LRN, Mul, Pad, Pow, Sub
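The pre/post processing operators live in onnxruntime-extensions, so the extensions ops library must be registered with the session before loading a model that embeds them. A minimal sketch, assuming a Whisper model that already has the processing steps baked in (the model file name is a placeholder):

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the custom ops from onnxruntime-extensions so the session can
# resolve the embedded pre/post processing nodes.
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())

session = ort.InferenceSession("whisper_with_pre_post_processing.onnx", sess_options=so)
```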
Web
- [preview] WebGPU support
- Support building the source code with "MinGW make" on Windows.
ORT Training
On-device training:
- Official package for On-Device Training now available. On-device training extends ORT Inference solutions to enable training on edge devices.
- APIs and language bindings supported for C, C++, Python, C#, and Java; a minimal Python sketch follows this list.
- Packages available for Desktop and Android.
- For custom builds, refer to the build instructions.
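A minimal sketch of the Python on-device training loop, assuming the training artifacts (training/eval/optimizer models plus a checkpoint) were generated offline; all file names below are placeholders:

```python
from onnxruntime.training.api import CheckpointState, Module, Optimizer

# Load the offline-generated training artifacts.
state = CheckpointState.load_checkpoint("checkpoint")
module = Module("training_model.onnx", state, "eval_model.onnx")
optimizer = Optimizer("optimizer_model.onnx", module)

module.train()                      # switch to training mode
for inputs, labels in data_loader:  # user-supplied iterable of numpy arrays
    loss = module(inputs, labels)   # forward + backward
    optimizer.step()                # apply gradients
    module.lazy_reset_grad()        # reset gradients before the next step

CheckpointState.save_checkpoint(state, "checkpoint")  # persist updated weights
```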
Others
- Added graph optimizations that leverage the sparsity in the label data to improve performance. With these optimizations, we see performance gains ranging from 4% to 15% for popular HF models over baseline ORT.
- Vision transformer models like ViT, BEIT, and SwinV2 see up to a 44% speedup with ORT Training + DeepSpeed over PyTorch eager mode on AzureML.
- Added optimizations for SOTA models like Dolly and Whisper. ORT Training + DeepSpeed now gives a ~17% speedup for Whisper and a ~4% speedup for Dolly over PyTorch eager mode. Dolly optimizations on the main branch show a ~40% speedup over eager mode.
Known Issues
- The onnxruntime-training 1.15.0 packages published to pypi.org were built in Debug mode instead of Release mode. You can get the correct builds from https://download.onnxruntime.ai/. We will fix the issue in the next patch release.
- The XNNPack EP does not work on x86 CPUs without AVX-512 instructions, because the wrong alignment was used when allocating buffers for XNNPack.
- The CUDA EP source code has a build error when CUDA version <11.6. See #16000.
- The onnxruntime-training builds are missing the training header files.
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
snnn, fs-eire, edgchen1, wejoncy, mszhanyi, PeixuanZuo, pengwa, jchen351, cloudhan, tianleiwu, PatriceVignola, wangyems, adrianlizarraga, chenfucn, HectorSVC, baijumeswani, justinchuby, skottmckay, yuslepukhin, RandyShuai, RandySheriffH, natke, YUNQIUGUO, smk2007, jslhcl, chilo-ms, yufenglee, RyanUnderhill, hariharans29, zhanghuanrong, askhade, wschin, jywu-msft, mindest, zhijxu-MS, dependabot[bot], xadupre, liqunfu, nums11, gramalingam, Craigacp, fdwr, shalvamist, jstoecker, yihonglyu, sumitsays, stevenlix, iK1D, pranavsharma, georgen117, sfatimar, MaajidKhan, satyajandhyala, faxu, jcwchen, hanbitmyths, jeffbloo, souptc, ytaous, kunal-vaishnavi
snnn, fs-eire, edgchen1, wejoncy, mszhanyi, PeixuanZuo, pengwa, jchen351, cloudhan, tianleiwu, PatriceVignola, wangyems, adrianlizarraga, chenfucn, HectorSVC, baijumeswani, justinchuby, skottmckay, yuslepukhin, RandyShuai, RandySheriffH, natke, YUNQIUGUO, smk2007, jslhcl, chilo-ms, yufenglee, RyanUnderhill, hariharans29, zhanghuanrong, askhade, wschin, jywu-msft, mindest, zhijxu-MS, dependabot[bot], xadupre, liqunfu, nums11, gramalingam, Craigacp, fdwr, shalvamist, jstoecker, yihonglyu, sumitsays, stevenlix, iK1D, pranavsharma, georgen117, sfatimar, MaajidKhan, satyajandhyala, faxu, jcwchen, hanbitmyths, jeffbloo, souptc, ytaous kunal-vaishnavi