ONNX Runtime v1.12.0
Announcements
- For Execution Provider maintainers/owners: the lightweight compile API is now the default compiler API for all Execution Providers (this was previously only available for the mobile build). If you have an EP using the legacy compiler API, please migrate to the lightweight compile API as soon as possible. The legacy API will be deprecated in the next release (ORT 1.13).
- netstandard1.1 support is being deprecated in this release and will be removed in the next ORT 1.13 release
Key Updates
General
- ONNX spec support
- onnx opset 17
- onnx-ml opset 3 (TreeEnsemble update)
- BeamSearch operator for encoder-decoder transformer models
- Support for invoking individual ops without the need to create a separate graph
- Useful for custom op development, allowing reuse of ORT code
- Support for feeding external initializers (for large models) as byte arrays for model inferencing (see the sketch after this list)
- Build switch to disable use of the abseil library, removing the dependency
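A minimal Python sketch of the external-initializer flow, assuming the SessionOptions.add_external_initializers binding is exposed by your build; the file name, tensor name, and shape below are placeholders:

```python
import numpy as np
import onnxruntime as ort

# Load a large initializer from any byte source into memory
# (placeholder file name and shape).
weights = np.fromfile("encoder_weights.bin", dtype=np.float32).reshape(1024, 1024)

# Wrap the buffer as an OrtValue so ORT can consume it directly instead of
# reading the tensor data from the model file or an external-data file.
weight_value = ort.OrtValue.ortvalue_from_numpy(weights)

so = ort.SessionOptions()
# Assumed binding: register the initializer under the name it has in the graph.
so.add_external_initializers(["encoder.weight"], [weight_value])

sess = ort.InferenceSession("model_without_weights.onnx", so,
                            providers=["CPUExecutionProvider"])
```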
Packages
- Python 3.10 support
- Mac M1 support in Python and Java packages
- .NET 6/MAUI support in the NuGet C# package
- Additional target frameworks: net6.0, net6.0-android, net6.0-ios, net6.0-macos
- NOTE: netstandard1.1 support is being deprecated in this release and will be removed in the 1.13 release
- onnxruntime-openvino package available on Pypi (from Intel)
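A quick post-install check (using the onnxruntime-openvino wheel as an example) that lists the execution providers the installed package was built with:

```python
# pip install onnxruntime-openvino
import onnxruntime as ort

print(ort.__version__)
# e.g. ['OpenVINOExecutionProvider', 'CPUExecutionProvider']
print(ort.get_available_providers())
```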
Performance and Quantization
- Improved C++ APIs that now utilize RAII for better memory management
- Operator performance optimizations, including GatherElements
- Memory optimizations to support compute-intensive real-time inferencing scenarios (e.g., audio inferencing)
- CPU usage savings for infrequent inference requests by reducing thread spinning (see the example after this list)
- Memory usage reduction through use of containers from the abseil library, especially inlined vectors used to store tensor shapes and inlined hash maps
- New quantized kernels for weight symmetry to improve performance on ARM64 little cores (GEMM and Conv)
- Specialized kernel for quantized Resize, yielding up to a 2x speedup
- Improved thread job partitioning for QLinearConv, demonstrating up to ~20% perf gain for certain models
- Quantization tool: improved ONNX shape inference for large models
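Two of the items above are directly user-facing; a short sketch of both follows. The session config key for thread spinning is an assumption to verify against the session options config keys in your ORT version, and the model paths are placeholders.

```python
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization of an FP32 model (weights quantized to int8).
quantize_dynamic("model_fp32.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# For infrequent inference requests, disabling intra-op thread spinning trades
# a little latency for lower idle CPU usage. The config key is an assumption.
so = ort.SessionOptions()
so.add_session_config_entry("session.intra_op.allow_spinning", "0")

sess = ort.InferenceSession("model_int8.onnx", so,
                            providers=["CPUExecutionProvider"])
```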
Execution Providers
- TensorRT EP
- TensorRT 8.4 support
- Provide option to share execution context memory between TensorRT subgraphs (configuration sketch after this list)
- Added workaround for long CI test times caused by frequent initialization/de-initialization of the TensorRT builder
- Improve subgraph partitioning and consolidate TensorRT subgraphs when possible
- Refactor engine cache serialization/deserialization logic
- Miscellaneous bug fixes and performance improvements
- OpenVINO EP
- Pre-built ONNX Runtime binaries with OpenVINO now available on PyPI: onnxruntime-openvino
- Performance optimizations of existing supported models
- New runtime configuration option 'enable_dynamic_shapes' added to enable dynamic shapes for each iteration (configuration sketch after this list)
- ORTModule included as part of the OVEP Python package to enable Torch-ORT inference
- DirectML EP
- Updated to DirectML 1.9
- Opset 13-15 support: #11827, #11814, #11782, #11772
- Bug fixes: Xbox command list reuse, descriptor heap reset, command allocator memory growth, negative pad counts, node suffix removal
- TVM EP - details
- Updated to add model .dll ingestion and execution on Windows
- Updated documentation and CI tests
- [New] SNPE EP - details
- [Preview] XNNPACK EP - initial infrastructure with limited operator support, for use with ORT Mobile and ORT Web
- Currently supports Conv and MaxPool, with work in progress to add more kernels
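A configuration sketch for the two EP options called out above; the TensorRT option name trt_context_memory_sharing_enable and the exact value types are assumptions to verify against the EP documentation for your build:

```python
import onnxruntime as ort

# TensorRT EP: assumed option name for sharing execution context memory
# between TensorRT subgraphs.
trt_session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", {"trt_context_memory_sharing_enable": True}),
        "CPUExecutionProvider",
    ],
)

# OpenVINO EP (onnxruntime-openvino package): enable_dynamic_shapes lets the
# EP handle inputs whose shapes change from one iteration to the next.
ov_session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("OpenVINOExecutionProvider", {"enable_dynamic_shapes": True}),
        "CPUExecutionProvider",
    ],
)
```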
Mobile
- Binary size reductions in the Android minimal build - 12% reduction in the size of the base build with no operator kernels
- Added new operator support to the NNAPI and CoreML EPs to improve the ability to run super-resolution and BERT models using the NPU
- NNAPI: DepthToSpace, PRelu, Gather, Unsqueeze, Pad
- CoreML: DepthToSpace, PRelu
- Added a Dockerfile to simplify running a custom minimal build to create an ORT Android package
- Initial XNNPACK EP compatibility
Web
- Memory usage optimizations
- Initial XNNPACK EP compatibility
ORT Training
- [New] ORT Training acceleration is also natively available through HuggingFace Optimum
- [New] FusedAdam optimizer now available through the torch-ort package for easier training integration (usage sketch after this list)
- FP16_Optimizer support for more DeepSpeed versions
- Bfloat16 support for AtenOp
- Added gradient ops for ReduceMax and ReduceMin
- Updates to Min and Max grad ops to use distributed logic
- Optimizations
- Optimized perf for Gelu and GeluGrad kernels for mixed precision models
- Enabled fusions for SimplifiedLayerNorm
- Added bitmask versions of Dropout, BiasDropout and DropoutGrad, which bring ~8x space savings for the mask output.
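A minimal sketch of the torch-ort integration mentioned above, assuming the torch-ort package and its ORTModule wrapper are installed; the model and data are placeholders, and the FusedAdam import path is not shown because it varies by version:

```python
import torch
from torch_ort import ORTModule

# Wrap an ordinary PyTorch model so forward/backward run through ORT Training.
model = ORTModule(torch.nn.Sequential(
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
))

# Any torch.optim optimizer works with the wrapped model; the FusedAdam
# optimizer mentioned above can be swapped in from your torch-ort install.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 784)
loss = model(x).sum()
loss.backward()
optimizer.step()
```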
Known issues
- The Microsoft.ML.OnnxRuntime.DirectML package on NuGet has an issue and will be fixed in a patch. Fix: #12368
- The Maven package has a packaging issue for Mac M1 builds and will be fixed in a patch. Fix: #12335 / Workaround discussion
- Windows builds are not compatible with Windows 8.x in this release. Please use v1.11 for now.
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
snnn, edgchen1, fdwr, skottmckay, iK1D, fs-eire, mszhanyi, WilBrady, justinchuby, tianleiwu, PeixuanZuo, garymm, yufenglee, adrianlizarraga, yuslepukhin, dependabot[bot], chilo-ms, vvchernov, oliviajain, ytaous, hariharans29, sumitsays, wangyems, pengwa, baijumeswani, smk2007, RandySheriffH, gramalingam, xadupre, yihonglyu, zhangyaobit, YUNQIUGUO, jcwchen, chenfucn, souptc, chandru-r, jstoecker, hanbitmyths, RyanUnderhill, georgen117, jywu-msft, mindest, sfatimar, HectorSVC, Craigacp, jeffdaily, zhijxu-MS, natke, stevenlix, jeffbloo, guoyu-wang, daquexian, faxu, jingyanwangms, adtsai, wschin, weixingzhang, wenbingl, MaajidKhan, ashbhandare, ajindal1, zhanghuanrong, tiagoshibata, askhade, liqunfu