
ONNX Runtime v1.12.0

RandySheriffH released this 22 Jul 04:43
f466364

Announcements

  • For Execution Provider maintainers/owners: the lightweight compile API is now the default compiler API for all Execution Providers (this was previously only available for the mobile build). If you have an EP using the legacy compiler API, please migrate to the lightweight compile API as soon as possible. The legacy API will be deprecated in the next release (ORT 1.13).
  • netstandard1.1 support is being deprecated in this release and will be removed in the next release (ORT 1.13)

Key Updates

General

  • ONNX spec support
    • onnx opset 17
    • onnx-ml opset 3 (TreeEnsemble update)
  • BeamSearch operator for encoder-decoder transformer models
  • Support for invoking individual ops without the need to create a separate graph
    • For use with custom op development to reuse ORT code
  • Support for feeding external initializers (for large models) as byte arrays for model inferencing (see the sketch after this list)
  • Build switch to disable usage of abseil library to remove dependency
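
A minimal sketch of the external-initializer support noted above, from Python. It assumes the SessionOptions.add_external_initializers binding is present in your 1.12 build; the file, tensor, and model names are placeholders for illustration.

```python
# Sketch: feed a large initializer from memory instead of the external-data file
# referenced by the model. add_external_initializers and the names used here are
# assumptions to verify against your ORT version.
import numpy as np
import onnxruntime as ort

weights = np.load("encoder_weights.npy")                   # tensor data already in memory
init = ort.OrtValue.ortvalue_from_numpy(weights)

so = ort.SessionOptions()
so.add_external_initializers(["encoder.weight"], [init])   # names must match the model

session = ort.InferenceSession("model_with_external_data.onnx", so,
                               providers=["CPUExecutionProvider"])
```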

Packages

  • Python 3.10 support
  • Mac M1 support in Python and Java packages
  • .NET 6/MAUI support in Nuget C# package
    • Additional target frameworks: net6.0, net6.0-android, net6.0-ios, net6.0-macos
    • NOTE: netstandard1.1 support is being deprecated in this release and will be removed in the 1.13 release
  • onnxruntime-openvino package available on Pypi (from Intel)
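
A quick way to try the PyPI package mentioned above; a hedged sketch in which the package and provider names come from these notes and the model path is a placeholder.

```python
# After: pip install onnxruntime-openvino
# Confirm the OpenVINO execution provider is registered, then create a session with it.
import onnxruntime as ort

print(ort.get_available_providers())   # expect 'OpenVINOExecutionProvider' in the list

session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
```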

Performance and Quantization

  • Improved C++ APIs that now utilize RAII for better memory management
  • Operator performance optimizations, including GatherElements
  • Memory optimizations to support compute-intensive real-time inferencing scenarios (e.g., audio inferencing)
    • CPU usage savings for infrequent inference requests by reducing thread spinning (see the sketch after this list)
    • Memory usage reduction through use of containers from the abseil library, especially inlined vectors used to store tensor shapes and inlined hash maps
  • New quantized kernels for weight symmetry to improve performance on ARM64 little core (GEMM and Conv)
  • Specialized kernel to improve performance of quantized Resize by up to 2x
  • Improved thread job partitioning for QLinearConv, demonstrating up to ~20% perf gain for certain models
  • Quantization tool: improved ONNX shape inference for large models
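
The thread-spinning reduction referenced in this list is opt-in through a session configuration entry. A minimal sketch, assuming the config key "session.intra_op.allow_spinning"; check onnxruntime_session_options_config_keys.h for the exact keys in your build.

```python
# Sketch: disable intra-op thread spinning to cut idle CPU usage when inference
# requests are infrequent. The config key string is an assumption to verify
# against your ORT version.
import onnxruntime as ort

so = ort.SessionOptions()
so.add_session_config_entry("session.intra_op.allow_spinning", "0")

session = ort.InferenceSession("model.onnx", so, providers=["CPUExecutionProvider"])
```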

Execution Providers

  • TensorRT EP
    • TensorRT 8.4 support
    • Provide option to share execution context memory between TensorRT subgraphs
    • Worked around long CI test times caused by frequent initialization/de-initialization of the TensorRT builder
    • Improve subgraph partitioning and consolidate TensorRT subgraphs when possible
    • Refactor engine cache serialization/deserialization logic
    • Miscellaneous bug fixes and performance improvements
  • OpenVINO EP
    • Pre-built ONNX Runtime binaries with OpenVINO now available on PyPI: onnxruntime-openvino
    • Performance optimizations of existing supported models
    • New runtime configuration option 'enable_dynamic_shapes' added to enable dynamic shapes for each iteration (see the sketch after this list)
    • ORTModule included as part of the OVEP Python package to enable Torch-ORT inference
  • DirectML EP
  • TVM EP - details
    • Updated to add model .dll ingestion and execution on Windows
    • Updated documentation and CI tests
  • [New] SNPE EP - details
  • [Preview] XNNPACK EP - initial infrastructure with limited operator support, for use with ORT Mobile and ORT Web
    • Currently supports Conv and MaxPool, with work in progress to add more kernels
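
The OpenVINO EP's new 'enable_dynamic_shapes' option noted above is passed as a provider option. The option name comes from these notes; the value format ("True") and the 'device_type' setting shown are assumptions to verify against the OpenVINO EP documentation.

```python
# Sketch: enabling dynamic shapes per iteration on the OpenVINO EP.
# provider_options entries are matched positionally to the providers list;
# the empty dict applies to the CPU fallback provider.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"device_type": "CPU_FP32", "enable_dynamic_shapes": "True"}, {}],
)
```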

Mobile

  • Binary size reductions in the Android minimal build - 12% reduction in the size of the base build with no operator kernels
  • Added new operator support to the NNAPI and CoreML EPs to improve the ability to run super-resolution and BERT models using the NPU
    • NNAPI: DepthToSpace, PRelu, Gather, Unsqueeze, Pad
    • CoreML: DepthToSpace, PRelu
  • Added Docker file to simplify running a custom minimal build to create an ORT Android package
  • Initial XNNPACK EP compatibility

Web

  • Memory usage optimizations
  • Initial XNNPACK EP compatibility

ORT Training

  • [New] ORT Training acceleration is also natively available through HuggingFace Optimum
  • [New] FusedAdam Optimizer now available through the torch-ort package for easier training integration
  • FP16_Optimizer support for more DeepSpeed versions
  • Bfloat16 support for AtenOp
  • Added gradient ops for ReduceMax and ReduceMin
  • Updates to Min and Max grad ops to use distributed logic
  • Optimizations
    • Optimized perf for Gelu and GeluGrad kernels for mixed precision models
    • Enabled fusions for SimplifiedLayerNorm
    • Added bitmask versions of Dropout, BiasDropout, and DropoutGrad, which bring ~8x space savings for the mask output.
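
The torch-ort integration called out in this section wraps an existing PyTorch module so its forward and backward passes run through ONNX Runtime. A minimal sketch; install steps follow the torch-ort package docs, and the model and data below are placeholders for illustration.

```python
# Sketch: pip install torch-ort && python -m torch_ort.configure, then wrap the model.
import torch
from torch_ort import ORTModule

model = torch.nn.Sequential(torch.nn.Linear(784, 128),
                            torch.nn.ReLU(),
                            torch.nn.Linear(128, 10))
model = ORTModule(model)                      # training now runs through ONNX Runtime

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

loss = loss_fn(model(x), y)                   # forward via ORT
loss.backward()                               # backward via ORT
optimizer.step()
```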

Known issues


Contributions

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
snnn, edgchen1, fdwr, skottmckay, iK1D, fs-eire, mszhanyi, WilBrady, justinchuby, tianleiwu, PeixuanZuo, garymm, yufenglee, adrianlizarraga, yuslepukhin, dependabot[bot], chilo-ms, vvchernov, oliviajain, ytaous, hariharans29, sumitsays, wangyems, pengwa, baijumeswani, smk2007, RandySheriffH, gramalingam, xadupre, yihonglyu, zhangyaobit, YUNQIUGUO, jcwchen, chenfucn, souptc, chandru-r, jstoecker, hanbitmyths, RyanUnderhill, georgen117, jywu-msft, mindest, sfatimar, HectorSVC, Craigacp, jeffdaily, zhijxu-MS, natke, stevenlix, jeffbloo, guoyu-wang, daquexian, faxu, jingyanwangms, adtsai, wschin, weixingzhang, wenbingl, MaajidKhan, ashbhandare, ajindal1, zhanghuanrong, tiagoshibata, askhade, liqunfu