ONNX Runtime v1.9.0

@wangyems wangyems released this 23 Sep 02:05
4daa14b

Announcements

  • GCC version < 7 is no longer supported
  • CMAKE_SYSTEM_PROCESSOR needs to be set when cross-compiling on Linux, because pytorch cpuinfo was introduced as a dependency for ARM big.LITTLE support. Set it to the output of uname -m on your target device.
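
As a rough illustration of the CMAKE_SYSTEM_PROCESSOR note, the Python sketch below is meant to be run on the target device. It reads platform.machine(), which on Linux reports the same string as uname -m, and formats it as a CMake definition. The helper name cmake_processor_define is hypothetical, and how the definition is passed to your cross-compilation toolchain is left as an assumption.

```python
import platform


def cmake_processor_define(machine: str) -> str:
    """Format a machine string (as printed by `uname -m`) as a CMake -D argument."""
    machine = machine.strip()
    if not machine:
        raise ValueError("machine string is empty")
    return f"-DCMAKE_SYSTEM_PROCESSOR={machine}"


if __name__ == "__main__":
    # Run this on the *target* device, not on the build host.
    print(cmake_processor_define(platform.machine()))
```

The printed argument can then be copied into the CMake invocation on the build host.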

General

  • ONNX 1.10 support
    • opset 15
    • ONNX IR 8 (SparseTensor type, model-local FunctionProtos, Optional type not yet fully supported in this release)
  • Improved documentation of C/C++ APIs
  • IBM Power support
  • WinML - DLL dependency fix supports learning models on Windows 8.1
  • Support for sub-building onnxruntime-extensions and statically linking into onnxruntime binary for custom builds
    • Add --use_extensions option to run models with custom operators implemented in onnxruntime-extensions

APIs

  • Registration of a custom allocator for sharing between multiple sessions. (See RegisterAllocator and UnregisterAllocator APIs in onnxruntime_c_api.h)
  • SessionOptionsAppendExecutionProvider_TensorRT API is deprecated; use SessionOptionsAppendExecutionProvider_TensorRT_V2
  • New APIs: SessionOptionsAppendExecutionProvider_TensorRT_V2, CreateTensorRTProviderOptions, UpdateTensorRTProviderOptions, GetTensorRTProviderOptionsAsString, ReleaseTensorRTProviderOptions, EnableOrtCustomOps, RegisterAllocator, UnregisterAllocator, IsSparseTensor, CreateSparseTensorAsOrtValue, FillSparseTensorCoo, FillSparseTensorCsr, FillSparseTensorBlockSparse, CreateSparseTensorWithValuesAsOrtValue, UseCooIndices, UseCsrIndices, UseBlockSparseIndices, GetSparseTensorFormat, GetSparseTensorValuesTypeAndShape, GetSparseTensorValues, GetSparseTensorIndicesTypeShape, GetSparseTensorIndices
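
Several of the new sparse tensor APIs are named after the COO and CSR storage formats. The Python sketch below shows what those two layouts look like in general for a small dense matrix. It does not call ONNX Runtime, and the helper names dense_to_coo and dense_to_csr are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_to_coo(dense: Matrix) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Return (values, indices) where indices[i] is the (row, col) of values[i]."""
    values, indices = [], []
    for r, row in enumerate(dense):
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                indices.append((r, c))
    return values, indices


def dense_to_csr(dense: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Return (values, col_indices, row_offsets) in compressed sparse row form."""
    values, col_indices, row_offsets = [], [], [0]
    for row in dense:
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(c)
        # Each offset marks where the next row starts in `values`.
        row_offsets.append(len(values))
    return values, col_indices, row_offsets


dense = [
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
    [5.0, 0.0, 7.0],
]
coo_values, coo_indices = dense_to_coo(dense)
csr_values, csr_cols, csr_offsets = dense_to_csr(dense)
```

COO stores one coordinate per non-zero value, while CSR stores a column index per value plus one offset per row, which is more compact when rows hold many values.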

Performance and quantization

  • Performance improvement on ARM
    • Added S8S8 (signed int8, signed int8) matmul kernel. This avoids extending uint8 to int16, for better performance on ARM64 without dot-product instructions
    • Expanded GEMM udot kernel to 8x8 accumulator
    • Added sgemm and qgemm optimized kernels for ARM64EC
  • Operator improvements
    • Improved performance for quantized operators: DynamicQuantizeLSTM, QLinearAvgPool
    • Added new quantized operator QGemm for quantizing Gemm directly
    • Fused HardSigmoid and Conv
  • Quantization tool - subgraph support
  • Transformers tool improvements
    • Fused Attention for BART encoder and Megatron GPT-2
    • Integrated mixed precision ONNX conversion and parity test for GPT-2
    • Updated graph fusion for embed layer normalization for BERT
    • Improved symbolic shape inference for operators: Attention, EmbedLayerNormalization, Einsum and Reciprocal
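
For readers unfamiliar with the S8S8 notation used above, the following Python sketch spells out the reference semantics of a signed-int8 by signed-int8 matrix product with wide accumulation. It is a plain, unoptimized illustration under that assumption, not the ARM64 kernel itself, and the function name matmul_s8s8 is hypothetical.

```python
from typing import List

INT8_MIN, INT8_MAX = -128, 127


def matmul_s8s8(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Multiply two signed-int8 matrices, accumulating in wider integers."""
    for matrix in (a, b):
        for row in matrix:
            for v in row:
                if not INT8_MIN <= v <= INT8_MAX:
                    raise ValueError(f"{v} is outside the signed int8 range")
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0]) if b else 0
    # Python integers do not overflow, which stands in for int32 accumulation.
    return [
        [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for a_row in a
    ]
```

Both operands stay in the signed 8-bit range; only the accumulated sums need a wider type.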

Packages

  • Official ORT GPU packages (except Python) now include both CUDA and TensorRT Execution Providers.
    • Python packages will be updated next release. Please note that EPs should be explicitly registered to ensure the correct provider is used.
  • GPU packages are built with CUDA 11.4 and should be compatible with 11.x on systems with the minimum required driver version. See: CUDA minor version compatibility
  • PyPI
    • ORT + DirectML Python packages now available: onnxruntime-directml
    • GPU package can be used on both CPU-only and GPU machines
  • NuGet
    • C#: Added support for using netstandard2.0 as a target framework
    • Windows symbol (PDB) files are no longer included in the NuGet package, reducing the size of the binary NuGet package by 85%. To download them, see the artifacts below on GitHub.
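
The advice above about registering EPs explicitly can be sketched in Python as follows. The preference order and the helper names pick_providers and create_session are assumptions made for illustration. The sketch relies on onnxruntime.get_available_providers() and the providers argument of InferenceSession, and imports onnxruntime lazily so the selection logic can be read and exercised without it.

```python
from typing import Iterable, List

# Preference order is an assumption for illustration; adjust to your deployment.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(
    available: Iterable[str], preferred: Iterable[str] = PREFERRED
) -> List[str]:
    """Keep the preferred providers that are actually available, in order."""
    available_set = set(available)
    chosen = [p for p in preferred if p in available_set]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers are available")
    return chosen


def create_session(model_path: str):
    """Create a session with explicitly registered providers (needs onnxruntime)."""
    import onnxruntime as ort  # lazy import: only needed when a session is created

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Passing the list explicitly makes the provider choice visible in code rather than depending on which package happens to be installed.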

Execution Providers

  • CUDA EP
    • Framework improvements that boost CUDA performance of subgraph heavy models (#8642, #8702)
    • Support for sequence ops for improved performance for models using sequence type
    • Kernel perf improvements for Pad and Upsample (up to 4.5x faster)
  • TensorRT EP
    • Added support for TensorRT 8.0 (x64 Windows/Linux, ARM Jetson), which includes new TensorRT explicit-quantization features (ONNX Q/DQ support)
    • General fixes and quality improvements
  • OpenVINO EP
    • Added support for OpenVINO 2021.4
  • DirectML EP
    • Bug fix for Identity with non-float inputs affecting DynamicQuantizeLinear ONNX backend test

ORT Web

  • WebAssembly
    • SIMD (Single Instruction, Multiple Data) support
    • Option to load WebAssembly from worker thread to avoid blocking main UI thread
    • wasm file path override
  • WebGL
    • Simpler workflow for WebGL kernel implementation
    • Improved performance with Conv kernel enhancement

ORT Mobile

  • Added more example mobile apps
  • CoreML and NNAPI EP enhancements
  • Reduced peak memory usage when initializing session with ORT format model as bytes
  • Enhanced partitioning to improve performance when using NNAPI and CoreML
    • Reduce number of NNAPI/CoreML partitions required
    • Add ability to force usage of CPU for post-processing in SSD models
      • Improves performance by avoiding expensive device copy to/from NPU for cheap post-processing section of the model
  • Changed to using xcframework in the iOS package
    • Supports usage of arm64 iPhone simulator on Mac with Apple silicon

ORT Training

  • Expanded the supported input formats to include dictionaries and lists
  • Enabled user-defined autograd functions
  • Support for fallback to PyTorch for execution
  • Added support for deterministic compute to enable reproducibility with ORTModule
  • Added DebugOptions and LogLevels to the ORTModule API to improve debuggability
  • Improvements and additions to kernels/gradients: Concat, Split, MatMul, ReluGrad, PadOp, Tile, BatchNormInternal
  • Support for ROCm 4.3.1 on AMD GPU

Contributions

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
edgchen1, gwang-msft, tianleiwu, fs-eire, hariharans29, skottmckay, baijumeswani, RyanUnderhill, iK1D, souptc, nkreeger, liqunfu, pengwa, SherlockNoMad, wangyems, chilo-ms, thiagocrepaldi, KeDengMS, suffiank, oliviajain, chenfucn, satyajandhyala, yuslepukhin, pranavsharma, tracysh, yufenglee, hanbitmyths, ytaous, YUNQIUGUO, zhanghuanrong, stevenlix, jywu-msft, chandru-r, duli2012, smk2007, wschin, MaajidKhan, tiagoshibata, xadupre, RandySheriffH, ashbhandare, georgen117, Tixxx, harshithapv, Craigacp, BowenBao, askhade, zhangxiang1993, gramalingam, weixingzhang, natke, tlh20, codemzs, ryanlai2, raviskolli, pranav-prakash, faxu, adtsai, fdwr, wenbingl, jcwchen, neginraoof, cschreib-ibex