Upcoming Release Roadmap

Faith Xu edited this page Nov 21, 2022 · 8 revisions

ORT 1.14

Target Release: Early February 2023

Core Runtime

  • ONNX 1.13 / Opset 18 support
  • Performance improvements to the GRU and Slice operators for real-time audio scenarios
  • Make threadpool NUMA aware
  • Multi-stream EP refactoring

Builds & Packages

  • Convert git submodules to CMake FetchContent
  • Building from source will require CMake version >=3.24 instead of >=3.18
  • Improved source code release on the GitHub release page, including git submodules
  • XNNPACK in Android/iOS mobile packages
  • onnxruntime-extensions packages for mobile and web
  • ORT Training NuGet packages: CPU & GPU

Performance

  • Add support for quantization on machines with AMX (i.e., Sapphire Rapids)
  • Enhanced performance of transformer models (including Stable Diffusion) on GPU, including improvements to the encoder and decoder and to the generation methods: beam search, greedy search, and top-p sampling
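
For readers unfamiliar with the generation methods named above, the following is a minimal pure-Python sketch of one greedy search step and one top-p (nucleus) sampling step over a list of logits. It is an illustration only and not ORT's implementation; the function names (`softmax`, `greedy_pick`, `top_p_candidates`, `top_p_pick`) are hypothetical and chosen for this example.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy search step: always take the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_p_candidates(logits, top_p):
    """Return the smallest set of token ids whose cumulative
    probability reaches top_p, plus the full probability list."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    return kept, probs


def top_p_pick(logits, top_p, rng=random):
    """Top-p sampling step: sample one token id from the kept set,
    weighted by the original probabilities."""
    kept, probs = top_p_candidates(logits, top_p)
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1, -1.0]  # made-up scores for a 4-token vocabulary
    print("greedy:", greedy_pick(logits))
    print("top-p :", top_p_pick(logits, top_p=0.8, rng=random.Random(0)))
```

Greedy search is deterministic, while top-p sampling trades determinism for variety by sampling only among the most probable tokens. Beam search, the third method listed, keeps several candidate sequences alive at each step and is omitted here for brevity.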

Execution Providers

  • [new] Azure EP (Preview) - supports AzureML hosted models using Triton
  • [new] Proxy EP (Preview) - enables custom code to be used as an EP using ORT custom op APIs
  • TensorRT 8.5
  • FasterTransformer library integration
  • ROCm EP GA (5.4)

Mobile

  • Pre/Post processing
    • Support for updating MobileNet and super-resolution models, including use of custom ops for conversion to/from JPG/PNG
    • onnxruntime-extensions packages for Android and iOS with required custom ops
    • Updated samples repo to demonstrate end-to-end usage with onnxruntime-extensions package
  • XNNPACK
    • More common operators
    • iOS build support
    • Support for using ORT allocator in XNNPACK kernels
    • Support for usage in minimal build

Web

  • onnxruntime-extensions included in default build with NLP ops
  • XNNPACK GEMM
  • Improved exception handling
  • Utility functions (i.e. image2tensor, tensor2image)

Training

  • On-Device Training support for CustomNLP, available in a new NuGet package
  • Optimizations and bug fixes for Hugging Face models
  • Stable Diffusion optimizations
  • Expose FP16 optimizer in torch-ort