Upcoming Release Roadmap

Faith Xu edited this page Nov 21, 2022 · 8 revisions

ORT 1.14

Target Release: Early February 2023

Core Runtime

ONNX 1.13 / Opset 18 support
Performance improvements to the GRU and Slice operators for real-time audio scenarios
Make threadpool NUMA aware
Multi-stream EP refactoring

Builds & Packages

Convert git submodules to CMake FetchContent
Building from source will require CMake version >=3.24 instead of >=3.18
Improved source code release on the GitHub release page, including git submodules
XNNPACK in Android/iOS mobile packages
Onnxruntime-extensions packages for mobile and web
ORT Training NuGet packages: CPU & GPU

Performance

Add support for quantization on machines with AMX (i.e., Sapphire Rapids)
Enhanced performance of transformer models (including Stable Diffusion) on GPU, including improvements to the encoder and decoder and to generation methods: beam search, greedy search, and top-p sampling

Execution Providers

[new] Azure EP (Preview) - supports AzureML hosted models using Triton
[new] Proxy EP (Preview) - enables custom code to be used as an EP using ORT custom op APIs
TensorRT 8.5
FasterTransformer library integration
ROCm EP GA (5.4)

Mobile

Pre/Post processing
- Support for updating MobileNet and super-resolution models, including use of custom ops for conversion to/from JPG/PNG
- onnxruntime-extensions packages for Android and iOS with required custom ops
- Updated samples repo to demonstrate end-to-end usage with onnxruntime-extensions package
XNNPACK
- More common operators
- iOS build support
- Support for using ORT allocator in XNNPACK kernels
- Support for usage in minimal build

Web

onnxruntime-extensions included in default build with NLP ops
XNNPACK GEMM
Improved exception handling
Utility functions (e.g., image2tensor, tensor2image)

Training

On-Device Training support for CustomNLP, available in a new NuGet package
Optimizations and bug fixes for Hugging Face models
Stable Diffusion optimizations
Expose FP16 optimizer in torch-ort