ONNX Runtime v1.16.2
This patch release includes:
- Performance optimizations for Llama2 on CUDA EP and DirectML EP
- Performance optimizations for the Stable Diffusion XL model on CUDA EP
- Demos for text-to-image generation
- Mobile bug fixes for a crash on some older 64-bit ARM devices and an AOT inlining issue on iOS with the C# bindings
- TensorRT EP bug fixes for user-provided compute streams and stream synchronization
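
Several of the items above are specific to one execution provider (EP), so it may help to see how an EP is selected from the Python API. The following is a minimal sketch and not part of the release itself: the helper names `pick_providers` and `create_session` and the preference order are assumptions made for illustration, while `get_available_providers` and the `providers` argument of `InferenceSession` are existing ONNX Runtime calls.

```python
# Preference order is an assumption for illustration; adjust it to your deployment.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    # Fall back to the CPU provider if none of the preferred ones are present.
    return chosen or ["CPUExecutionProvider"]


def create_session(model_path):
    """Hypothetical helper: open a model with the best available provider."""
    # Imported lazily so pick_providers can be used without ONNX Runtime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Passing the list in preference order lets ONNX Runtime try the GPU providers first and use the CPU provider for anything they cannot handle.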