Release Notes
We are excited to announce the release of onnxruntime-genai
version 0.6.0. Below are the key updates included in this release:
- Support for contextual or continuous decoding allows users to carry out multi-turn conversation style generation.
- Support for new models such as Deepseek R1, AMD OLMo, IBM Granite and others.
- Python 3.13 wheels have been introduced
- Support for generation for models sourced from Qualcomm's AI Hub. This work also includes publishing a nuget package
Microsoft.ML.OnnxRuntimeGenAI.QNN
for QNN EP - Support for WebGPU EP
This release also includes performance improvements to optimize memory usage and speed. In addition, there are several bug fixes that resolve issues reported by users.