14 Feb 18:07

97d44f6

v0.6.0 (Latest)

Release Notes

We are excited to announce the release of onnxruntime-genai version 0.6.0. Below are the key updates included in this release:

Support for contextual, or continuous, decoding, which lets users carry out multi-turn, conversation-style generation.
Support for new models such as DeepSeek R1, AMD OLMo, IBM Granite, and others.
Python 3.13 wheels.
Support for generation with models sourced from Qualcomm's AI Hub. This work also includes publishing a NuGet package, Microsoft.ML.OnnxRuntimeGenAI.QNN, for the QNN EP.
Support for the WebGPU EP.
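The benefit of continuous decoding can be illustrated with a toy model of the work involved. The sketch below assumes that keeping generator state between turns (standing in for the KV cache) means each new turn only needs to process its own tokens, whereas a stateless approach re-processes the entire conversation history every turn. `ToyGenerator` and `stateless_cost` are hypothetical names for illustration only; they are not the onnxruntime-genai API.

```python
# Toy illustration of continuous (contextual) decoding.
# ToyGenerator and stateless_cost are hypothetical, not the onnxruntime-genai API.
# "Processing" a token stands in for running it through the model once.


class ToyGenerator:
    def __init__(self):
        self.context = []          # tokens whose state is kept (stands in for the KV cache)
        self.tokens_processed = 0  # total tokens run through the "model"

    def append_tokens(self, tokens):
        # Continuous decoding: only the new turn's tokens are processed;
        # the state for earlier turns is reused as-is.
        self.context.extend(tokens)
        self.tokens_processed += len(tokens)

    def generate(self, n):
        # Fake decoding loop: each new token is derived from the previous one.
        out = []
        for _ in range(n):
            nxt = (self.context[-1] + 1) % 100 if self.context else 0
            self.context.append(nxt)
            self.tokens_processed += 1
            out.append(nxt)
        return out


def stateless_cost(turns, reply_len):
    # Cost without state reuse: every turn re-processes the whole history
    # (all earlier prompts and replies) before decoding the new reply.
    history = 0
    total = 0
    for prompt in turns:
        history += len(prompt)
        total += history + reply_len
        history += reply_len
    return total


if __name__ == "__main__":
    turns = [[1, 2, 3], [10, 11], [20, 21, 22, 23]]
    gen = ToyGenerator()
    for prompt in turns:
        gen.append_tokens(prompt)
        print("reply:", gen.generate(4))
    print("continuous:", gen.tokens_processed)
    print("stateless:", stateless_cost(turns, 4))
```

In this toy setting the saving grows with the number of turns, since the stateless cost re-pays for every earlier turn each time.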

This release also includes performance improvements to memory usage and speed, as well as several fixes for bugs reported by users.

Assets 13

onnxruntime-genai-0.6.0-linux-x64-cuda.tar.gz: 14.5 MB, 2025-02-14T18:07:14Z
onnxruntime-genai-0.6.0-linux-x64.tar.gz: 1.41 MB, 2025-02-14T18:07:15Z
onnxruntime-genai-0.6.0-osx-arm64.tar.gz: 781 KB, 2025-02-14T18:07:14Z
onnxruntime-genai-0.6.0-osx-x64.tar.gz: 881 KB, 2025-02-14T18:07:13Z
onnxruntime-genai-0.6.0-win-arm64-dml.zip: 733 KB, 2025-02-14T18:11:31Z
onnxruntime-genai-0.6.0-win-arm64.zip: 690 KB, 2025-02-14T18:07:13Z
onnxruntime-genai-0.6.0-win-x64-cuda.zip: 13.7 MB, 2025-02-14T18:07:11Z
onnxruntime-genai-0.6.0-win-x64-dml.zip: 755 KB, 2025-02-14T18:07:17Z
onnxruntime-genai-0.6.0-win-x64.zip: 712 KB, 2025-02-14T18:07:13Z
onnxruntime-genai-android-0.6.0.aar: 2.86 MB, 2025-02-14T18:07:17Z
Source code (zip): 2025-02-13T22:58:21Z
Source code (tar.gz): 2025-02-13T22:58:21Z


26 Nov 18:05

ajindal1

v0.5.2

27bcf6c

v0.5.2

Release Notes

Patch release 0.5.2 adds:

Fixes for bugs #1074 and #1092 via PRs #1065 and #1070
Fixed the NuGet sample in the package README to show correct disposal of objects
Added extra validation via PRs #1050 and #1066

Features in 0.5.0:

Support for MultiLoRA
Support for multi-frame input in the Phi-3 Vision and Phi-3.5 Vision models
Support for the Phi-3 MoE model
Support for the NVIDIA Nemotron model
Support for the Qwen model
Addition of the Set Terminate feature, which allows users to cancel generation midway
Soft capping support for Group Query Attention
Extended quantization support to the embedding and LM head layers
Mac support in published packages
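Soft capping bounds attention logits with a scaled tanh, cap · tanh(score / cap), a technique used, for example, by Gemma 2. The plain-Python sketch below shows only the formula and its effect on a softmax; it assumes the cap is applied to raw attention scores before the softmax, and the cap value 50.0 is illustrative. It is not the Group Query Attention kernel shipped in onnxruntime-genai.

```python
import math


def soft_cap(score, cap):
    # Soft capping: squash a logit smoothly into (-cap, cap) with tanh.
    # Small scores pass through almost unchanged; large ones saturate near +/-cap.
    return cap * math.tanh(score / cap)


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_capped_attention_weights(scores, cap):
    # Assumption: capping is applied to raw attention scores before the softmax.
    return softmax([soft_cap(s, cap) for s in scores])


if __name__ == "__main__":
    scores = [2.0, 80.0, 500.0]
    print([round(soft_cap(s, 50.0), 3) for s in scores])
    print(soft_capped_attention_weights(scores, 50.0))
```

Because tanh is smooth and monotonic, the ordering of scores is preserved while extreme values can no longer dominate the softmax without bound.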

Known issues

Models running with DirectML do not support batching
Python 3.13 is not supported in this release

Assets 11

13 Nov 21:26

RyanUnderhill

v0.5.1

e8cd6bc

v0.5.1

Release Notes

In addition to the features in the 0.5.0 release, this release adds:

Added the ability to choose the execution provider and modify its options at runtime
Fixed a data leakage bug in the KV caches

Features in 0.5.0:

Support for MultiLoRA
Support for multi-frame input in the Phi-3 Vision and Phi-3.5 Vision models
Support for the Phi-3 MoE model
Support for the NVIDIA Nemotron model
Support for the Qwen model
Addition of the Set Terminate feature, which allows users to cancel generation midway
Soft capping support for Group Query Attention
Extended quantization support to the embedding and LM head layers
Mac support in published packages

Known issues

Models running with DirectML do not support batching
Python 3.13 is not supported in this release

Assets 11

08 Nov 19:43

aciddelgado

v0.5.0

826f6aa

v0.5.0

Release Notes

Support for MultiLoRA
Support for multi-frame input in the Phi-3 Vision and Phi-3.5 Vision models
Support for the Phi-3 MoE model
Support for the NVIDIA Nemotron model
Support for the Qwen model
Addition of the Set Terminate feature, which allows users to cancel generation midway
Soft capping support for Group Query Attention
Extended quantization support to the embedding and LM head layers
Mac support in published packages

Known issues

Models running with DirectML do not support batching
Python 3.13 is not supported in this release

Assets 11

22 Aug 20:26

ajindal1

v0.4.0

b77e768

v0.4.0

Release Notes

Support for new models such as Qwen 2, LLaMA 3.1, Gemma 2, and Phi-3 Small on CPU
Support for building models that were already quantized with AWQ or GPTQ
Performance improvements for Intel and Arm CPUs
Packaging and language bindings
- Added Java bindings (build from source)
- Separated OnnxRuntime.dll and directml.dll out of the GenAI package to improve usability
- Published packages for Windows Arm
- Support for Android (build from source)

Assets 9

21 Jun 21:23

baijumeswani

v0.3.0

964eb65

v0.3.0

Release Notes

Phi-3 Vision model support for the DML EP.
Addressed a DML memory leak and crashes on long prompts.
Addressed crashes and slowness in GQA on the CPU EP with long prompts, caused by integer overflow.
Added the import library for the Windows C API package.
Addressed a bug with get_output('logits') so that it returns the logits for the entire prompt and not only for the last generated token.
Fixed a crash when querying the device type of the model.
Added .NET Standard 2.0 compatibility.

Assets 9

30 May 17:24

baijumeswani

v0.3.0-rc2

d536387

ONNX Runtime GenAI v0.3.0-rc2 (Pre-release)

Release Notes

Added support for the Phi-3-Vision model.
Added support for the Phi-3-Small model.
Removed usage of std::filesystem to avoid runtime issues when loading incompatible symbols from stdc++ and stdc++fs.

Assets 7


Releases: microsoft/onnxruntime-genai
