Release Notes
Patch release 0.5.2 adds:
- Fixes for bugs #1074, #1092 via PRs #1065 and #1070
- Fix to the NuGet sample in the package README to show correct disposal of objects
- Extra validation via PRs #1050 and #1066
Features in 0.5.0:
- Support for MultiLoRA
- Multi-frame support for Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for the NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extended quantization support for embedding and LM head layers
- Mac support in published packages
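For readers unfamiliar with soft capping, the sketch below illustrates the idea in plain Python. It assumes the commonly used definition, in which each attention score is squashed into the open interval (-cap, cap) by cap * tanh(score / cap). These notes do not spell out the formula used by the library, and the names `soft_cap` and `cap` are hypothetical, chosen only for illustration.

```python
import math


def soft_cap(scores, cap):
    """Squash each score into (-cap, cap) using cap * tanh(score / cap).

    Assumption: this is the commonly used soft-capping formula, not
    necessarily the exact computation performed by the library.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(s / cap) for s in scores]


# Small scores pass through almost unchanged; large ones saturate near +/- cap.
capped = soft_cap([-200.0, -1.0, 0.0, 1.0, 200.0], cap=50.0)
print(capped)
```

Unlike hard clipping, the tanh form stays smooth and strictly increasing, so the relative order of the scores is preserved.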
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release