v0.5.0

@aciddelgado released this 08 Nov 19:43
826f6aa

Release Notes

  • Support for MultiLoRA
  • Support for multi-frame input with Phi-3 vision and Phi-3.5 vision models
  • Support for the Phi-3 MoE model
  • Support for the NVIDIA Nemotron model
  • Support for the Qwen model
  • Addition of the Set Terminate feature, which allows users to cancel mid-generation
  • Soft capping support for Group Query Attention
  • Extended quantization support to embedding and LM head layers
  • Mac support in published packages
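
For readers unfamiliar with the soft capping item above, here is a minimal sketch of the general technique, assuming the common formulation cap * tanh(score / cap) applied to attention logits before softmax. It is not this library's implementation; the function names `soft_cap` and `softmax` and the cap value of 50.0 are illustrative assumptions.

```python
import math


def soft_cap(scores, cap):
    """Squash each logit smoothly into (-cap, cap) via cap * tanh(x / cap).

    Hypothetical helper for illustration; not part of any library API.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(s / cap) for s in scores]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Small logits pass through almost unchanged; large ones saturate near the cap.
logits = [0.5, 10.0, 200.0]
capped = soft_cap(logits, cap=50.0)  # cap value chosen only for this example
weights = softmax(capped)
```

Unlike hard clipping, the tanh form stays smooth and preserves the ordering of the scores, so a single very large logit cannot dominate the softmax as strongly.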

Known issues

  • Models running with DirectML do not support batching
  • Python 3.13 is not supported in this release