Release v0.1.20

@merrymercy released this 14 Jul 00:33
5d264a9

Highlights

  • Enable CUDA graph by default, which brings a 1.5x to 2x speedup for small-batch-size decoding (#612)
  • Model support: Gemma2, MiniCPM, Qwen2 MoE
  • Docker support (#217)
  • Various latency optimizations

Full Changelog: v0.1.18...v0.1.20