Release v0.2.0

@Ying1123 released this 25 Jul 15:58 · 735 commits to main since this release · 1a491d0

Highlights

  • We did extensive engineering work to improve baseline performance. Compared to TensorRT-LLM and vLLM, SGLang now consistently delivers superior or competitive performance in both online and offline scenarios, on models from Llama-8B to Llama-405B, on A100 and H100 GPUs, with FP8 and FP16. See the latest blog post for details.
  • New models: Llama3 405B, DeepSeek MoE, InternLM, GPTBigCode, Mistral-Nemo

What's Changed

New Contributors

Full Changelog: v0.1.20...v0.2.0