Popular repositories
- vllm (Python; forked from vllm-project/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs
- sglang (Python; forked from sgl-project/sglang)
  SGLang is a fast serving framework for large language models and vision language models.
- flashinfer (CUDA; forked from flashinfer-ai/flashinfer)
  FlashInfer: Kernel Library for LLM Serving
- opencompass (Python; forked from open-compass/opencompass)
  OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- evalscope (Python; forked from modelscope/evalscope)
  A streamlined and customizable framework for efficient large-model evaluation and performance benchmarking
- omniserve (C++; forked from mit-han-lab/omniserve)
  [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention