- Tsinghua University, NVIDIA
- Beijing, China
Pinned
- thu-ml/SageAttention: Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
- SPH_Project: An SPH (Smoothed Particle Hydrodynamics) fluid simulation, featuring large-scale simulation, rigid-fluid coupling, and high-viscosity fluids.
- mit-han-lab/llm-awq: [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
- thu-nics/MoA: The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
- mit-han-lab/omniserve: [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention.
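As an illustration of how the first pinned project is typically used, below is a minimal sketch assuming the `sageattn` entry point described in the thu-ml/SageAttention README (exact keyword arguments may differ between releases); it acts as a drop-in replacement for PyTorch's scaled dot-product attention.

```python
import torch
from sageattention import sageattn  # assumed entry point per the SageAttention README

# Toy tensors in (batch, heads, seq_len, head_dim) layout, half precision on GPU.
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# Quantized attention, used in place of F.scaled_dot_product_attention.
out = sageattn(q, k, v, is_causal=False)  # is_causal flag assumed from the README
```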