Microsoft Research
Beijing
https://baotonglu.github.io/
Stars
FlashInfer: Kernel Library for LLM Serving
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt? (SIGMOD 2025)
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, which reduces pre-filling inference latency by up to 10x on an A100 whil…
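As a toy illustration of the top-k "dynamic sparse attention" idea in the entry above (only a sketch of the general technique; the function and parameter names are hypothetical, and this is not MInference's actual kernels or sparsity patterns):

```python
# Toy top-k sparse attention for a single query vector.
# NOTE: for clarity this scores every key and then discards most of them;
# real dynamic-sparse kernels avoid computing the dropped scores in the
# first place, which is where the pre-filling speedup comes from.
import numpy as np

def topk_sparse_attention(q, k, v, keep=64):
    """q: (d,), k/v: (n, d); attend only over the `keep` highest-scoring keys."""
    scores = k @ q / np.sqrt(q.shape[-1])          # (n,) dot-product scores
    top = np.argpartition(scores, -keep)[-keep:]   # indices of the top-`keep` keys
    w = np.exp(scores[top] - scores[top].max())    # softmax over the kept scores only
    w /= w.sum()
    return w @ v[top]                              # (d,) attention output

q = np.random.randn(128)
k = np.random.randn(4096, 128)
v = np.random.randn(4096, 128)
out = topk_sparse_attention(q, k, v, keep=64)
```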
[EMNLP'23, ACL'24] To speed up LLM inference and sharpen the model's perception of key information, the prompt and KV cache are compressed, achieving up to 20x compression with minimal performance loss.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Header-only C++/python library for fast approximate nearest neighbors
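A minimal usage sketch for the hnswlib entry above, using its standard Python bindings; the dimensions, index parameters, and random data are made up for illustration.

```python
import hnswlib
import numpy as np

dim, num = 64, 10_000
data = np.random.rand(num, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)          # HNSW graph index with L2 distance
index.init_index(max_elements=num, ef_construction=200, M=16)
index.add_items(data, np.arange(num))
index.set_ef(50)                                    # query-time accuracy/speed trade-off
labels, distances = index.knn_query(data[:5], k=3)  # approximate 3-NN for 5 queries
```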
VLDB 2024 paper repo. RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search
A high-throughput and memory-efficient inference and serving engine for LLMs
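A minimal offline-inference sketch for the vLLM entry above; the model name and sampling parameters are placeholders chosen for illustration, not a recommendation.

```python
from vllm import LLM, SamplingParams

# Any HuggingFace-compatible model id works here; "facebook/opt-125m" is just a small example.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Explain paged attention in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```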
A library for efficient similarity search and clustering of dense vectors.
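And a corresponding sketch for the FAISS entry above, using the exact (flat) index as the simplest case; sizes and data are again made up.

```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(10_000, d).astype(np.float32)  # database vectors
xq = np.random.rand(5, d).astype(np.float32)       # query vectors

index = faiss.IndexFlatL2(d)                       # exact L2 search, no compression
index.add(xb)
D, I = index.search(xq, 4)                         # distances and ids of the 4 nearest neighbors
```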
《Machine Learning Systems: Design and Implementation》- Chinese Version
Learning material for CMU 10-714: Deep Learning Systems
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (PVLDB 2022, VLDB 2023)
The Art of Latency Hiding in Modern Database Engines (VLDB 2024)
CLHT is a very fast and scalable (lock-based and lock-free) concurrent hash table with cache-line sized buckets.
MICA: A Fast In-memory Key-Value Store (see isca2015 branch for the ISCA2015 version)
Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3
Source code for the book Exploring BeagleBone, by Derek Molloy (see www.exploringbeaglebone.com)
Cost/performance analysis of index structures on SSD and persistent memory (CIDR 2022)
Wormhole: A concurrent ordered in-memory key-value index with O(log L) search cost (L is search key's length)