Microsoft Research
Beijing
https://baotonglu.github.io/
Stars
FlashInfer: Kernel Library for LLM Serving
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt? (SIGMOD 2025)
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, which reduces pre-filling inference latency by up to 10x on an A100 whil…
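As a toy illustration of the top-k "dynamic sparse attention" idea in the entry above (only a sketch of the general technique; the function and parameter names are hypothetical, and this is not MInference's actual kernels or sparsity patterns):

```python
# Toy top-k sparse attention for a single query vector.
# NOTE: for clarity this scores every key and then discards most of them;
# real dynamic-sparse kernels avoid computing the dropped scores in the
# first place, which is where the pre-filling speedup comes from.
import numpy as np

def topk_sparse_attention(q, k, v, keep=64):
    """q: (d,), k/v: (n, d); attend only over the `keep` highest-scoring keys."""
    scores = k @ q / np.sqrt(q.shape[-1])          # (n,) dot-product scores
    top = np.argpartition(scores, -keep)[-keep:]   # indices of the top-`keep` keys
    w = np.exp(scores[top] - scores[top].max())    # softmax over the kept scores only
    w /= w.sum()
    return w @ v[top]                              # (d,) attention output

q = np.random.randn(128)
k = np.random.randn(4096, 128)
v = np.random.randn(4096, 128)
out = topk_sparse_attention(q, k, v, keep=64)
```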
[EMNLP'23, ACL'24] To speed up LLM inference and sharpen the model's perception of key information, the prompt and KV cache are compressed, achieving up to 20x compression with minimal performance loss.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Header-only C++/python library for fast approximate nearest neighbors
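A minimal usage sketch for the hnswlib entry above, using its standard Python bindings; the dimensions, index parameters, and random data are made up for illustration.

```python
import hnswlib
import numpy as np

dim, num = 64, 10_000
data = np.random.rand(num, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)          # HNSW graph index with L2 distance
index.init_index(max_elements=num, ef_construction=200, M=16)
index.add_items(data, np.arange(num))
index.set_ef(50)                                    # query-time accuracy/speed trade-off
labels, distances = index.knn_query(data[:5], k=3)  # approximate 3-NN for 5 queries
```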
VLDB 2024 paper repo. RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search
A high-throughput and memory-efficient inference and serving engine for LLMs
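A minimal offline-inference sketch for the vLLM entry above; the model name and sampling parameters are placeholders chosen for illustration, not a recommendation.

```python
from vllm import LLM, SamplingParams

# Any HuggingFace-compatible model id works here; "facebook/opt-125m" is just a small example.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Explain paged attention in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```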
A library for efficient similarity search and clustering of dense vectors.
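And a corresponding sketch for the FAISS entry above, using the exact (flat) index as the simplest case; sizes and data are again made up.

```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(10_000, d).astype(np.float32)  # database vectors
xq = np.random.rand(5, d).astype(np.float32)       # query vectors

index = faiss.IndexFlatL2(d)                       # exact L2 search, no compression
index.add(xb)
D, I = index.search(xq, 4)                         # distances and ids of the 4 nearest neighbors
```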
《Machine Learning Systems: Design and Implementation》- Chinese Version
Learning material for CMU 10-714: Deep Learning Systems
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (PVLDB 2022, VLDB 2023)
The Art of Latency Hiding in Modern Database Engines (VLDB 2024)
CLHT is a very fast and scalable (lock-based and lock-free) concurrent hash table with cache-line sized buckets.
MICA: A Fast In-memory Key-Value Store (see isca2015 branch for the ISCA2015 version)
Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3
Source code for the book Exploring BeagleBone, by Derek Molloy (see www.exploringbeaglebone.com)
Cost/performance analysis of index structures on SSD and persistent memory (CIDR 2022)
Wormhole: A concurrent ordered in-memory key-value index with O(log L) search cost (L is search key's length)