baotonglu
🐈‍⬛
Focusing

Highlights

  • Pro

Organizations

@sfu-dis


FlashInfer: Kernel Library for LLM Serving

Cuda 1,778 180 Updated Jan 9, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 834 55 Updated Dec 16, 2024

Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt? (SIGMOD 2025)

C++ 28 1 Updated Dec 5, 2024

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

252 5 Updated Jan 8, 2025

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, computes attention approximately using dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 866 40 Updated Dec 28, 2024

[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compresses the prompt and KV cache, achieving up to 20x compression with minimal performance loss.

Python 4,805 267 Updated Dec 16, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,365 134 Updated Jan 14, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 5,997 1,037 Updated Jan 10, 2025

Header-only C++/python library for fast approximate nearest neighbors

C++ 4,487 672 Updated Aug 11, 2024

VLDB 2024 paper repo. RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search

C++ 35 8 Updated Sep 16, 2024

STREAM benchmark

C 359 140 Updated Apr 12, 2024

Transformers for Longer Sequences

Python 585 104 Updated Sep 1, 2022

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,729 5,156 Updated Jan 14, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 32,363 3,700 Updated Jan 14, 2025

ai4db and db4ai work

734 90 Updated Dec 26, 2024

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,179 440 Updated Apr 13, 2024

LLM inference in C/C++

C++ 70,690 10,218 Updated Jan 14, 2025

Learning material for CMU10-714: Deep Learning System

Jupyter Notebook 229 37 Updated May 12, 2024

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

HTML 70,693 14,860 Updated Jan 1, 2025

DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (PVLDB 2022, VLDB 2023)

Python 36 3 Updated Apr 21, 2023

The Art of Latency Hiding in Modern Database Engines (VLDB 2024)

C++ 47 3 Updated Oct 2, 2024

CLHT is a very fast and scalable (lock-based and lock-free) concurrent hash table with cache-line sized buckets.

C 152 23 Updated Oct 4, 2021

MICA: A Fast In-memory Key-Value Store (see isca2015 branch for the ISCA2015 version)

C 206 49 Updated Jan 18, 2016

Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3

Rust 4,908 153 Updated Jan 6, 2025

Java 27 7 Updated Jan 17, 2022

C++ 15 7 Updated Jun 11, 2023

Source code for the book Exploring BeagleBone, by Derek Molloy (see www.exploringbeaglebone.com)

C++ 469 442 Updated Sep 12, 2020

Cost/performance analysis of index structures on SSD and persistent memory (CIDR 2022)

C++ 36 1 Updated Jun 23, 2022

Wormhole: A concurrent ordered in-memory key-value index with O(log L) search cost (L is search key's length)

C 74 21 Updated Apr 29, 2022

Search Lookaside Buffer

C 4 2 Updated Apr 28, 2022