Repositories

    • vllm (Public)
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python · Apache License 2.0 · 4.1k stars · Updated Aug 12, 2024
    • Cost-efficient and fast multi-LLM serving.
      Python · Apache License 2.0 · 20 stars · Updated Jul 31, 2024
    • core (Public)
      Core components for REServe
      Python · 0 stars · Updated Jul 29, 2024
    • Initializer for KServe Cluster
      Shell · Apache License 2.0 · 0 stars · 1 fork · Updated Jul 29, 2024
    • The Triton TensorRT-LLM Backend
      Python · Apache License 2.0 · 97 stars · Updated Jul 29, 2024
    • TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorRT-LLM also includes components to create Python and C++ runtimes that execute those TensorRT engines. (A usage sketch follows this list.)
      C++ · Apache License 2.0 · 940 stars · Updated Jul 9, 2024
    • kserve (Public)
      Standardized Serverless ML Inference Platform on Kubernetes
      Python · Apache License 2.0 · 1k stars · Updated Jul 4, 2024
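The TensorRT-LLM entry above describes a Python API for defining LLMs and building TensorRT engines. As a rough illustration only, not taken from this repository, a minimal sketch using the high-level LLM API shipped in recent upstream TensorRT-LLM releases could look like the following; the model name is a placeholder and the exact class and argument names may differ between versions.

    # Assumption: tensorrt_llm with its high-level LLM API is installed; the model path is illustrative.
    from tensorrt_llm import LLM, SamplingParams

    # Defining the model builds a TensorRT engine (or reuses a cached one) for the target GPU.
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

    # Run inference with basic sampling settings and print the generated text.
    params = SamplingParams(max_tokens=64, temperature=0.8)
    for output in llm.generate(["What does TensorRT-LLM do?"], params):
        print(output.outputs[0].text)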