[Bug] difference of kv-cache-prefixing between vLLM and sglang #1669
Closed
chenchunhui97 started this conversation in General
Replies: 1 comment
Checklist
Describe the bug
No bug. I am just wondering about the difference in KV-cache prefix caching between the vLLM implementation and the SGLang implementation.
vLLM uses hashes to store and look up cached token blocks:
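Roughly, as I understand it, something like the minimal sketch below (not vLLM's real code; the names `HashPrefixCache`, `block_hash`, and `BLOCK_SIZE` are made up for illustration): each fixed-size block of tokens is keyed by a hash of its tokens chained with the previous block's hash, so a cache hit on a block implies the whole prefix up to that block is already cached.

```python
# Minimal sketch of hash-based, block-level prefix caching (illustrative only).
import hashlib
from typing import Dict, Optional

BLOCK_SIZE = 16  # hypothetical block size

def block_hash(prev_hash: Optional[str], tokens) -> str:
    # Chain the previous block's hash so a hit implies the full prefix matches.
    data = (prev_hash or "") + "|" + ",".join(map(str, tokens))
    return hashlib.sha256(data.encode()).hexdigest()

class HashPrefixCache:
    def __init__(self) -> None:
        # Maps block hash -> KV-cache block id (a counter stands in for real blocks).
        self.blocks: Dict[str, int] = {}
        self.next_id = 0

    def match_prefix(self, tokens) -> int:
        """Return how many leading tokens already have cached KV blocks."""
        matched, prev = 0, None
        for i in range(0, len(tokens) - len(tokens) % BLOCK_SIZE, BLOCK_SIZE):
            h = block_hash(prev, tokens[i:i + BLOCK_SIZE])
            if h not in self.blocks:
                break
            matched += BLOCK_SIZE
            prev = h
        return matched

    def insert(self, tokens) -> None:
        """Register KV blocks for every full block of this sequence."""
        prev = None
        for i in range(0, len(tokens) - len(tokens) % BLOCK_SIZE, BLOCK_SIZE):
            h = block_hash(prev, tokens[i:i + BLOCK_SIZE])
            if h not in self.blocks:
                self.blocks[h] = self.next_id
                self.next_id += 1
            prev = h
```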
SGLang uses RadixAttention instead, so what is the difference? I have found SGLang to be faster than vLLM; why is SGLang's RadixAttention faster than vLLM's KV-cache prefix caching?
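My rough mental model of RadixAttention's prefix matching is the sketch below (not SGLang's real code; it uses a plain uncompressed trie and ignores path compression and LRU eviction, and `RadixPrefixCache` / `kv_slot` are made-up names): shared prefixes from earlier requests are matched token by token against a tree, so reuse is not limited to block boundaries.

```python
# Minimal sketch of radix-tree-style, token-level prefix matching (illustrative only).
from typing import Dict, Optional

class RadixNode:
    def __init__(self) -> None:
        self.children: Dict[int, "RadixNode"] = {}
        self.kv_slot: Optional[int] = None  # stand-in for a real KV-cache slot

class RadixPrefixCache:
    def __init__(self) -> None:
        self.root = RadixNode()
        self.next_slot = 0

    def match_prefix(self, tokens) -> int:
        """Return the number of leading tokens whose KV is already cached."""
        node, matched = self.root, 0
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens) -> None:
        """Add this sequence to the tree, allocating slots for new tokens."""
        node = self.root
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                child = RadixNode()
                child.kv_slot = self.next_slot
                self.next_slot += 1
                node.children[t] = child
            node = child
```

Please correct me if either sketch misrepresents the actual implementations.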
Reproduction
not available
Environment
//