[Bug] difference of kv-cache-prefixing between vLLM and sglang #1669
Closed
chenchunhui97 started this conversation in General
Replies: 1 comment
Checklist
Describe the bug
No bug. I am just wondering about the difference in KV-cache prefix caching between the vLLM implementation and the SGLang implementation.
vLLM uses hashes to store and look up cached token blocks:
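Roughly, as I understand it, something like the minimal sketch below (not vLLM's real code; the names `HashPrefixCache`, `block_hash`, and `BLOCK_SIZE` are made up for illustration): each fixed-size block of tokens is keyed by a hash of its tokens chained with the previous block's hash, so a cache hit on a block implies the whole prefix up to that block is already cached.

```python
# Minimal sketch of hash-based, block-level prefix caching (illustrative only).
import hashlib
from typing import Dict, Optional

BLOCK_SIZE = 16  # hypothetical block size

def block_hash(prev_hash: Optional[str], tokens) -> str:
    # Chain the previous block's hash so a hit implies the full prefix matches.
    data = (prev_hash or "") + "|" + ",".join(map(str, tokens))
    return hashlib.sha256(data.encode()).hexdigest()

class HashPrefixCache:
    def __init__(self) -> None:
        # Maps block hash -> KV-cache block id (a counter stands in for real blocks).
        self.blocks: Dict[str, int] = {}
        self.next_id = 0

    def match_prefix(self, tokens) -> int:
        """Return how many leading tokens already have cached KV blocks."""
        matched, prev = 0, None
        for i in range(0, len(tokens) - len(tokens) % BLOCK_SIZE, BLOCK_SIZE):
            h = block_hash(prev, tokens[i:i + BLOCK_SIZE])
            if h not in self.blocks:
                break
            matched += BLOCK_SIZE
            prev = h
        return matched

    def insert(self, tokens) -> None:
        """Register KV blocks for every full block of this sequence."""
        prev = None
        for i in range(0, len(tokens) - len(tokens) % BLOCK_SIZE, BLOCK_SIZE):
            h = block_hash(prev, tokens[i:i + BLOCK_SIZE])
            if h not in self.blocks:
                self.blocks[h] = self.next_id
                self.next_id += 1
            prev = h
```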
SGLang uses RadixAttention instead, so what is the difference? I have found SGLang to be faster than vLLM; why is SGLang's RadixAttention faster than vLLM's KV-cache prefix caching?
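My rough mental model of RadixAttention's prefix matching is the sketch below (not SGLang's real code; it uses a plain uncompressed trie and ignores path compression and LRU eviction, and `RadixPrefixCache` / `kv_slot` are made-up names): shared prefixes from earlier requests are matched token by token against a tree, so reuse is not limited to block boundaries.

```python
# Minimal sketch of radix-tree-style, token-level prefix matching (illustrative only).
from typing import Dict, Optional

class RadixNode:
    def __init__(self) -> None:
        self.children: Dict[int, "RadixNode"] = {}
        self.kv_slot: Optional[int] = None  # stand-in for a real KV-cache slot

class RadixPrefixCache:
    def __init__(self) -> None:
        self.root = RadixNode()
        self.next_slot = 0

    def match_prefix(self, tokens) -> int:
        """Return the number of leading tokens whose KV is already cached."""
        node, matched = self.root, 0
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens) -> None:
        """Add this sequence to the tree, allocating slots for new tokens."""
        node = self.root
        for t in tokens:
            child = node.children.get(t)
            if child is None:
                child = RadixNode()
                child.kv_slot = self.next_slot
                self.next_slot += 1
                node.children[t] = child
            node = child
```

Please correct me if either sketch misrepresents the actual implementations.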
Reproduction
not available
Environment
//