
Inquiry Regarding Prefix Caching in the vllm Baseline #2

Open
Lin-Qingyang-Alec opened this issue Jul 27, 2024 · 1 comment

Comments

@Lin-Qingyang-Alec

I hope this message finds you well. I recently read your technical report, and I found it very insightful. Thank you for sharing your work!

While reviewing the experimental details, I found some aspects of the baseline not entirely clear. Specifically, you mention using the vllm service as the baseline for Mooncake. Mooncake makes use of prefix caching, and vllm has a startup parameter, --enable-prefix-caching, that appears to serve a similar purpose.
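For context, this is roughly how I would expect that feature to be turned on (a minimal sketch using vLLM's offline Python API; the model name and prompts are placeholders, and the server-side equivalent would be passing --enable-prefix-caching when starting the vllm service):

```python
# Minimal sketch: enabling prefix caching in vLLM's offline Python API.
# The model and prompts below are placeholders, not the paper's setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",      # placeholder model
    enable_prefix_caching=True,     # reuse KV cache across shared prompt prefixes
)

# Two prompts sharing a long prefix: with prefix caching, the prefill of the
# shared part should be computed once and reused for the second request.
shared_prefix = "You are a helpful assistant. Answer concisely.\n\n"
prompts = [
    shared_prefix + "Q: What is prefix caching?",
    shared_prefix + "Q: Why does it reduce prefill cost?",
]

outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```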

Could you kindly clarify whether you enabled this feature in the baseline during your experiments?

Thank you for your time, and I appreciate any insights you can provide.

Best regards.

@james0zan
Member

Prefix caching support in vLLM was added after we ran our experiments, so we did not compare against it.
