
Inquiry Regarding Prefix Caching in the vllm Baseline #2

Open
Lin-Qingyang-Alec opened this issue Jul 27, 2024 · 1 comment

Comments

@Lin-Qingyang-Alec

I hope this message finds you well. I recently read your technical report, and I found it very insightful. Thank you for sharing your work!

While reviewing the experimental details, I found some aspects of the baseline not entirely clear. Specifically, you mention using the vllm service as the baseline for Mooncake. Mooncake makes use of prefix caching, and vllm has a startup parameter, --enable-prefix-caching, that appears to serve a similar purpose.
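For context, this is roughly how I would expect that feature to be turned on (a minimal sketch using vLLM's offline Python API; the model name and prompts are placeholders, and the server-side equivalent would be passing --enable-prefix-caching when starting the vllm service):

```python
# Minimal sketch: enabling prefix caching in vLLM's offline Python API.
# The model and prompts below are placeholders, not the paper's setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",      # placeholder model
    enable_prefix_caching=True,     # reuse KV cache across shared prompt prefixes
)

# Two prompts sharing a long prefix: with prefix caching, the prefill of the
# shared part should be computed once and reused for the second request.
shared_prefix = "You are a helpful assistant. Answer concisely.\n\n"
prompts = [
    shared_prefix + "Q: What is prefix caching?",
    shared_prefix + "Q: Why does it reduce prefill cost?",
]

outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```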

Could you kindly clarify whether you enabled this feature in the baseline during your experiments?

Thank you for your time, and I appreciate any insights you can provide.

Best regards.

@james0zan
Member

Prefix caching support in vLLM was added after we ran our experiments, so we did not compare against it.
