
[Question]: How does VLLM use MInference through OpenAI Compatible Server? #40

Open · jueming0312 opened this issue on Jul 15, 2024 · 2 comments
Labels: question (Further information is requested)

@jueming0312

Describe the issue

Can I run `python -m vllm.entrypoints.openai.api_server` to load MInference capabilities in vLLM?

jueming0312 added the `question` label on Jul 15, 2024
iofu728 self-assigned this on Jul 16, 2024
@iofu728 (Contributor) commented on Jul 16, 2024

Hi @jueming0312, thanks for your interest in MInference.

MInference is a method to accelerate self-deployed LLM inference in long-context scenarios. It does not support acceleration for API-based LLMs.
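
For reference, the supported path is to patch a self-deployed vLLM engine directly in offline mode, not the OpenAI-compatible API server. A rough sketch of that offline path follows; the `MInference("vllm", ...)` constructor arguments and the model name are taken loosely from MInference's published examples and may differ between releases, so treat them as illustrative:

```python
from vllm import LLM, SamplingParams
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"  # example long-context model

# Build a plain offline vLLM engine (not the OpenAI API server).
llm = LLM(model_name, enforce_eager=True, max_model_len=128_000, max_num_seqs=1)

# Patch the engine's attention with MInference's sparse attention path.
# NOTE: the "vllm" attention-type string follows MInference's examples and
# may not match every release exactly.
minference_patch = MInference("vllm", model_name)
llm = minference_patch(llm)

prompt = "..."  # a long-context prompt
outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=256))
print(outputs[0].outputs[0].text)
```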

@kalocide commented:
I'm seconding this: vLLM is a self-deployed LLM inference engine, but it does support model serving over an OpenAI-compatible API, which is what @jueming0312 is asking about. If this is a goal of the project, I would suggest publishing a package that bundles the vLLM server code with the MInference patch.
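
To be concrete about the serving mode in question: once something like `python -m vllm.entrypoints.openai.api_server --model <model>` is running, clients talk to it through the standard OpenAI SDK, and that is where the MInference patch would need to be applied. A minimal client-side sketch (host, port, and model name are placeholders for whatever the server was started with):

```python
from openai import OpenAI

# Point the OpenAI client at the locally running vLLM server
# (vLLM's default host/port shown; adjust to your deployment).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gradientai/Llama-3-8B-Instruct-262k",  # whichever model the server loaded
    messages=[{"role": "user", "content": "Summarize this long report: ..."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```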
