
[Question]: How does VLLM use MInference through OpenAI Compatible Server? #40

Open · jueming0312 opened this issue on Jul 15, 2024 · 2 comments
Labels: question (Further information is requested)

@jueming0312

Describe the issue

Can I run `python -m vllm.entrypoints.openai.api_server` to load MInference capabilities in vLLM?

jueming0312 added the `question` label on Jul 15, 2024
iofu728 self-assigned this on Jul 16, 2024
@iofu728 (Contributor) commented on Jul 16, 2024

Hi @jueming0312, thanks for your interest in MInference.

MInference is a method to accelerate self-deployed LLM inference in long-context scenarios. It does not support acceleration for API-based LLMs.
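
For reference, the supported path is to patch a self-deployed vLLM engine directly in offline mode, not the OpenAI-compatible API server. A rough sketch of that offline path follows; the `MInference("vllm", ...)` constructor arguments and the model name are taken loosely from MInference's published examples and may differ between releases, so treat them as illustrative:

```python
from vllm import LLM, SamplingParams
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-262k"  # example long-context model

# Build a plain offline vLLM engine (not the OpenAI API server).
llm = LLM(model_name, enforce_eager=True, max_model_len=128_000, max_num_seqs=1)

# Patch the engine's attention with MInference's sparse attention path.
# NOTE: the "vllm" attention-type string follows MInference's examples and
# may not match every release exactly.
minference_patch = MInference("vllm", model_name)
llm = minference_patch(llm)

prompt = "..."  # a long-context prompt
outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=256))
print(outputs[0].outputs[0].text)
```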

@kalocide commented:
I'm seconding this: vLLM is a self-deployed LLM inference engine, but it does support model serving over an OpenAI-compatible API, which is what @jueming0312 is asking about. If this is a goal of the project, I would suggest publishing a package that bundles the vLLM server code with the MInference patch.
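
To be concrete about the serving mode in question: once something like `python -m vllm.entrypoints.openai.api_server --model <model>` is running, clients talk to it through the standard OpenAI SDK, and that is where the MInference patch would need to be applied. A minimal client-side sketch (host, port, and model name are placeholders for whatever the server was started with):

```python
from openai import OpenAI

# Point the OpenAI client at the locally running vLLM server
# (vLLM's default host/port shown; adjust to your deployment).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gradientai/Llama-3-8B-Instruct-262k",  # whichever model the server loaded
    messages=[{"role": "user", "content": "Summarize this long report: ..."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```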
