
add VLLM support? #1010

Closed
cboettig opened this issue Sep 22, 2024 · 15 comments · Fixed by #1232
Labels: enhancement (New feature or request)

@cboettig

Problem/Solution

It has been great to see Ollama added as a first-class option in #646; it has made it easy to access a huge variety of models and has been working very well for us.

I increasingly see groups and university providers using vLLM for this as well. I'm out of my depth here, but my understanding is that vLLM is considered better suited when a group serves a local model to multiple users (e.g. from a shared GPU cluster, rather than everyone running an independent Ollama instance). It gets passing mention in a few threads here as well. Supporting more providers seems all to the good, and I would love to see vLLM supported as a backend alongside the existing Ollama support. Then again, maybe I'm missing a detail that makes this unnecessary: it looks like it might be possible to simply use the OpenAI configuration with an alternative endpoint to reach a vLLM server?
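
For what it's worth, vLLM's server speaks the OpenAI protocol, so outside of Jupyter AI it can already be reached with the standard openai client by pointing the base URL at the server. A minimal sketch (the server URL and model name below are placeholders):

```python
# Minimal sketch: talk to a vLLM server through its OpenAI-compatible API.
# The base_url and model name are placeholders for an actual deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://my-vllm-server:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",  # vLLM accepts any key unless the server enforces one
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever model the server is serving
    messages=[{"role": "user", "content": "Hello from JupyterLab!"}],
)
print(response.choices[0].message.content)
```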

cboettig added the enhancement label on Sep 22, 2024
@cboettig
Author

cboettig commented Oct 3, 2024

It looks like the team at the National Research Platform has a nice workaround for this at the moment: using LiteLLM via its OpenAI-compatible API (https://docs.litellm.ai/docs/proxy/user_keys). This works, though it isn't really the same as direct vLLM support; still, it seemed worth mentioning.
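
For anyone trying this route, a rough sketch of the pattern (the proxy URL, virtual key, and model name are all placeholders; see the LiteLLM docs linked above for the real setup):

```python
# Rough sketch: the LiteLLM proxy exposes an OpenAI-compatible API, so the
# standard openai client works against it with a proxy "virtual key".
# URL, key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.example.org",  # the LiteLLM proxy
    api_key="sk-your-litellm-virtual-key",
)

reply = client.chat.completions.create(
    model="my-vllm-model",  # a model name configured on the proxy
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```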

@flefevre

Which environment variables do I have to set up in order to use my own LiteLLM instance, such as https://litellm.mylaboratory.gov/?
Thanks for your help.

@utsumi-fj

@flefevre LiteLLM seems to support the following routes:

  • /openai/deployments/{model}/chat/completions
  • /openai/deployments/{model}/embeddings

I was able to use both the language model (completion model) and the embedding model by specifying the LiteLLM server URL as follows:

Language model

  • Completion model: OpenAI :: gpt-4o (This is a dummy model and different from the actual model.)
  • Base API URL: http://{ LiteLLM server }/openai/deployments/{ actual model name }

Embedding model

  • Embedding model: OpenAI :: text-embedding-3-large (This is a dummy model and different from the actual model.)
  • Base API URL: http://{ LiteLLM server }/openai/deployments/{ actual model name }

The actual model name can be listed with the /models or /v1/models API (a quick sketch follows).
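
For example (the server URL and key here are placeholders):

```python
# Sketch: list the actual model names served by the LiteLLM proxy via its
# OpenAI-style /v1/models route. Server URL and API key are placeholders.
import requests

resp = requests.get(
    "http://litellm-server:4000/v1/models",
    headers={"Authorization": "Bearer sk-your-litellm-key"},
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```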

[screenshot: Jupyter AI settings configured as described above]

@srdas
Collaborator

srdas commented Feb 4, 2025

@cboettig Here is how you can use Jupyter AI to run models from a vLLM server using OpenRouter.

Install vLLM on the server. (I installed it locally on my Mac, which requires building from source for Apple silicon, in its own Python environment.) For instructions, see https://docs.vllm.ai/en/latest/getting_started/installation/index.html

Serve up the vLLM model you want from the environment:

[screenshot: command used to start the vLLM server]
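
For readers without the screenshot, a sketch of this step (the model name and port are placeholders; on the command line this is typically `vllm serve <model>`):

```python
# Sketch: launch vLLM's OpenAI-compatible server from Python.
# Equivalent to running `vllm serve <model>` in a terminal; the model name
# and port are placeholders.
import subprocess

subprocess.run(
    [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "Qwen/Qwen2.5-1.5B-Instruct",
        "--port", "8000",
    ],
    check=True,
)
```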

You will see the server running on port 8000:

[screenshot: vLLM server log showing it listening on port 8000]

To check which models are running, enter the server's models endpoint in your browser and verify the model ids:

[screenshot: browser view of the models endpoint listing the served model ids]
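
Equivalently, from Python (assuming the server's default OpenAI-compatible route on port 8000):

```python
# Sketch: verify the model id(s) served by the local vLLM instance,
# equivalent to opening the /v1/models endpoint in a browser.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
for model in client.models.list():
    print(model.id)
```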

Start up Jupyter AI and update the AI Settings as follows (notice that we are using OpenRouter as the provider, which offers a unified, OpenAI-compatible interface to many LLMs):

[screenshot: AI Settings panel with OpenRouter selected as the language model provider]

You can test that the model is working by running the following in a JupyterLab notebook:

[screenshot: notebook cell querying the model]
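
The exact cell is in the screenshot above, but a rough sketch of such a check (using LangChain, which Jupyter AI builds on; the URL and model id are placeholders) looks like this:

```python
# Sketch of a notebook sanity check against the local vLLM server.
# base_url, api_key, and model are placeholders for the actual deployment.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    model="Qwen/Qwen2.5-1.5B-Instruct",
)
print(llm.invoke("Say hello in one short sentence.").content)
```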

Then, try the chat in Jupyter AI to see that it works:

[screenshot: Jupyter AI chat responding using the vLLM-served model]

I will update the documentation for the usage of vLLM through a separate PR.

@zonca

zonca commented Feb 5, 2025

Thanks @srdas, this is extremely useful!
I got this to work with the Jetstream inference service (https://docs.jetstream-cloud.org/general/inference-service).
However, I had to hack around an "openrouter_api_key keyword not accepted" error; see details here:

https://www.zonca.dev/posts/2025-02-04-jetstream-llm-service-deepseek-jupyterai#patch-for-openrouter_api_key-keyword-not-accepted-error

Do you have a better workaround for my issue?

@srdas
Collaborator

srdas commented Feb 5, 2025

@zonca I think this was fixed by https://github.com/jupyterlab/jupyter-ai/pull/1219/files, which has been merged, so the fix will be available in the next release, which should come out shortly.

@dlqqq
Member

dlqqq commented Feb 6, 2025

@zonca Thanks for documenting that issue with Jupyter AI in a blog post! I'm releasing the v2.29.1 patch release right now, which includes PR #1219 to patch this bug. Can you try out Jupyter AI v2.29.1 in a new Python environment, and see if you still encounter this issue?

@zonca

zonca commented Feb 6, 2025

@srdas @dlqqq Much appreciated, thanks! I confirm this is fixed in 2.29.1; I'll also update my blog post.

@dlqqq
Member

dlqqq commented Feb 6, 2025

@zonca Thanks for the quick reply! Really glad to hear. Our team will close this issue once we add documentation for vLLM. 🎉

@srdas
Collaborator

srdas commented Feb 6, 2025

@zonca Thank you so much for verifying that the fix works. That was so quick! Appreciate your maintaining a blog to help others who are using Jupyter AI.

@cboettig
Author

@srdas This is great, thanks much.

I am not seeing OpenRouter as an option for the embedding model, though?

@srdas
Collaborator

srdas commented Feb 11, 2025

@cboettig Thanks for the question. I looked into this when investigating vLLM. I may be wrong, but as of now OpenRouter does not appear to support embedding models. There is some discussion of it here: https://community.n8n.io/t/how-to-use-embedding-models-with-openrouter/74678.

See also: n4ze3m/dialoqbase#217.


Happy to look into this if you have a workaround for embeddings; alternatively, feel free to add it to openrouter.py and open a PR.

@cboettig
Author

@srdas Is there a reason we can't just use the OpenAI API and allow a custom URL and custom model name? This is how we access all our vLLM-hosted models in all the other interfaces, and it works perfectly.

For instance, I can use the Jupyter code-server with the continue.dev plugin with my students to easily configure all our vLLM models and toggle between the configured models.

@dlqqq
Member

dlqqq commented Feb 11, 2025

@cboettig That would also work! The issue is that we don't allow custom model IDs for OpenAI, since we only show the model IDs available from OpenAI. Also, ideally, a user would use "OpenRouter" (not "OpenAI") to access embedding models as well as chat models on vLLM; having 2 different providers for the same scenario would be confusing.

We can add an OpenRouter embedding provider that allows for custom fields. I'll open a new issue for tracking OpenRouter support. 👍

@cboettig
Author

@dlqqq Yup, 💯 that is this issue -- but why? I just don't quite understand why jupyter-ai decides not to allow custom model ids (and custom URLs) for the OpenAI interface. Every other library or package I work with supports this (langchain, OpenAI's own packages, continue.dev, litellm, vllm, ellmer, etc.). Why is it necessary for Jupyter AI to hardwire the OpenAI interface to use only OpenAI's model IDs? Even the packages built by OpenAI itself let us override the model names and base URL. Everywhere else, 'openai' just means the API protocol.

Jupyter AI is a wonderful tool, and I would love to use the more open, community-led option here instead of the code-server extensions.
