
add VLLM support? #1010

Closed
cboettig opened this issue Sep 22, 2024 · 15 comments · Fixed by #1232
Labels: enhancement (New feature or request)

@cboettig

Problem/Solution

It has been great to see Ollama added as a first-class option in #646; it has made it easy to access a huge variety of models and has been working very well for us.

I increasingly see groups and university providers using vLLM for this as well. I'm out of my depth here, but my understanding is that vLLM is considered better suited when a group serves a local model to multiple users (e.g. from a shared GPU cluster, rather than everyone running an independent Ollama instance). It gets passing mention in a few threads here as well. Supporting more providers seems all to the good, and I would love to see vLLM supported as a backend alongside the existing Ollama support. Then again, maybe I'm missing a detail that makes this unnecessary: it looks like it might be possible to simply use the OpenAI configuration with an alternative endpoint to reach a vLLM server?
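
For what it's worth, vLLM's server speaks the OpenAI protocol, so outside of Jupyter AI it can already be reached with the standard openai client by pointing the base URL at the server. A minimal sketch (the server URL and model name below are placeholders):

```python
# Minimal sketch: talk to a vLLM server through its OpenAI-compatible API.
# The base_url and model name are placeholders for an actual deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://my-vllm-server:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",  # vLLM accepts any key unless the server enforces one
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever model the server is serving
    messages=[{"role": "user", "content": "Hello from JupyterLab!"}],
)
print(response.choices[0].message.content)
```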

cboettig added the enhancement label on Sep 22, 2024
@cboettig
Author

cboettig commented Oct 3, 2024

It looks like the team at the National Research Platform has a nice workaround for this at the moment: using LiteLLM via its OpenAI-compatible API (https://docs.litellm.ai/docs/proxy/user_keys). This works, though it isn't really the same as direct vLLM support; still, it seemed worth mentioning.
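
For anyone trying this route, a rough sketch of the pattern (the proxy URL, virtual key, and model name are all placeholders; see the LiteLLM docs linked above for the real setup):

```python
# Rough sketch: the LiteLLM proxy exposes an OpenAI-compatible API, so the
# standard openai client works against it with a proxy "virtual key".
# URL, key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.example.org",  # the LiteLLM proxy
    api_key="sk-your-litellm-virtual-key",
)

reply = client.chat.completions.create(
    model="my-vllm-model",  # a model name configured on the proxy
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)
```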

@flefevre

Which environment variables do I have to set up in order to use my own LiteLLM instance, such as https://litellm.mylaboratory.gov/?
Thanks for your help.

@utsumi-fj

@flefevre LiteLLM seems to support the following routes:

  • /openai/deployments/{model}/chat/completions
  • /openai/deployments/{model}/embeddings

I was able to use both the language model (completion model) and the embedding model by specifying the LiteLLM server URL as follows:

Language model

  • Completion model: OpenAI :: gpt-4o (This is a dummy model and different from the actual model.)
  • Base API URL: http://{ LiteLLM server }/openai/deployments/{ actual model name }

Embedding model

  • Embedding model: OpenAI :: text-embedding-3-large (This is a dummy model and different from the actual model.)
  • Base API URL: http://{ LiteLLM server }/openai/deployments/{ actual model name }

The actual model name can be listed with the /models or /v1/models API (a quick sketch follows).
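
For example (the server URL and key here are placeholders):

```python
# Sketch: list the actual model names served by the LiteLLM proxy via its
# OpenAI-style /v1/models route. Server URL and API key are placeholders.
import requests

resp = requests.get(
    "http://litellm-server:4000/v1/models",
    headers={"Authorization": "Bearer sk-your-litellm-key"},
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```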

[screenshot: Jupyter AI settings configured as described above]

@srdas
Collaborator

srdas commented Feb 4, 2025

@cboettig Here is how you can use Jupyter AI to run models from a vLLM server using OpenRouter.

Install vLLM on the server. (I installed it locally on my Mac, which requires building from source for Apple silicon, in its own Python environment.) For instructions, see https://docs.vllm.ai/en/latest/getting_started/installation/index.html

Serve up the vLLM model you want from the environment:

[screenshot: command used to start the vLLM server]
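
For readers without the screenshot, a sketch of this step (the model name and port are placeholders; on the command line this is typically `vllm serve <model>`):

```python
# Sketch: launch vLLM's OpenAI-compatible server from Python.
# Equivalent to running `vllm serve <model>` in a terminal; the model name
# and port are placeholders.
import subprocess

subprocess.run(
    [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "Qwen/Qwen2.5-1.5B-Instruct",
        "--port", "8000",
    ],
    check=True,
)
```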

You will see the server running on port 8000:

[screenshot: vLLM server log showing it listening on port 8000]

To check which models are running, enter the server's models endpoint in your browser and verify the model ids:

[screenshot: browser view of the models endpoint listing the served model ids]
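
Equivalently, from Python (assuming the server's default OpenAI-compatible route on port 8000):

```python
# Sketch: verify the model id(s) served by the local vLLM instance,
# equivalent to opening the /v1/models endpoint in a browser.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
for model in client.models.list():
    print(model.id)
```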

Start up Jupyter AI and update the AI Settings as follows (notice that we are using OpenRouter as the provider, which offers a unified, OpenAI-compatible interface to many LLMs):

[screenshot: AI Settings panel with OpenRouter selected as the language model provider]

You can test that the model is working by running the following in a JupyterLab notebook:

[screenshot: notebook cell querying the model]
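
The exact cell is in the screenshot above, but a rough sketch of such a check (using LangChain, which Jupyter AI builds on; the URL and model id are placeholders) looks like this:

```python
# Sketch of a notebook sanity check against the local vLLM server.
# base_url, api_key, and model are placeholders for the actual deployment.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    model="Qwen/Qwen2.5-1.5B-Instruct",
)
print(llm.invoke("Say hello in one short sentence.").content)
```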

Then, try the chat in Jupyter AI to see that it works:

[screenshot: Jupyter AI chat responding using the vLLM-served model]

I will update the documentation for the usage of vLLM through a separate PR.

@zonca

zonca commented Feb 5, 2025

Thanks @srdas, this is extremely useful!
I got this to work with the Jetstream inference service (https://docs.jetstream-cloud.org/general/inference-service).
However, I had to hack around an "openrouter_api_key keyword not accepted" error; see details here:

https://www.zonca.dev/posts/2025-02-04-jetstream-llm-service-deepseek-jupyterai#patch-for-openrouter_api_key-keyword-not-accepted-error

Do you have a better workaround for my issue?

@srdas
Collaborator

srdas commented Feb 5, 2025

@zonca I think this was fixed by https://github.com/jupyterlab/jupyter-ai/pull/1219/files, which has been merged, so the fix will be available in the next release, which should come out shortly.

@dlqqq
Member

dlqqq commented Feb 6, 2025

@zonca Thanks for documenting that issue with Jupyter AI in a blog post! I'm releasing the v2.29.1 patch release right now, which includes PR #1219 to patch this bug. Can you try out Jupyter AI v2.29.1 in a new Python environment, and see if you still encounter this issue?

@zonca

zonca commented Feb 6, 2025

@srdas @dlqqq Much appreciated, thanks! I confirm this is fixed in 2.29.1; I'll also update my blog post.

@dlqqq
Member

dlqqq commented Feb 6, 2025

@zonca Thanks for the quick reply! Really glad to hear. Our team will close this issue once we add documentation for vLLM. 🎉

@srdas
Collaborator

srdas commented Feb 6, 2025

@zonca Thank you so much for verifying that the fix works. That was so quick! Appreciate your maintaining a blog to help others who are using Jupyter AI.

@cboettig
Author

@srdas This is great, thanks much.

I am not seeing OpenRouter as an option for the embedding model, though?

@srdas
Collaborator

srdas commented Feb 11, 2025

@cboettig Thanks for the question. I looked into this when investigating vLLM. I may be wrong, but as of now OpenRouter does not appear to support embedding models. There is some discussion of it here: https://community.n8n.io/t/how-to-use-embedding-models-with-openrouter/74678.

See also: n4ze3m/dialoqbase#217.


Happy to look into this if you have a workaround for embeddings; alternatively, feel free to add it to openrouter.py and open a PR.

@cboettig
Author

@srdas Is there a reason we can't just use the OpenAI API and allow a custom URL and custom model name? This is how we access all our vLLM-hosted models in all the other interfaces, and it works perfectly.

For instance, I can use the Jupyter code-server with the continue.dev plugin with my students to easily configure all our vLLM models and toggle between the configured models.

@dlqqq
Member

dlqqq commented Feb 11, 2025

@cboettig That would also work! The issue is that we don't allow custom model IDs for OpenAI, since we only show the model IDs available from OpenAI. Also, ideally, a user would use "OpenRouter" (not "OpenAI") to access embedding models as well as chat models on vLLM; having 2 different providers for the same scenario would be confusing.

We can add an OpenRouter embedding provider that allows for custom fields. I'll open a new issue for tracking OpenRouter support. 👍

@cboettig
Author

@dlqqq Yup, 💯 that is this issue -- but why? I just don't quite understand why jupyter-ai decides not to allow custom model ids (and custom URLs) for the OpenAI interface. Every other library or package I work with supports this (langchain, OpenAI's own packages, continue.dev, litellm, vllm, ellmer, etc.). Why is it necessary for Jupyter AI to hardwire the OpenAI interface to use only OpenAI's model IDs? Even the packages built by OpenAI itself let us override the model names and base URL. Everywhere else, 'openai' just means the API protocol.

Jupyter AI is a wonderful tool, and I would love to use the more open, community-led option here instead of the code-server extensions.
