add VLLM support? #1010
Comments
It looks like the team at the National Research Platform has a nice workaround for this at the moment, using LiteLLM via its OpenAI-compatible API (https://docs.litellm.ai/docs/proxy/user_keys). This works, though it isn't really the same as direct vLLM support, but I thought it was worth mentioning.
Which environment variables do I have to set up in order to use my own LiteLLM instance (e.g. https://litellm.mylaboratory.gov/)?
@flefevre LiteLLM seems to support the following routes: [screenshot of supported routes]

I could use the Language model (completion model) and the Embedding model by specifying the LiteLLM server URL as follows:

Language model: [settings screenshot]

Embedding model: [settings screenshot]

The actual model name can be shown by: [screenshot]
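(As a rough illustration, and not from the original screenshots: the model names a LiteLLM proxy exposes can be listed through its OpenAI-compatible API. The URL below reuses the placeholder address from the question above, and the API key is whatever key the proxy is configured to accept.)

```python
# Illustrative sketch only: list the model IDs exposed by a LiteLLM proxy
# through its OpenAI-compatible API. The base URL and key are placeholders
# for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.mylaboratory.gov/v1",  # placeholder proxy URL
    api_key="sk-placeholder",                        # key accepted by the proxy
)

# These IDs are what you would enter as the model name in the AI Settings.
for model in client.models.list():
    print(model.id)
```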
@cboettig Here is how you can use Jupyter AI to run models from a vLLM server:

1. Install vLLM on the server. (I installed it locally on my Mac, which for Apple silicon requires installation from source in its own Python environment.) For instructions, see: https://docs.vllm.ai/en/latest/getting_started/installation/index.html
2. Serve up the vLLM model you want from that environment. [screenshot] You will see the server running on port 8000. [screenshot]
3. To check which models are running, open the server's models endpoint in your browser and verify the model IDs. [screenshot]
4. Start up Jupyter AI and update the AI Settings as follows (notice that we are using OpenRouter as the provider, which is a unified interface for LLMs based on OpenAI's API interface). [screenshot]
5. You can test that the model is working by running the following in a JupyterLab notebook. [screenshot]
6. Then, try the chat in Jupyter AI to see that it works. [screenshot]

I will update the documentation for the usage of vLLM.
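(Since the notebook screenshot in step 5 did not survive, here is a rough sketch of what such a test could look like, assuming a vLLM server on the default port 8000; the model ID is a placeholder and should match one returned by http://localhost:8000/v1/models.)

```python
# Illustrative sketch only (the original screenshot is not preserved).
# Assumes a vLLM OpenAI-compatible server on localhost:8000; the model ID
# is a placeholder and should match one listed at /v1/models.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder: use your served model's ID
    messages=[{"role": "user", "content": "Say hello from vLLM."}],
)
print(response.choices[0].message.content)
```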
Thanks @srdas, this is extremely useful. Do you have a better workaround for my issue?
@zonca I think this may have been fixed by https://github.com/jupyterlab/jupyter-ai/pull/1219/files, which has been merged and will be available in the next release, coming shortly. I believe it will address your issue.
@zonca Thanks for the quick reply! Really glad to hear it. Our team will close this issue once we add documentation for vLLM. 🎉
@zonca Thank you so much for verifying that the fix works. That was so quick! I appreciate your maintaining a blog to help others who are using Jupyter AI.
@srdas This is great, thanks much. I am not seeing OpenRouter as an option for the embedding model, though?
@cboettig Thanks for the question. I looked into this while working on vLLM. I may be wrong, but it seems that, as of now, OpenRouter does not support embedding models. There is some discussion of this here: https://community.n8n.io/t/how-to-use-embedding-models-with-openrouter/74678. See also: n4ze3m/dialoqbase#217. Happy to look into this if you have a workaround for embeddings, or maybe add it to openrouter.py and open a PR.
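(One possible workaround, sketched here only as an assumption and not as an OpenRouter feature: if the embedding model is served behind any OpenAI-compatible endpoint, e.g. a vLLM server started with an embedding model or a LiteLLM proxy in front of one, it can be queried directly. The URL and model name below are placeholders.)

```python
# Possible workaround sketch (not an OpenRouter capability): call an
# OpenAI-compatible /v1/embeddings endpoint directly. The URL and model
# name are placeholders for an embedding model served by vLLM or LiteLLM.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

result = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",  # placeholder embedding model ID
    input=["Jupyter AI embedding test"],
)
print(len(result.data[0].embedding))
```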
@srdas Is there a reason we can't just use the OpenAI API and allow a custom URL and custom model name? This is how we access all our vLLM-hosted models in all the other interfaces, and it works perfectly. For instance, my students and I use the Jupyter code-server with the continue.dev plugin to easily configure all our vLLM models and toggle between them.
@cboettig That would also work! The issue is that we don't allow custom model IDs for OpenAI, since we only show the model IDs available from OpenAI. Also, ideally, a user would use "OpenRouter" (not "OpenAI") to access embedding models as well as chat models on vLLM; having two different providers for the same scenario would be confusing. We can add an OpenRouter embedding provider that allows for custom fields. I'll open a new issue to track OpenRouter support. 👍
@dlqqq Yup, 💯 that is this issue -- but why? I just don't quite understand why jupyter-ai decides not to allow custom model IDs (and custom URLs) for the OpenAI interface. Every other library or package I work with supports this (LangChain, OpenAI's own packages, continue.dev, LiteLLM, vLLM, ellmer, etc.). Why is it necessary for Jupyter AI to hard-wire the OpenAI interface to use only OpenAI's model IDs? Even the packages built by OpenAI itself let us override the model names and base URL. Everywhere else, "openai" just means the API protocol. Jupyter AI is a wonderful tool, and I would love to use the more open, community-led option here instead of the code-server extensions.
Problem/Solution
It has been great to see Ollama added as a first-class option in #646; this has made it easy to access a huge variety of models and has been working very well for us.
I increasingly see groups and university providers using vLLM for this as well. I'm out of my depth here, but my understanding is that vLLM is considered better suited when a group is serving a local model to multiple users (e.g. from a local GPU cluster, rather than everyone running an independent Ollama instance). It gets passing mention in some threads here as well. I think supporting more providers is all to the good, and I would love to see vLLM supported as a backend similar to the existing Ollama support, though maybe I'm not understanding the details and that is unnecessary? (i.e. it looks like it might be possible to simply use the OpenAI configuration with an alternative endpoint to access a vLLM server, as sketched below?)
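(To make that last point concrete, here is a minimal sketch of the "OpenAI configuration with an alternative endpoint" idea at the LangChain level, which Jupyter AI's providers wrap. The URL and model ID are placeholder values for a vLLM server's OpenAI-compatible endpoint; whether the Jupyter AI settings UI exposes such a base-URL override is exactly what this issue is asking about.)

```python
# Sketch of using the OpenAI client interface against a non-OpenAI endpoint.
# Placeholder URL and model ID for a vLLM server's OpenAI-compatible API.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="Qwen/Qwen2.5-1.5B-Instruct",   # placeholder: a model served by vLLM
    base_url="http://localhost:8000/v1",  # placeholder: the vLLM endpoint
    api_key="EMPTY",                      # vLLM accepts any key unless configured otherwise
)

print(llm.invoke("Say hello from a vLLM-backed ChatOpenAI.").content)
```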