Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config #1264

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

srdas
Copy link
Collaborator

@srdas srdas commented Feb 27, 2025

Description

The OpenAI model interface has been widely adopted by many model providers (DeepSeek, vLLM, etc.) and this PR enables accessing these models using the OpenAI provider. Current OpenAI models are also accessible via the same interface.

This PR also updates related documentation on the use of these models that work via the OpenAI provider.

These updates work for selecting chat and embeddings models. Chat models are tested to work with models from OpenAI, DeepSeek, and models hosted by vLLM. Embedding models are tested for OpenAI models. DeepSeek does not have an API for embedding models, and OpenRouter also does not support as yet any embedding models.

Also, this PR corrects the coupled fields problem in the AI Settings page.

Finally, added the embedding fields to the config_scheme.json and made related changes to config_manager.py and test_ config_manager.py

Each of these changes is now described below in some more detail.

Demo of new features

See the new usage of models and the required settings shown below, note the new "OpenAI::general interface":
image

For any OpenAI model:
openai-chat-openai

For DeepSeek models:
openai-chat-deepseek

For models deployed with vLLM:
openai-chat-vllm

Embedding Models

First, tested to make sure that the OpenAI models are working as intended with no changes to the code:
image

Second, modified check that the interface takes any OpenAI embedding model as an input and test that it works with OpenAI models as before:

image

Fixed coupled model field inputs

We can see that the fields are not coupled any more as shown below:
image

Added embeddings_fields to config_schema.json

  • Updated config_manager.py to handle the new fields.
  • Also updated analogous code for continuation models.
  • Updated test_config_manager.py for the additional embedding field in config.

@srdas srdas added the enhancement New feature or request label Feb 27, 2025
@srdas srdas changed the title Create a custom OpenAI provider to use multiple model providers Create a custom OpenAI provider to use multiple models Feb 27, 2025
@srdas srdas marked this pull request as ready for review February 27, 2025 19:57
@dlqqq dlqqq linked an issue Feb 27, 2025 that may be closed by this pull request
@srdas srdas force-pushed the openai_generic_2 branch from 1ba53fd to f59026b Compare March 5, 2025 06:20
@srdas srdas requested a review from dlqqq March 5, 2025 15:38
@srdas srdas changed the title Create a custom OpenAI provider to use multiple models Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config Mar 10, 2025
srdas and others added 8 commits March 10, 2025 15:21
* make native chat handlers customizable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove-ci-error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add-disabled-check-and-sort-entrypoints

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor Chat Handlers to Simplify Initialization (jupyterlab#1257)

* simplify-entrypoints-loading

* fix-lint

* fix-tests

* add-retriever-typing

* remove-retriever-from-base

* fix-circular-import(ydoc-import)

* fix-tests

* fix-type-check-failure

* refactor-retriever-init

* Allow chat handlers to be initialized in any order (jupyterlab#1268)

* lazy-initialize-retriever

* add-retriever-property

* rebase-into-main

* update-docs

* update-documentation

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Copy link
Member

@dlqqq dlqqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srdas Thank you for working on this! This is definitely one of the most challenging tasks that you've taken on thus far. Left some feedback for you below.

I think it would be best to hold off on merging this PR until after the v2.30.0 release scheduled for tomorrow. The ConfigManager areas of the code are fragile, and making lots of changes there introduces risk to users. I recommend that we ship this later to give us more time to thoroughly test these changes and mitigate that risk.

Comment on lines 261 to 275
if config_key == "fields":
# Ensure fields dictionaries are initialized
default_config["fields"] = default_value
default_config["embeddings_fields"] = default_config.get(
"embeddings_fields", {}
)
default_config["completions_fields"] = default_config.get(
"completions_fields", {}
)
elif config_key == "embeddings_fields":
default_config["embeddings_fields"] = default_value
elif config_key == "completions_fields":
default_config["completions_fields"] = default_value
else:
default_config[config_key] = default_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes don't appear necessary. This loop iterates through config_keys := GlobalConfig.model_fields.keys(), which lists out all of the keys in the GlobalConfig dictionary. So in this PR, config_keys should evaluate to:

[
    'model_provider_id',
    'embeddings_provider_id',
    'send_with_shift_enter',
    'fields',
    'api_keys',
    'completions_model_provider_id',
    'completions_fields',
    'embeddings_fields',
]

Setting default_config[config_key] = default_value will set each key in the dictionary to the default value defined in the JSON Schema. Therefore, I think these changes can be reverted.

},
completions: {
lmProvider: cLmProvider,
lmLocalId: cLmLocalId
lmLocalId: cLmLocalId,
emLocalId: emLocalId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The completions key only needs to store the model ID of the completions model, so it does not need a emLocalId key within it. This line can be dropped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see that this was done to prevent a TypeScript build error. You can still delete this line. Then, change line 29 from:

completions: Omit<ProvidersInfo, 'emProvider'>;

to:

completions: Omit<ProvidersInfo, 'emProvider' | 'emLocalId'>;

This tells TypeScript that config.completions lacks the emProvider and emLocalId keys, which should fix the build error you faced.

Comment on lines 251 to 260
fields: {
...(lmGlobalId && {
[lmGlobalId]: fields
[lmGlobalId]: lmFields
}),
...(clmGlobalId && {
[clmGlobalId]: fields
[clmGlobalId]: clmFields
}),
...(emGlobalId && {
[emGlobalId]: embeddingModelFields
[emGlobalId]: emFields
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This area of the code is likely what was causing the fields to be saved incorrectly, as you showed me this morning in our 1:1.

This is merging all the fields under the fields key in the dictionary. It's not saving completion fields under completions_fields or saving embedding fields under embeddings_fields.

Admittedly, the existing code is convoluted because it relies heavily on object destructuring. We did this to avoid sending an empty object under fields. However, sending an UpdateConfigRequest with fields: {} should leave the fields unchanged, so this syntax isn't really necessary. I would suggest this change to fix the bug and make the code more readable:

...
api_keys: apiKeys,
fields: lmGlobalId ? { [lmGlobalId]: lmFields } : {},
completions_fields: clmGlobalId ? { [clmGlobalId]: clmFields } : {},
embeddings_fields: emGlobalId ? { [emGlobalId]: emFields } : {},
...

You will have to add embeddings_fields to the UpdateConfigRequest object in both the frontend & backend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move forward with this change, we should make sure that we test this in test_config_manager.py, specifically in test_update_no_empty_field_dicts().

@eugenecherepanov
Copy link

do you have any plan for merge this feature?

@srdas
Copy link
Collaborator Author

srdas commented Mar 17, 2025

do you have any plan for merge this feature?

@eugenecherepanov I have a couple of things still not working that need to be cleared before this can be fully tested and reviewed, hoping to get it done very soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants