-
-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config #1264
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
* make native chat handlers customizable * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove-ci-error * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add-disabled-check-and-sort-entrypoints * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Chat Handlers to Simplify Initialization (jupyterlab#1257) * simplify-entrypoints-loading * fix-lint * fix-tests * add-retriever-typing * remove-retriever-from-base * fix-circular-import(ydoc-import) * fix-tests * fix-type-check-failure * refactor-retriever-init * Allow chat handlers to be initialized in any order (jupyterlab#1268) * lazy-initialize-retriever * add-retriever-property * rebase-into-main * update-docs * update-documentation --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srdas Thank you for working on this! This is definitely one of the most challenging tasks that you've taken on thus far. Left some feedback for you below.
I think it would be best to hold off on merging this PR until after the v2.30.0 release scheduled for tomorrow. The ConfigManager areas of the code are fragile, and making lots of changes there introduces risk to users. I recommend that we ship this later to give us more time to thoroughly test these changes and mitigate that risk.
if config_key == "fields": | ||
# Ensure fields dictionaries are initialized | ||
default_config["fields"] = default_value | ||
default_config["embeddings_fields"] = default_config.get( | ||
"embeddings_fields", {} | ||
) | ||
default_config["completions_fields"] = default_config.get( | ||
"completions_fields", {} | ||
) | ||
elif config_key == "embeddings_fields": | ||
default_config["embeddings_fields"] = default_value | ||
elif config_key == "completions_fields": | ||
default_config["completions_fields"] = default_value | ||
else: | ||
default_config[config_key] = default_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes don't appear necessary. This loop iterates through config_keys := GlobalConfig.model_fields.keys()
, which lists out all of the keys in the GlobalConfig
dictionary. So in this PR, config_keys
should evaluate to:
[
'model_provider_id',
'embeddings_provider_id',
'send_with_shift_enter',
'fields',
'api_keys',
'completions_model_provider_id',
'completions_fields',
'embeddings_fields',
]
Setting default_config[config_key] = default_value
will set each key in the dictionary to the default value defined in the JSON Schema. Therefore, I think these changes can be reverted.
}, | ||
completions: { | ||
lmProvider: cLmProvider, | ||
lmLocalId: cLmLocalId | ||
lmLocalId: cLmLocalId, | ||
emLocalId: emLocalId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The completions
key only needs to store the model ID of the completions model, so it does not need a emLocalId
key within it. This line can be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see that this was done to prevent a TypeScript build error. You can still delete this line. Then, change line 29 from:
completions: Omit<ProvidersInfo, 'emProvider'>;
to:
completions: Omit<ProvidersInfo, 'emProvider' | 'emLocalId'>;
This tells TypeScript that config.completions
lacks the emProvider
and emLocalId
keys, which should fix the build error you faced.
fields: { | ||
...(lmGlobalId && { | ||
[lmGlobalId]: fields | ||
[lmGlobalId]: lmFields | ||
}), | ||
...(clmGlobalId && { | ||
[clmGlobalId]: fields | ||
[clmGlobalId]: clmFields | ||
}), | ||
...(emGlobalId && { | ||
[emGlobalId]: embeddingModelFields | ||
[emGlobalId]: emFields | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This area of the code is likely what was causing the fields to be saved incorrectly, as you showed me this morning in our 1:1.
This is merging all the fields under the fields
key in the dictionary. It's not saving completion fields under completions_fields
or saving embedding fields under embeddings_fields
.
Admittedly, the existing code is convoluted because it relies heavily on object destructuring. We did this to avoid sending an empty object under fields
. However, sending an UpdateConfigRequest
with fields: {}
should leave the fields unchanged, so this syntax isn't really necessary. I would suggest this change to fix the bug and make the code more readable:
...
api_keys: apiKeys,
fields: lmGlobalId ? { [lmGlobalId]: lmFields } : {},
completions_fields: clmGlobalId ? { [clmGlobalId]: clmFields } : {},
embeddings_fields: emGlobalId ? { [emGlobalId]: emFields } : {},
...
You will have to add embeddings_fields
to the UpdateConfigRequest
object in both the frontend & backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move forward with this change, we should make sure that we test this in test_config_manager.py
, specifically in test_update_no_empty_field_dicts()
.
do you have any plan for merge this feature? |
@eugenecherepanov I have a couple of things still not working that need to be cleared before this can be fully tested and reviewed, hoping to get it done very soon. |
Description
The OpenAI model interface has been widely adopted by many model providers (DeepSeek, vLLM, etc.) and this PR enables accessing these models using the OpenAI provider. Current OpenAI models are also accessible via the same interface.
This PR also updates related documentation on the use of these models that work via the OpenAI provider.
These updates work for selecting chat and embeddings models. Chat models are tested to work with models from OpenAI, DeepSeek, and models hosted by vLLM. Embedding models are tested for OpenAI models. DeepSeek does not have an API for embedding models, and OpenRouter also does not support as yet any embedding models.
Also, this PR corrects the coupled fields problem in the AI Settings page.
Finally, added the embedding fields to the
config_scheme.json
and made related changes toconfig_manager.py
andtest_ config_manager.py
Each of these changes is now described below in some more detail.
Demo of new features
See the new usage of models and the required settings shown below, note the new "OpenAI::general interface":

For any OpenAI model:

For DeepSeek models:

For models deployed with vLLM:

Embedding Models
First, tested to make sure that the OpenAI models are working as intended with no changes to the code:

Second, modified check that the interface takes any OpenAI embedding model as an input and test that it works with OpenAI models as before:
Fixed coupled model field inputs
We can see that the fields are not coupled any more as shown below:

Added
embeddings_fields
toconfig_schema.json
config_manager.py
to handle the new fields.test_config_manager.py
for the additional embedding field in config.