Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config #1264

srdas · 2025-02-27T07:17:11Z

Description

Fixes Add support for embedding models served through an OpenAI API #1240.
Fixes Bug: Model field inputs are coupled in Settings UI #1261
Related to PR Simplifying the OpenAI provider to use multiple model providers #1248 (closed to open this extended one)

The OpenAI model interface has been widely adopted by many model providers (DeepSeek, vLLM, etc.) and this PR enables accessing these models using the OpenAI provider. Current OpenAI models are also accessible via the same interface.

This PR also updates related documentation on the use of these models that work via the OpenAI provider.

These updates work for selecting chat and embeddings models. Chat models are tested to work with models from OpenAI, DeepSeek, and models hosted by vLLM. Embedding models are tested for OpenAI models. DeepSeek does not have an API for embedding models, and OpenRouter also does not support as yet any embedding models.

Also, this PR corrects the coupled fields problem in the AI Settings page.

Finally, added the embedding fields to the config_scheme.json and made related changes to config_manager.py and test_ config_manager.py

Each of these changes is now described below in some more detail.

Demo of new features

See the new usage of models and the required settings shown below, note the new "OpenAI::general interface":

For any OpenAI model:

For DeepSeek models:

For models deployed with vLLM:

Embedding Models

First, tested to make sure that the OpenAI models are working as intended with no changes to the code:

Second, modified check that the interface takes any OpenAI embedding model as an input and test that it works with OpenAI models as before:

Fixed coupled model field inputs

We can see that the fields are not coupled any more as shown below:

Added `embeddings_fields` to `config_schema.json`

Updated config_manager.py to handle the new fields.
Also updated analogous code for continuation models.
Updated test_config_manager.py for the additional embedding field in config.

for more information, see https://pre-commit.ci

* make native chat handlers customizable * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove-ci-error * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add-disabled-check-and-sort-entrypoints * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Chat Handlers to Simplify Initialization (jupyterlab#1257) * simplify-entrypoints-loading * fix-lint * fix-tests * add-retriever-typing * remove-retriever-from-base * fix-circular-import(ydoc-import) * fix-tests * fix-type-check-failure * refactor-retriever-init * Allow chat handlers to be initialized in any order (jupyterlab#1268) * lazy-initialize-retriever * add-retriever-property * rebase-into-main * update-docs * update-documentation --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

dlqqq

@srdas Thank you for working on this! This is definitely one of the most challenging tasks that you've taken on thus far. Left some feedback for you below.

I think it would be best to hold off on merging this PR until after the v2.30.0 release scheduled for tomorrow. The ConfigManager areas of the code are fragile, and making lots of changes there introduces risk to users. I recommend that we ship this later to give us more time to thoroughly test these changes and mitigate that risk.

dlqqq · 2025-03-12T21:26:59Z

packages/jupyter-ai/jupyter_ai/config_manager.py

+                if config_key == "fields":
+                    # Ensure fields dictionaries are initialized
+                    default_config["fields"] = default_value
+                    default_config["embeddings_fields"] = default_config.get(
+                        "embeddings_fields", {}
+                    )
+                    default_config["completions_fields"] = default_config.get(
+                        "completions_fields", {}
+                    )
+                elif config_key == "embeddings_fields":
+                    default_config["embeddings_fields"] = default_value
+                elif config_key == "completions_fields":
+                    default_config["completions_fields"] = default_value
+                else:
+                    default_config[config_key] = default_value


These changes don't appear necessary. This loop iterates through config_keys := GlobalConfig.model_fields.keys(), which lists out all of the keys in the GlobalConfig dictionary. So in this PR, config_keys should evaluate to:

[ 'model_provider_id', 'embeddings_provider_id', 'send_with_shift_enter', 'fields', 'api_keys', 'completions_model_provider_id', 'completions_fields', 'embeddings_fields', ]

Setting default_config[config_key] = default_value will set each key in the dictionary to the default value defined in the JSON Schema. Therefore, I think these changes can be reverted.

dlqqq · 2025-03-12T21:32:05Z

packages/jupyter-ai/src/components/settings/use-server-info.ts

        },
        completions: {
          lmProvider: cLmProvider,
-          lmLocalId: cLmLocalId
+          lmLocalId: cLmLocalId,
+          emLocalId: emLocalId


The completions key only needs to store the model ID of the completions model, so it does not need a emLocalId key within it. This line can be dropped.

Oh, I see that this was done to prevent a TypeScript build error. You can still delete this line. Then, change line 29 from:

completions: Omit<ProvidersInfo, 'emProvider'>;

to:

completions: Omit<ProvidersInfo, 'emProvider' | 'emLocalId'>;

This tells TypeScript that config.completions lacks the emProvider and emLocalId keys, which should fix the build error you faced.

dlqqq · 2025-03-12T21:53:44Z

packages/jupyter-ai/src/components/chat-settings.tsx

        fields: {
          ...(lmGlobalId && {
-            [lmGlobalId]: fields
+            [lmGlobalId]: lmFields
          }),
          ...(clmGlobalId && {
-            [clmGlobalId]: fields
+            [clmGlobalId]: clmFields
          }),
          ...(emGlobalId && {
-            [emGlobalId]: embeddingModelFields
+            [emGlobalId]: emFields
          })


This area of the code is likely what was causing the fields to be saved incorrectly, as you showed me this morning in our 1:1.

This is merging all the fields under the fields key in the dictionary. It's not saving completion fields under completions_fields or saving embedding fields under embeddings_fields.

Admittedly, the existing code is convoluted because it relies heavily on object destructuring. We did this to avoid sending an empty object under fields. However, sending an UpdateConfigRequest with fields: {} should leave the fields unchanged, so this syntax isn't really necessary. I would suggest this change to fix the bug and make the code more readable:

... api_keys: apiKeys, fields: lmGlobalId ? { [lmGlobalId]: lmFields } : {}, completions_fields: clmGlobalId ? { [clmGlobalId]: clmFields } : {}, embeddings_fields: emGlobalId ? { [emGlobalId]: emFields } : {}, ...

You will have to add embeddings_fields to the UpdateConfigRequest object in both the frontend & backend.

If we move forward with this change, we should make sure that we test this in test_config_manager.py, specifically in test_update_no_empty_field_dicts().

for more information, see https://pre-commit.ci

eugenecherepanov · 2025-03-17T13:31:59Z

do you have any plan for merge this feature?

srdas · 2025-03-17T14:59:56Z

do you have any plan for merge this feature?

@eugenecherepanov I have a couple of things still not working that need to be cleared before this can be fully tested and reviewed, hoping to get it done very soon.

srdas mentioned this pull request Feb 27, 2025

Simplifying the OpenAI provider to use multiple model providers #1248

Closed

srdas added the enhancement New feature or request label Feb 27, 2025

srdas changed the title ~~Create a custom OpenAI provider to use multiple model providers~~ Create a custom OpenAI provider to use multiple models Feb 27, 2025

srdas marked this pull request as ready for review February 27, 2025 19:57

dlqqq linked an issue Feb 27, 2025 that may be closed by this pull request

Add support for embedding models served through an OpenAI API #1240

Open

This was referenced Mar 1, 2025

Bug: Model field inputs are coupled in Settings UI #1261

Open

Refactor Chat Handlers to Simplify Initialization #1257

Merged

srdas and others added 10 commits March 4, 2025 22:16

Simplifying the OpenAI provider to use multiple model providers

f02f654

Update openrouter.md

2cfd037

[pre-commit.ci] auto fixes from pre-commit.com hooks

1beac32

for more information, see https://pre-commit.ci

openai general interface added

3999a7e

[pre-commit.ci] auto fixes from pre-commit.com hooks

866adbd

for more information, see https://pre-commit.ci

embedding

291a76d

[pre-commit.ci] auto fixes from pre-commit.com hooks

214a664

for more information, see https://pre-commit.ci

Updated settings to take OpenAI generic embedding models

175cdfb

added openai generic embeddings screenshot

b33f4e2

Fixed Issue 1261

f59026b

srdas force-pushed the openai_generic_2 branch from 1ba53fd to f59026b Compare March 5, 2025 06:20

srdas requested a review from dlqqq March 5, 2025 15:38

srdas and others added 4 commits March 7, 2025 09:33

bump version floor on jupyter server

245bdb1

linter

002f6bb

adding embedding model fields

e66c0c1

[pre-commit.ci] auto fixes from pre-commit.com hooks

f5ca39d

for more information, see https://pre-commit.ci

srdas changed the title ~~Create a custom OpenAI provider to use multiple models~~ Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config Mar 10, 2025

srdas added 6 commits March 10, 2025 08:57

Update test_config_manager

12b0e18

Update pyproject.toml

448bd41

Update pyproject.toml

700e0a6

Update pyproject.toml

a3b7a18

pyproject.toml fixes

434d6fe

pyproject.toml updates

1d77b09

srdas and others added 8 commits March 10, 2025 15:21

Update pyproject.toml

923113d

Update pyproject.toml

99f7d56

Merge branch 'main' into openai_generic_2

2dcc556

pyproject toml files

f3d4db1

pyproject toml updates

c54a364

Merge branch 'main' into openai_generic_2

81ced73

update snapshot

311e9a8

dlqqq requested changes Mar 12, 2025

View reviewed changes

srdas and others added 5 commits March 14, 2025 01:24

writing config file correctly

439f8ea

[pre-commit.ci] auto fixes from pre-commit.com hooks

04e6477

for more information, see https://pre-commit.ci

tsx lint

b1ba3a3

Update use-server-info.ts

b03a173

Update pyproject.toml

f9103e9

adds embedding_models attribute

f6c7adb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config #1264

Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config #1264

srdas commented Feb 27, 2025 •

edited

Loading

dlqqq left a comment

dlqqq Mar 12, 2025

dlqqq Mar 12, 2025

dlqqq Mar 12, 2025

dlqqq Mar 12, 2025

dlqqq Mar 12, 2025

eugenecherepanov commented Mar 17, 2025

srdas commented Mar 17, 2025

Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config #1264

Are you sure you want to change the base?

Create a custom OpenAI provider to use multiple models, resolve coupled input fields, add Embedding Fields to config #1264

Conversation

srdas commented Feb 27, 2025 • edited Loading

Description

Demo of new features

Embedding Models

Fixed coupled model field inputs

Added embeddings_fields to config_schema.json

dlqqq left a comment

Choose a reason for hiding this comment

dlqqq Mar 12, 2025

Choose a reason for hiding this comment

dlqqq Mar 12, 2025

Choose a reason for hiding this comment

dlqqq Mar 12, 2025

Choose a reason for hiding this comment

dlqqq Mar 12, 2025

Choose a reason for hiding this comment

dlqqq Mar 12, 2025

Choose a reason for hiding this comment

eugenecherepanov commented Mar 17, 2025

srdas commented Mar 17, 2025

srdas commented Feb 27, 2025 •

edited

Loading

Added `embeddings_fields` to `config_schema.json`