Conversation

Contributor

@devpramod devpramod commented Apr 14, 2025

Description

This PR implements support for using external OpenAI-compatible LLM endpoints with ChatQnA, CodeGen, and DocSum Helm charts while maintaining backward compatibility with existing configurations.

Motivation
Users need the flexibility to connect to external LLM services (like OpenAI or self-hosted OpenAI-compatible endpoints) instead of always relying on the included LLM components. This adds versatility to our Helm charts without disrupting existing functionality.

Key Changes
For each Helm chart (ChatQnA, CodeGen, DocSum):

  1. Added external-llm-values.yaml configuration file (a sketch is shown after this list):
  • Provides settings for external LLM endpoints (LLM_SERVER_HOST_IP, LLM_MODEL, OPENAI_API_KEY)
  • Disables internal LLM services (tgi, vllm, llm-uservice) when using external endpoints
  2. Created templates/external-llm-configmap.yaml:
  • Manages environment variables required for external LLM integration
  • Conditionally created only when external LLM is enabled
  3. Updated templates/deployment.yaml:
  • Added conditional logic to use the ConfigMap for environment variables when external LLM is enabled
  • Maintained all existing functionality and conditional paths for internal LLM services
  4. Updated Chart.yaml:
  • Added proper conditions for all dependencies to make them optional
  • Enables selective component deployment based on configuration values
  5. Updated README.md:
  • Added example commands showing how to use external LLM endpoints
  • Maintained consistent placeholder values across all services
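
A minimal sketch of what such an external-llm-values.yaml could look like, based on the settings listed above (the externalLLM block and its key names follow the README examples quoted further down in this thread; the enabled flag, placeholder values, and exact layout are illustrative assumptions, not the merged file):

  # external-llm-values.yaml (illustrative sketch)
  externalLLM:
    enabled: true
    LLM_SERVER_HOST_IP: "http://your-llm-server"   # OpenAI-compatible endpoint
    LLM_MODEL: "your-model"                        # model served by that endpoint
    OPENAI_API_KEY: "your-api-key"                 # API key for the endpoint

  # Disable the bundled LLM services when the external endpoint is used
  llm-uservice:
    enabled: false
  vllm:
    enabled: false
  tgi:
    enabled: false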

Issues

Fixes #1015

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

n/a

Tests

Tested backward compatibility to ensure that existing Helm charts with the default values.yaml are still working.

Collaborator

@lianhao lianhao left a comment

Two more things besides the embedded comment:

  1. Will there be a different image of opea/chatqna for the external LLM endpoint use case, or will the image be the same?
  2. In order to make CI happy when testing external-llm-values.yaml, we need a valid LLM_SERVER_HOST_IP and OPENAI_API_KEY in the CI environment. I believe you had similar CI requirements in the GenAIExamples repo when making changes to the chatqna mega gateway; we should follow the same convention here.

@devpramod
Contributor Author

@lianhao
Should I set the defaults for codegen and docsum values.yaml as well, i.e. .enabled?

Collaborator

lianhao commented Apr 16, 2025

@lianhao Should I set the defaults for codegen and docsum values.yaml as well, i.e. .enabled?

Yes. Also, maybe we can split this into separate PRs for chatqna, codegen, and docsum, in case the mega gateway PR in GenAIExamples is not merged at the same time?

Collaborator

lianhao commented Apr 17, 2025

@devpramod We need you to do us a favor. Could you please do a manual rebase of your branch before your next round of updates to this PR? Otherwise it will trigger lots of unnecessary test cases.

@lvliang-intel lvliang-intel marked this pull request as ready for review April 21, 2025 02:27
Collaborator

@lianhao lianhao left a comment

Please note that CI is currently under a heavy burden doing the release CD tests for 1.3.

Collaborator

@eero-t eero-t left a comment

Looks OK once Lianhao's comments are resolved (and CI passes).

Collaborator

@lianhao lianhao left a comment

@devpramod please address the 2 remaining embedded comments. Also, please rebase and fix the test error.

Thx~

Member

@poussa poussa left a comment

LGTM. Please address @lianhao's comments and we are good to merge, I think.

Member

poussa commented May 13, 2025

LGTM. Please address @lianhao's comments and we are good to merge, I think.

@devpramod ping.

@devpramod
Contributor Author

Hi @poussa,
I'm working on resolving the comment and also on updating the CI for the new values file.
Will update soon.

@devpramod devpramod force-pushed the helm-external-llm branch from 0a69a59 to 897fa9d on May 13, 2025 23:42
Collaborator

eero-t commented May 16, 2025

Deployment templates are missing llm-uservice.enabled check(s), because the CI test Helm installs fail with a no template "llm-uservice.fullname" error, both for "CodeGen":

+ helm install --create-namespace --namespace infra-codegen-7875974e --wait --timeout 600s --set GOOGLE_API_KEY=*** --set GOOGLE_CSE_ID=*** --set web-retriever.GOOGLE_API_KEY=*** --set web-retriever.GOOGLE_CSE_ID=*** --set global.HUGGINGFACEHUB_API_TOKEN=*** --set global.modelUseHostPath=/data2/hf_model --values helm-charts//codegen/external-llm-values.yaml codegen14000636 helm-charts//codegen
Error: INSTALLATION FAILED: template: codegen/templates/deployment.yaml:38:24: executing "codegen/templates/deployment.yaml" at <include "llm-uservice.fullname" (index .Subcharts "llm-uservice")>: error calling include: template: no template "llm-uservice.fullname" associated with template "gotpl"
+ echo 'Failed to install chart /codegen'

And for "DocSum":

+ helm install --create-namespace --namespace infra-docsum-50d1e674 --wait --timeout 600s --set GOOGLE_API_KEY=*** --set GOOGLE_CSE_ID=*** --set web-retriever.GOOGLE_API_KEY=*** --set web-retriever.GOOGLE_CSE_ID=*** --set global.HUGGINGFACEHUB_API_TOKEN=*** --set global.modelUseHostPath=/data2/hf_model --values helm-charts//docsum/external-llm-values.yaml docsum14000012 helm-charts//docsum
Error: INSTALLATION FAILED: template: docsum/templates/deployment.yaml:38:24: executing "docsum/templates/deployment.yaml" at <include "llm-uservice.fullname" (index .Subcharts "llm-uservice")>: error calling include: template: no template "llm-uservice.fullname" associated with template "gotpl"
+ echo 'Failed to install chart /docsum'
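
For reference, a guard along these lines in templates/deployment.yaml would avoid looking up the subchart template when llm-uservice is disabled (a sketch only: the environment variable name and surrounding context are assumptions, and index is needed because the subchart name contains a dash):

  {{- if index .Values "llm-uservice" "enabled" }}
  - name: LLM_SERVICE_HOST_IP
    value: {{ include "llm-uservice.fullname" (index .Subcharts "llm-uservice") | quote }}
  {{- end }}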

I'm not sure why chatqna,k8s-gaudi,guardrails-gaudi-values fails, maybe because of a Helm install or vLLM timeout?

+ helm install --create-namespace --namespace infra-chatqna-cbdb40b4 --wait --timeout 900s --set GOOGLE_API_KEY=*** --set GOOGLE_CSE_ID=*** --set web-retriever.GOOGLE_API_KEY=*** --set web-retriever.GOOGLE_CSE_ID=*** --set global.HUGGINGFACEHUB_API_TOKEN=*** --set global.modelUseHostPath=/data2/hf_model --values helm-charts//chatqna/guardrails-gaudi-values.yaml chatqna13235020 helm-charts//chatqna
Error: INSTALLATION FAILED: context deadline exceeded
+ echo 'Failed to install chart /chatqna'
...
[pod/chatqna13235020-vllm-f4587ff8b-6fdgl/vllm] INFO 05-14 00:05:31 hpu_model_runner.py:1730] [Warmup][Graph/Decode][355/416] batch_size:2 num_blocks:384 free_mem:14.45 GiB
-----------------------------------
+ exit 1

The chatqna,external-llm-values CI test fails due to what looks like a mismatch between how CI tests ChatQnA:

 [pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]   File "/home/user/comps/cores/telemetry/opea_telemetry.py", line 61, in wrapper
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]     res = await func(*args, **kwargs)
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]   File "/home/user/comps/cores/mega/orchestrator.py", line 258, in execute
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]     endpoint = self.services[cur_node].endpoint_path(inputs["model"])
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]   File "/home/user/comps/cores/mega/micro_service.py", line 142, in endpoint_path
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]     model_endpoint = model.split("/")[1]
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243]                      ~~~~~~~~~~~~~~~~^^^
[pod/chatqna13235243-5dd6bff8c6-rsgmk/chatqna13235243] IndexError: list index out of range

and PR opea-project/GenAIComps#1583, merged a few weeks ago, which changed the code above.

Collaborator

@lianhao lianhao left a comment

We should follow the chatqna method of setting the external-LLM-related config to avoid the errors @eero-t mentioned above.

pre-commit-ci bot and others added 8 commits June 30, 2025 18:48
…rnal LLM and data-prep configurations; update values.yaml for productivity suite and docsum to enable necessary services and endpoints.

Signed-off-by: devpramod <[email protected]>
…onditionals in deployment and configmap templates for chatqna, codegen, and docsum; update values.yaml to enable whisper and redis-vector-db services.

Signed-off-by: devpramod <[email protected]>
…o override them if necessary

Signed-off-by: devpramod <[email protected]>
@devpramod devpramod force-pushed the helm-external-llm branch from c713304 to 167d0f5 on June 30, 2025 22:59
Collaborator

lianhao commented Jul 1, 2025

@devpramod please do the following steps to make CI happy:

  1. Ask Suyue to add the secrets used in CI to the GenAIInfra repo. (She's the one who has access to do that.)
  2. Submit a separate PR for the changes in _helm_e2e.html alone.
  3. Once the step 2 PR is merged, rebase this PR and we should be all good.

Collaborator

@lianhao lianhao left a comment

I'm OK with this PR itself, but we need to make some changes to make CI happy, following the steps I listed here: #993 (comment)

@poussa poussa self-requested a review July 1, 2025 05:40
@devpramod devpramod force-pushed the helm-external-llm branch from 167d0f5 to 8b1b207 on July 1, 2025 22:44
Collaborator

@lianhao lianhao left a comment

@louie-tsai @devpramod I would suggest some minor changes; also, please help resolve the merge conflict. Otherwise LGTM.

#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/rocm-tgi-values.yaml

# To use with external OpenAI compatible LLM endpoint
#helm install chatqna chatqna -f chatqna/external-llm-values.yaml --set externalLLM.LLM_SERVER_HOST_IP="http://your-llm-server" --set externalLLM.LLM_MODEL="your-model" --set externalLLM.OPENAI_API_KEY="your-api-key"
Collaborator

Please change the file name external-llm-values.yaml accordingly

Collaborator

@poussa already did a merge, so I added a separate PR #1153 to fix the values file names.

# helm install codegen codegen --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set tgi.LLM_MODEL_ID=${MODELNAME} -f codegen/rocm-tgi-values.yaml

# To use with external OpenAI compatible LLM endpoint
# helm install codegen codegen -f codegen/external-llm-values.yaml --set externalLLM.LLM_SERVER_HOST_IP="http://your-llm-server" --set externalLLM.LLM_MODEL="your-model" --set externalLLM.OPENAI_API_KEY="your-api-key"
Collaborator

Please change the file name external-llm-values.yaml accordingly

@lianhao lianhao changed the title Add support for external OpenAI-compatible LLM endpoints across Helm charts (chatqna, codegen, docsum) Add support for external OpenAI-compatible LLM endpoints across Helm charts (chatqna, codegen) Jul 4, 2025
@lianhao lianhao added this to the v1.4 milestone Jul 4, 2025
@poussa poussa merged commit c977012 into opea-project:main Jul 4, 2025
32 checks passed
Comment on lines +73 to +78
condition: data-prep.enabled
- name: ui
alias: chatqna-ui
version: 0-latest
repository: "file://../common/ui"
condition: chatqna-ui.enabled
Collaborator

Disabling either data-prep or chatqna-ui means that Helm fails parsing the ChatQnA NGINX template: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/chatqna/templates/nginx.yaml

(Because their names contain dashes, Helm cannot parse the template if one simply adds ifs checking whether they are enabled; see the sketch below.)
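
For example, a plain dotted reference fails to parse, so the nginx template would have to use index instead (a sketch under the assumption that the guard wraps the relevant ui/data-prep references):

  # Fails to parse: a dash is not a valid character in a template field name
  # {{- if .Values.chatqna-ui.enabled }}

  # Works: look the key up with index instead
  {{- if index .Values "chatqna-ui" "enabled" }}
  # ... references to the chatqna-ui service ...
  {{- end }}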

Comment on lines +40 to +45
condition: data-prep.enabled
- name: ui
version: 0-latest
repository: "file://../common/ui"
alias: codegen-ui
condition: codegen-ui.enabled
Collaborator

Disabling either data-prep or codegen-ui means that Helm fails parsing the CodeGen NGINX template:
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/codegen/templates/ui-nginx.yaml

Comment on lines +12 to +20
# Disable internal LLM services when using external LLM
llm-uservice:
enabled: false

vllm:
enabled: false

tgi:
enabled: false
Collaborator

This is missing:

ollama:
  enabled: false

Collaborator

eero-t commented Jul 18, 2025

Added PR #1166 to fix the above and some additional issues I noticed later on.
