
Conversation

@devpramod (Contributor)

Description

This PR implements support for using external OpenAI-compatible LLM endpoints with the DocSum Helm chart while maintaining backward compatibility with existing configurations.

Motivation
Users need the flexibility to connect to external LLM services (like OpenAI or self-hosted OpenAI-compatible endpoints) instead of always relying on the included LLM components. This adds versatility to our Helm charts without disrupting existing functionality.

Key Changes

  1. Added external-llm-values.yaml configuration file (see the sketch after this list):
  • Provides settings for external LLM endpoints (LLM_SERVER_HOST_IP, LLM_MODEL, OPENAI_API_KEY)
  • Disables internal LLM services (tgi, vllm, llm-uservice) when using external endpoints
  2. Created templates/external-llm-configmap.yaml (also sketched below):
  • Manages environment variables required for external LLM integration
  • Conditionally created only when external LLM is enabled
  3. Updated templates/deployment.yaml:
  • Added conditional logic to use the ConfigMap for environment variables when external LLM is enabled
  • Maintained all existing functionality and conditional paths for internal LLM services
  4. Updated Chart.yaml:
  • Added proper conditions for all dependencies to make them optional
  • Enables selective component deployment based on configuration values
  5. Updated README.md:
  • Added example commands showing how to use external LLM endpoints
  • Maintained consistent placeholder values across all services
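For illustration, a minimal sketch of the kind of override file item 1 describes, using only the key names from the bullets above (illustrative sketch; the actual external-llm-values.yaml may differ, and later review changed the approach to keep llm-uservice enabled):

# external-llm-values.yaml (illustrative sketch only, not the merged file)
tgi:
  enabled: false              # internal LLM backends disabled when using an external endpoint
vllm:
  enabled: false
llm-uservice:
  enabled: false              # note: later review keeps this wrapper enabled instead
externalLLM:                  # hypothetical grouping key, for illustration only
  enabled: true
  LLM_SERVER_HOST_IP: "api.openai.com"
  LLM_MODEL: "gpt-4-turbo"
  OPENAI_API_KEY: "your-api-key"

And the kind of conditional guard item 2 describes for the ConfigMap template (again illustrative; the externalLLM.enabled flag name is hypothetical):

# templates/external-llm-configmap.yaml (illustrative sketch only)
{{- if .Values.externalLLM.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: docsum-external-llm   # illustrative name
data:
  LLM_SERVER_HOST_IP: {{ .Values.externalLLM.LLM_SERVER_HOST_IP | quote }}
  LLM_MODEL: {{ .Values.externalLLM.LLM_MODEL | quote }}
  OPENAI_API_KEY: {{ .Values.externalLLM.OPENAI_API_KEY | quote }}
{{- end }}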

Issues

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

'n/a'

Tests

Tested backward compatibility to ensure that existing Helm charts with the default values.yaml still work.

@eero-t (Collaborator) commented Jul 2, 2025

I guess the external test needs the e2e PR to be merged first.

ROCm tests fail with "helm: command not found" in the helm install test phase => @chensuyue?

The DocSum Gaudi values failure is a known bug, for which @lianhao used #1132 as a workaround (with another chart): opea-project/GenAIComps#1719

@lianhao (Collaborator) commented Jul 4, 2025

Please rebase after #1146 is merged, for the default model changes in DocSum

@lianhao added this to the v1.4 milestone on Jul 4, 2025
@eero-t (Collaborator) commented Jul 4, 2025

Please rebase after #1146 is merged, for the default model changes in DocSum

@devpramod Relevant PRs have been merged, so this can now be rebased (there are some conflicts with the README changes from the other PRs merged in the meantime, which need to be handled).

@lianhao (Collaborator) left a comment

While we're waiting for the CI to be resumed, please address these issues first. Thx~

@srinarayan-srikanthan

While we're waiting for the CI to be resumed, please address these issues first. Thx~

@lianhao can we merge this? The comments have been addressed.

@lianhao (Collaborator) left a comment

Code LGTM. Let's wait for the CI to be back online before merging this.

@eero-t (Collaborator) left a comment

(Not setting this review as a change request, as I will be OoO and cannot approve fixes to them before the code freeze.)

@eero-t (Collaborator) left a comment

I would suggest using LLM_SERVICE_HOST as the Helm variable name even if it would be mapped to the LLM_SERVICE_HOST_IP environment variable (in the deployment template), because it's both shorter and more correct (a user is unlikely to be using IP addresses with Kubernetes).

=> I'll add a PR doing that change for the already merged ChatQnA & CodeGen Helm charts (+ add a KubeAI example).
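A minimal sketch of the suggested mapping, assuming the Helm value is read from .Values.LLM_SERVICE_HOST (the value path and surrounding template are illustrative, not the merged chart):

# deployment.yaml (illustrative env entry only)
env:
  - name: LLM_SERVICE_HOST_IP                      # env variable the service expects
    value: {{ .Values.LLM_SERVICE_HOST | quote }}  # shorter Helm variable name, as suggested above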

@eero-t (Collaborator) commented Jul 18, 2025

I would suggest using LLM_SERVICE_HOST as the Helm variable name even if it would be mapped to the LLM_SERVICE_HOST_IP environment variable ...

=> I'll add a PR doing that change for the already merged ChatQnA & CodeGen Helm charts (+ add a KubeAI example).

Half of the GenAIExamples apps use the LLM_SERVER_ prefix and half use the LLM_SERVICE_ prefix for env variable names (see opea-project/GenAIExamples#2143).

While most of the apps with Helm charts in GenAIInfra belong to the latter category, I decided to use the shorter prefix for the Helm variable names (LLM_SERVER_HOST / LLM_SERVER_PORT).

=> Please see PR #1166

@eero-t (Collaborator) commented Jul 18, 2025

  • Disables internal LLM services (tgi, vllm, llm-uservice) when using external endpoints

But does DocSum even work without the llm-uservice wrapper service for the real LLM?

(I don't think that OpenAI inferencing service would be running OPEA specific DocSum wrapper for LLM...)

PS. I also have the same question for the FaqGen variant of ChatQnA, as that also seems to require the wrapper service, which was similarly disabled in #993.

@devpramod force-pushed the docsum-external-llm branch from a3bac0a to f35ef30 on July 31, 2025 16:42
@devpramod (Contributor, Author)

Hi @eero-t I have made all the updates requested.

@eero-t (Collaborator) commented Aug 11, 2025

(back from vacation)

Hi @eero-t I have made all the updates requested.

@devpramod PR looks otherwise fine, but I'm still wondering about this:

Does DocSum work without the llm-uservice wrapper service for the real LLM?

(I don't think that OpenAI inferencing service would be running OPEA specific DocSum wrapper for LLM...)

I.e. I'm wondering whether the external LLM endpoint should go to the DocSum "llm-uservice" instead of the DocSum megaservice, as the megaservice is going to use the OPEA-specific /v1/docsum endpoint for the service, see: https://github.com/opea-project/GenAIExamples/blob/main/DocSum/docsum.py#L208

PS. I also have the same question for the FaqGen variant of ChatQnA, as that also seems to require the wrapper service, which was similarly disabled in #993.

As the ChatQnA megaservice will try to use the OPEA-specific /v1/faqgen endpoint for the FaqGen LLM service, see: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/chatqna.py#L375

@devpramod (Contributor, Author)

Hi @eero-t, in my testing I have been able to successfully disable the llm-uservice and get a response from /v1/chatqna, /v1/docsum and /v1/codegen directly.

@eero-t (Collaborator) commented Aug 13, 2025

Hi @eero-t, in my testing I have been able to successfully disable the llm-uservice and get a response from /v1/chatqna, /v1/docsum and /v1/codegen directly.

Do you mean that you're also running the wrapper service externally, not just the inferencing service?

PS. This is relevant because e.g. OPEA Enterprise Inferencing project provides just inferencing services: https://github.com/opea-project/Enterprise-Inference/tree/main/core/helm-charts

And because this PR does not document that one would need to run the wrapper service externally.

@devpramod (Contributor, Author) commented Aug 13, 2025

Hi @eero-t
I am running it something like:
helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME}

Then I'm port-forwarding (kubectl port-forward svc/docsum 8888:8888) to test the application.

In practice, I'm using an nginx ingress that maps requests at /v1/docsum to svc/docsum.
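For reference, a minimal sketch of that kind of ingress rule, assuming an ingress-nginx controller and the default docsum service name with port 8888 from the port-forward above (names and paths are illustrative, not taken from the PR):

# docsum-ingress.yaml (illustrative sketch only)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docsum-ingress         # hypothetical name
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /v1/docsum   # megaservice endpoint exposed as-is, no path rewrite
            pathType: Prefix
            backend:
              service:
                name: docsum   # assumes the chart's default service name
                port:
                  number: 8888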

@eero-t (Collaborator) left a comment

I am running it something like: helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME}

You're running the whole OPEA DocSum application at the external / remote end, to provide the remote llm-uservice?

Then I'm port-forwarding (kubectl port-forward svc/docsum 8888:8888) to test the application.

Err... I assume you mean to the local DocSum instance, not to the external/remote one?


I discussed this with Sakari, and supporting enterprise inferencing services (i.e. OpenAI APIs instead of OPEA wrapper services) can be done in a separate PR.

However, needing the external endpoint to be the wrapper service instead of the LLM itself means that this PR needs several changes:

  • Example host names: "llm-server" -> "llm-uservice"
  • Comments: "OpenAI LLM" -> "OPEA DocSum LLM wrapper (llm-uservice)"
  • Drop the KubeAI example, as it provides just the standard OpenAI LLM API, not the DocSum LLM wrapper one
  • Drop the OPENAI_API_KEY variable, as it's not relevant for this

=> It may be easier to change the DocSum llm-uservice to use an external OpenAI LLM, but you need to test that it works.

@louie-tsai (Contributor)

@louie-tsai needs to review it.

devpramod and others added 5 commits on August 15, 2025 at 15:18
@devpramod force-pushed the docsum-external-llm branch from f9f5863 to d60651f on August 15, 2025 15:18
@devpramod (Contributor, Author) commented Aug 15, 2025

@eero-t You're absolutely right, and I apologize for the oversight. I initially thought DocSum could work like ChatQnA (which calls /v1/chat/completions directly), but I now understand that DocSum specifically requires the /v1/docsum endpoint that only llm-uservice provides.

The original implementation was misleading - it suggested users could connect directly to external LLMs when they really needed another llm-uservice instance.

I've updated the PR to follow your suggested approach:

  • Keep llm-uservice enabled (as the required wrapper for /v1/docsum)
  • Configure llm-uservice itself to use external LLM endpoints (OpenAI, vLLM, TGI, etc.)
  • Removed the confusing environment variables from DocSum that it doesn't actually use

This provides the external LLM functionality users want while maintaining the proper OPEA architecture. Thank you for the detailed feedback - it helped clarify the distinction between DocSum's requirements and ChatQnA's endpoint usage.


# To use with external OpenAI-compatible LLM endpoints (OpenAI, vLLM, TGI, etc.)
# This configures the llm-uservice to connect to external LLM providers while maintaining DocSum compatibility
# helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --values docsum/variant_external-llm-values.yaml --set llm-uservice.env.OPENAI_API_KEY="your-api-key" --set llm-uservice.env.LLM_MODEL_ID="gpt-4-turbo"
Collaborator

Shouldn't this also set a suitable (OpenAI) LLM_ENDPOINT?

Contributor Author

@eero-t Thanks for pointing that out. This made me realize I wasn't passing the env variables correctly and that we needed a new env variable in the configmap for OPENAI_API_KEY. Please review.

LLM_ENDPOINT: "https://api.openai.com/v1" # External LLM API endpoint (OpenAI, vLLM, TGI, etc.)
OPENAI_API_KEY: "${OPENAI_API_KEY}" # API key for authentication
LLM_MODEL_ID: "gpt-4-turbo" # Model to use
TEXTGEN_BACKEND: "openai" # Backend type for OpenAI-compatible endpoints
@eero-t (Collaborator) commented Aug 15, 2025

This is the DocSum service, not TextGen.

Also, it looks like giving anything other than TGI or vLLM to the *_BACKEND variables is going to fail the Helm install: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/llm-uservice/templates/configmap.yaml

(In the TEXTGEN_BACKEND case there also seems to be a BEDROCK option, but that does not really help here.)

Looking at the backend code of llm-uservice for the DOCSUM_BACKEND="vLLM" option, it seems like it would be OpenAI API compatible, but unfortunately it uses a hard-coded openai_api_key value (an empty string): https://github.com/opea-project/GenAIComps/blob/main/comps/llms/src/doc-summarization/integrations/vllm.py#L58


The value is hard-coded to an empty string, but the access_token is passed as an authentication header, and it is taken from the environment variable: https://github.com/opea-project/GenAIComps/blob/2444e6984e27dd34aed5f0b341690e046356940d/comps/llms/src/doc-summarization/integrations/vllm.py#L52

Contributor Author

@eero-t Based on Sri's comment I have fallen back to the DocSum backend with vLLM, please review.
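A minimal sketch of the kind of llm-uservice override being discussed, using the DOCSUM_BACKEND, LLM_ENDPOINT, LLM_MODEL_ID and OPENAI_API_KEY keys mentioned in this thread (values and nesting are illustrative, not the merged variant file):

# llm-uservice configured for an external OpenAI-compatible endpoint (illustrative sketch)
llm-uservice:
  DOCSUM_BACKEND: "vLLM"                   # OpenAI-compatible doc-summarization backend
  LLM_ENDPOINT: "http://your-server-url"   # external OpenAI-compatible endpoint
  LLM_MODEL_ID: "your-model"
  OPENAI_API_KEY: "your-api-key"           # passed as an access token header, per the note above
vllm:
  enabled: false                           # internal inference backends are not needed
tgi:
  enabled: false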

@eero-t (Collaborator) commented Aug 15, 2025

Thank you for the detailed feedback - it helped clarify the distinction between DocSum's requirements and ChatQnA's endpoint usage.

Good that it got cleared up; I had started to wonder whether I had grossly misunderstood something! :-)

Note that when the ChatQnA application is run as a FaqGen service, it will also require llm-uservice, i.e. it has the same issue as DocSum. See e.g:

@eero-t (Collaborator) commented Aug 15, 2025

The missing OpenAI key with the vLLM OpenAI-API backend should not be a problem for using DocSum / FaqGen with KubeAI... I don't have time to test it, though.

@chensuyue (Collaborator) commented Aug 20, 2025

@devpramod does this PR still target v1.4? Please address the remaining comments.

@eero-t (Collaborator) left a comment

Approved, seems OK now.

Comment on lines +29 to +34
# To use with external OpenAI-compatible LLM endpoints (OpenAI, vLLM, TGI, etc.)
# This configures the llm-uservice to connect to external LLM providers while maintaining DocSum compatibility
# For OpenAI:
# helm install docsum docsum --values docsum/variant_external-llm-values.yaml --set llm-uservice.OPENAI_API_KEY="your-api-key" --set llm-uservice.LLM_ENDPOINT="https://api.openai.com" --set llm-uservice.LLM_MODEL_ID="gpt-4-turbo"
# For vLLM/TGI or other OpenAI-compatible endpoints:
# helm install docsum docsum --values docsum/variant_external-llm-values.yaml --set llm-uservice.LLM_ENDPOINT="http://your-server-url" --set llm-uservice.LLM_MODEL_ID="your-model"
Collaborator

Could you do a separate PR adding similar stuff for FaqGen in the ChatQnA Helm chart?

(Separate variant file, README instructions etc.)

@@ -0,0 +1,17 @@
# Copyright (C) 2024 Intel Corporation
Collaborator

Suggested change
# Copyright (C) 2024 Intel Corporation
# Copyright (C) 2025 Intel Corporation

@chensuyue (Collaborator)

Please note that the CI/CD machines will be taken back on 8/25, and it is not yet clear when the new machines will be assigned.

@chensuyue merged commit 9200b37 into opea-project:main on Aug 22, 2025; 18 of 20 checks passed.