
Conversation

@devpramod (Contributor)

Description

This PR implements support for using external OpenAI-compatible LLM endpoints with the DocSum Helm chart while maintaining backward compatibility with existing configurations.

Motivation
Users need the flexibility to connect to external LLM services (like OpenAI or self-hosted OpenAI-compatible endpoints) instead of always relying on the included LLM components. This adds versatility to our Helm charts without disrupting existing functionality.

Key Changes

  1. Added external-llm-values.yaml configuration file (see the sketch after this list):
  • Provides settings for external LLM endpoints (LLM_SERVER_HOST_IP, LLM_MODEL, OPENAI_API_KEY)
  • Disables internal LLM services (tgi, vllm, llm-uservice) when using external endpoints
  2. Created templates/external-llm-configmap.yaml (also sketched below):
  • Manages environment variables required for external LLM integration
  • Conditionally created only when external LLM is enabled
  3. Updated templates/deployment.yaml:
  • Added conditional logic to use the ConfigMap for environment variables when external LLM is enabled
  • Maintained all existing functionality and conditional paths for internal LLM services
  4. Updated Chart.yaml:
  • Added proper conditions for all dependencies to make them optional
  • Enables selective component deployment based on configuration values
  5. Updated README.md:
  • Added example commands showing how to use external LLM endpoints
  • Maintained consistent placeholder values across all services
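For illustration, a minimal sketch of the kind of override file item 1 describes, using only the key names from the bullets above (illustrative sketch; the actual external-llm-values.yaml may differ, and later review changed the approach to keep llm-uservice enabled):

# external-llm-values.yaml (illustrative sketch only, not the merged file)
tgi:
  enabled: false              # internal LLM backends disabled when using an external endpoint
vllm:
  enabled: false
llm-uservice:
  enabled: false              # note: later review keeps this wrapper enabled instead
externalLLM:                  # hypothetical grouping key, for illustration only
  enabled: true
  LLM_SERVER_HOST_IP: "api.openai.com"
  LLM_MODEL: "gpt-4-turbo"
  OPENAI_API_KEY: "your-api-key"

And the kind of conditional guard item 2 describes for the ConfigMap template (again illustrative; the externalLLM.enabled flag name is hypothetical):

# templates/external-llm-configmap.yaml (illustrative sketch only)
{{- if .Values.externalLLM.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: docsum-external-llm   # illustrative name
data:
  LLM_SERVER_HOST_IP: {{ .Values.externalLLM.LLM_SERVER_HOST_IP | quote }}
  LLM_MODEL: {{ .Values.externalLLM.LLM_MODEL | quote }}
  OPENAI_API_KEY: {{ .Values.externalLLM.OPENAI_API_KEY | quote }}
{{- end }}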

Issues

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

'n/a'

Tests

Tested backward compatibility to ensure that existing Helm charts with the default values.yaml still work.

@eero-t (Collaborator) commented Jul 2, 2025

I guess the external test needs the e2e PR to be merged first.

ROCm tests fail with "helm: command not found" in the helm install test phase => @chensuyue?

The DocSum Gaudi values failure is a known bug, for which @lianhao used #1132 as a workaround (with another chart): opea-project/GenAIComps#1719

@lianhao (Collaborator) commented Jul 4, 2025

Please rebase after #1146 is merged, for the default model changes in DocSum

@lianhao added this to the v1.4 milestone on Jul 4, 2025
@eero-t (Collaborator) commented Jul 4, 2025

Please rebase after #1146 is merged, for the default model changes in DocSum

@devpramod Relevant PRs have been merged, so this can now be rebased (there are some conflicts with the README changes from the other PRs merged in the meantime, which need to be handled).

@lianhao (Collaborator) left a comment

While we're waiting for the CI to be resumed, please address these issues first. Thx~

@srinarayan-srikanthan

While we're waiting for the CI to be resumed, please address these issues first. Thx~

@lianhao can we merge this? The comments have been addressed.

@lianhao (Collaborator) left a comment

Code LGTM. Let's wait for the CI to be back online before merging this.

@eero-t (Collaborator) left a comment

(Not setting this review as a change request, as I will be OoO and cannot approve fixes to them before the code freeze.)

@eero-t (Collaborator) left a comment

I would suggest using LLM_SERVICE_HOST as the Helm variable name even if it would be mapped to the LLM_SERVICE_HOST_IP environment variable (in the deployment template), because it's both shorter and more correct (a user is unlikely to be using IP addresses with Kubernetes).

=> I'll add a PR doing that change for the already merged ChatQnA & CodeGen Helm charts (+ add a KubeAI example).
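A minimal sketch of the suggested mapping, assuming the Helm value is read from .Values.LLM_SERVICE_HOST (the value path and surrounding template are illustrative, not the merged chart):

# deployment.yaml (illustrative env entry only)
env:
  - name: LLM_SERVICE_HOST_IP                      # env variable the service expects
    value: {{ .Values.LLM_SERVICE_HOST | quote }}  # shorter Helm variable name, as suggested above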

@eero-t (Collaborator) commented Jul 18, 2025

I would suggest using LLM_SERVICE_HOST as the Helm variable name even if it would be mapped to the LLM_SERVICE_HOST_IP environment variable ...

=> I'll add a PR doing that change for the already merged ChatQnA & CodeGen Helm charts (+ add a KubeAI example).

Half of the GenAIExamples apps use the LLM_SERVER_ prefix and half use the LLM_SERVICE_ prefix for env variable names (see opea-project/GenAIExamples#2143).

While most of the apps with Helm charts in GenAIInfra belong to the latter category, I decided to use the shorter prefix for the Helm variable names (LLM_SERVER_HOST / LLM_SERVER_PORT).

=> Please see PR #1166

@eero-t (Collaborator) commented Jul 18, 2025

  • Disables internal LLM services (tgi, vllm, llm-uservice) when using external endpoints

But does DocSum even work without the llm-uservice wrapper service for the real LLM?

(I don't think that OpenAI inferencing service would be running OPEA specific DocSum wrapper for LLM...)

PS. I also have the same question for the FaqGen variant of ChatQnA, as that also seems to require the wrapper service, which was similarly disabled in #993.

@devpramod force-pushed the docsum-external-llm branch from a3bac0a to f35ef30 on July 31, 2025 16:42
@devpramod (Contributor, Author)

Hi @eero-t I have made all the updates requested.

@eero-t (Collaborator) commented Aug 11, 2025

(back from vacation)

Hi @eero-t I have made all the updates requested.

@devpramod PR looks otherwise fine, but I'm still wondering about this:

Does DocSum work without the llm-uservice wrapper service for the real LLM?

(I don't think that OpenAI inferencing service would be running OPEA specific DocSum wrapper for LLM...)

I.e. I'm wondering whether the external LLM endpoint should go to the DocSum "llm-uservice" instead of the DocSum megaservice, as the megaservice is going to use the OPEA-specific /v1/docsum endpoint for the service, see: https://github.com/opea-project/GenAIExamples/blob/main/DocSum/docsum.py#L208

PS. I also have the same question for the FaqGen variant of ChatQnA, as that also seems to require the wrapper service, which was similarly disabled in #993.

As the ChatQnA megaservice will try to use the OPEA-specific /v1/faqgen endpoint for the FaqGen LLM service, see: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/chatqna.py#L375

@devpramod (Contributor, Author)

Hi @eero-t, in my testing I have been able to successfully disable the llm-uservice and get a response from /v1/chatqna, /v1/docsum and /v1/codegen directly.

@eero-t (Collaborator) commented Aug 13, 2025

Hi @eero-t, in my testing I have been able to successfully disable the llm-uservice and get a response from /v1/chatqna, /v1/docsum and /v1/codegen directly.

Do you mean that you're also running the wrapper service externally, not just the inferencing service?

PS. This is relevant because e.g. OPEA Enterprise Inferencing project provides just inferencing services: https://github.com/opea-project/Enterprise-Inference/tree/main/core/helm-charts

And because this PR does not document that one would need to run the wrapper service externally.

@devpramod (Contributor, Author) commented Aug 13, 2025

Hi @eero-t
I am running it something like:
helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME}

Then I'm port-forwarding (kubectl port-forward svc/docsum 8888:8888) to test the application.

In practice, I'm using an nginx ingress that maps requests at /v1/docsum to svc/docsum.
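For reference, a minimal sketch of that kind of ingress rule, assuming an ingress-nginx controller and the default docsum service name with port 8888 from the port-forward above (names and paths are illustrative, not taken from the PR):

# docsum-ingress.yaml (illustrative sketch only)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docsum-ingress         # hypothetical name
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /v1/docsum   # megaservice endpoint exposed as-is, no path rewrite
            pathType: Prefix
            backend:
              service:
                name: docsum   # assumes the chart's default service name
                port:
                  number: 8888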

@eero-t (Collaborator) left a comment

I am running it something like: helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME}

You're running the whole OPEA DocSum application at the external / remote end, to provide the remote llm-uservice?

Then I'm port-forwarding (kubectl port-forward svc/docsum 8888:8888) to test the application.

Err... I assume you mean to the local DocSum instance, not to the external/remote one?


I discussed this with Sakari, and supporting enterprise inferencing services (i.e. OpenAI APIs instead of OPEA wrapper services) can be done in a separate PR.

However, needing the external endpoint to be the wrapper service instead of the LLM itself means that this PR needs several changes:

  • Example host names: "llm-server" -> "llm-uservice"
  • Comments: "OpenAI LLM" -> "OPEA DocSum LLM wrapper (llm-uservice)"
  • Drop the KubeAI example, as it provides just the standard OpenAI LLM API, not the DocSum LLM wrapper one
  • Drop the OPENAI_API_KEY variable, as it's not relevant for this

=> It may be easier to change the DocSum llm-uservice to use an external OpenAI LLM, but you need to test that it works.

@louie-tsai (Contributor)

@louie-tsai needs to review it.

devpramod and others added 5 commits on August 15, 2025 at 15:18
@devpramod force-pushed the docsum-external-llm branch from f9f5863 to d60651f on August 15, 2025 15:18
@devpramod (Contributor, Author) commented Aug 15, 2025

@eero-t You're absolutely right, and I apologize for the oversight. I initially thought DocSum could work like ChatQnA (which calls /v1/chat/completions directly), but I now understand that DocSum specifically requires the /v1/docsum endpoint that only llm-uservice provides.

The original implementation was misleading - it suggested users could connect directly to external LLMs when they really needed another llm-uservice instance.

I've updated the PR to follow your suggested approach:

  • Keep llm-uservice enabled (as the required wrapper for /v1/docsum)
  • Configure llm-uservice itself to use external LLM endpoints (OpenAI, vLLM, TGI, etc.)
  • Removed the confusing environment variables from DocSum that it doesn't actually use

This provides the external LLM functionality users want while maintaining the proper OPEA architecture. Thank you for the detailed feedback - it helped clarify the distinction between DocSum's requirements and ChatQnA's endpoint usage.


# To use with external OpenAI-compatible LLM endpoints (OpenAI, vLLM, TGI, etc.)
# This configures the llm-uservice to connect to external LLM providers while maintaining DocSum compatibility
# helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --values docsum/variant_external-llm-values.yaml --set llm-uservice.env.OPENAI_API_KEY="your-api-key" --set llm-uservice.env.LLM_MODEL_ID="gpt-4-turbo"
Collaborator

Shouldn't this also set a suitable (OpenAI) LLM_ENDPOINT?

Contributor Author

@eero-t Thanks for pointing that out. This made me realize I wasn't passing the env variables correctly and that we needed a new env variable in the configmap for OPENAI_API_KEY. Please review.

LLM_ENDPOINT: "https://api.openai.com/v1" # External LLM API endpoint (OpenAI, vLLM, TGI, etc.)
OPENAI_API_KEY: "${OPENAI_API_KEY}" # API key for authentication
LLM_MODEL_ID: "gpt-4-turbo" # Model to use
TEXTGEN_BACKEND: "openai" # Backend type for OpenAI-compatible endpoints
@eero-t (Collaborator) commented Aug 15, 2025

This is the DocSum service, not TextGen.

Also, it looks like giving anything other than TGI or vLLM to the *_BACKEND variables is going to fail the Helm install: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/llm-uservice/templates/configmap.yaml

(In the TEXTGEN_BACKEND case there also seems to be a BEDROCK option, but that does not really help here.)

Looking at the backend code of llm-uservice for the DOCSUM_BACKEND="vLLM" option, it seems like it would be OpenAI API compatible, but unfortunately it uses a hard-coded openai_api_key value (an empty string): https://github.com/opea-project/GenAIComps/blob/main/comps/llms/src/doc-summarization/integrations/vllm.py#L58


The value is hard-coded to an empty string, but the access_token is passed as an authentication header, and it is taken from the environment variable: https://github.com/opea-project/GenAIComps/blob/2444e6984e27dd34aed5f0b341690e046356940d/comps/llms/src/doc-summarization/integrations/vllm.py#L52

Contributor Author

@eero-t Based on Sri's comment I have fallen back to the DocSum backend with vLLM, please review.
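A minimal sketch of the kind of llm-uservice override being discussed, using the DOCSUM_BACKEND, LLM_ENDPOINT, LLM_MODEL_ID and OPENAI_API_KEY keys mentioned in this thread (values and nesting are illustrative, not the merged variant file):

# llm-uservice configured for an external OpenAI-compatible endpoint (illustrative sketch)
llm-uservice:
  DOCSUM_BACKEND: "vLLM"                   # OpenAI-compatible doc-summarization backend
  LLM_ENDPOINT: "http://your-server-url"   # external OpenAI-compatible endpoint
  LLM_MODEL_ID: "your-model"
  OPENAI_API_KEY: "your-api-key"           # passed as an access token header, per the note above
vllm:
  enabled: false                           # internal inference backends are not needed
tgi:
  enabled: false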

@eero-t (Collaborator) commented Aug 15, 2025

Thank you for the detailed feedback - it helped clarify the distinction between DocSum's requirements and ChatQnA's endpoint usage.

Good that it got cleared up; I had started to wonder whether I had grossly misunderstood something! :-)

Note that when the ChatQnA application is run as a FaqGen service, it will also require llm-uservice, i.e. it has the same issue as DocSum. See e.g:

@eero-t (Collaborator) commented Aug 15, 2025

The missing OpenAI key with the vLLM OpenAI-API backend should not be a problem for using DocSum / FaqGen with KubeAI... I don't have time to test it, though.

@chensuyue (Collaborator) commented Aug 20, 2025

@devpramod does this PR still target v1.4? Please address the remaining comments.

@eero-t (Collaborator) left a comment

Approved, seems OK now.

Comment on lines +29 to +34
# To use with external OpenAI-compatible LLM endpoints (OpenAI, vLLM, TGI, etc.)
# This configures the llm-uservice to connect to external LLM providers while maintaining DocSum compatibility
# For OpenAI:
# helm install docsum docsum --values docsum/variant_external-llm-values.yaml --set llm-uservice.OPENAI_API_KEY="your-api-key" --set llm-uservice.LLM_ENDPOINT="https://api.openai.com" --set llm-uservice.LLM_MODEL_ID="gpt-4-turbo"
# For vLLM/TGI or other OpenAI-compatible endpoints:
# helm install docsum docsum --values docsum/variant_external-llm-values.yaml --set llm-uservice.LLM_ENDPOINT="http://your-server-url" --set llm-uservice.LLM_MODEL_ID="your-model"
Collaborator

Could you do a separate PR adding similar stuff for FaqGen in the ChatQnA Helm chart?

(Separate variant file, README instructions etc.)

@@ -0,0 +1,17 @@
# Copyright (C) 2024 Intel Corporation
Collaborator

Suggested change
# Copyright (C) 2024 Intel Corporation
# Copyright (C) 2025 Intel Corporation

@chensuyue (Collaborator)

Please note that the CI/CD machines will be taken back on 8/25, and it is not yet clear when the new machines will be assigned.

@chensuyue merged commit 9200b37 into opea-project:main on Aug 22, 2025; 18 of 20 checks passed.