Enable docsum to connect with external llm endpoints via helm #1141
Conversation
I guess the external test needs the e2e PR to be merged first. ROCm tests fail too; the DocSum Gaudi values failure is a known bug, for which @lianhao used #1132 as a workaround (with another chart): opea-project/GenAIComps#1719
Please rebase after #1146 is merged, for the default model changes in DocSum.
@devpramod Relevant PRs have been merged, so this can now be rebased (there are some conflicts with the README changes from the other PRs merged in the meanwhile, which need to be handled).
lianhao left a comment
While we're waiting for the CI to be resumed, please address these issues first. Thx~
@lianhao can we merge this? Comments have been addressed.
lianhao left a comment
Code LGTM. Let's wait for the CI to be back online before merging this.
eero-t left a comment
(Not setting this review as a change request, as I will be OoO and cannot approve fixes to these before the code freeze.)
eero-t left a comment
I would suggest using LLM_SERVICE_HOST as the Helm variable name even if it would be mapped to the LLM_SERVICE_HOST_IP environment variable (in the deployment template), because it's both shorter and more correct (the user is unlikely to be using IP numbers with Kubernetes).
=> I'll add a PR doing that change for the already merged ChatQnA & CodeGen Helm charts (+ add a KubeAI example).
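To illustrate the suggested split between the Helm value name and the environment variable, here is a minimal sketch; the value name and template layout are assumptions for illustration, not the actual contents of the charts mentioned above:

```yaml
# Minimal sketch only; value name and template wiring are illustrative assumptions,
# not the actual contents of the ChatQnA/CodeGen/DocSum charts.
#
# values.yaml:
# LLM_SERVICE_HOST: "llm-uservice"
#
# templates/deployment.yaml, container env section:
env:
  - name: LLM_SERVICE_HOST_IP                      # env variable the service consumes
    value: {{ .Values.LLM_SERVICE_HOST | quote }}  # shorter Helm value name
```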
Half of the GenAIExamples apps use one of the variable names and half the other. While most of the apps in GenAIInfra that have Helm charts belong to the latter category, I decided to use the shorter prefix for the Helm variable names. => Please see PR #1166
But does DocSum even work without the wrapper? (I don't think that an OpenAI inferencing service would be running the OPEA-specific DocSum wrapper for the LLM...) PS. I also have the same question for the FaqGen variant of ChatQnA, as that also seems to require the wrapper service, which was similarly disabled in #993.
Hi @eero-t, I have made all the updates requested.
(back from vacation)
@devpramod PR looks otherwise fine, but I'm still wondering about this:
I.e. I'm wondering whether the external LLM endpoint should go to the DocSum llm-uservice instead of the DocSum megaservice, as the megaservice is going to use the OPEA-specific DocSum LLM wrapper API.
(The ChatQnA megaservice, when run as FaqGen, will similarly try to use the OPEA-specific wrapper.)
Hi @eero-t, in my testing I have been able to successfully disable the llm-uservice and get a response from /v1/chatqna, /v1/docsum and /v1/codegen directly.
Do you mean that you're also running the wrapper service externally, not just the inferencing service? PS. This is relevant because e.g. the OPEA Enterprise Inference project provides just the inferencing services: https://github.com/opea-project/Enterprise-Inference/tree/main/core/helm-charts And because this PR does not document that one would need to run the wrapper service externally.
Hi @eero-t, I am running it something like: helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME}
Then port-forwarding kubectl port-forward svc/docsum 8888:8888 to test the application. In practice, I'm using an nginx ingress to map requests to svc/docsum, which is set up in the nginx config at /v1/docsum.
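For reference, a rough sketch of the kind of nginx ingress rule being described, assuming the ingress-nginx controller and these resource names (neither is defined in this PR):

```yaml
# Rough sketch of an ingress routing /v1/docsum to the DocSum megaservice Service;
# resource name and ingress class are assumed, this is not part of the PR.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docsum
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /v1/docsum        # megaservice endpoint exposed externally
            pathType: Prefix
            backend:
              service:
                name: docsum        # svc/docsum
                port:
                  number: 8888
```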
eero-t left a comment
> I am running it something like: helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME}
You're running the whole OPEA DocSum application at the external / remote end, to provide remote llm-uservice?
> Then port-forwarding kubectl port-forward svc/docsum 8888:8888 to test the application.
Err... I assume you mean to the local DocSum instance, not to the external/remote one?
I discussed this with Sakari, and supporting enterprise inferencing services (i.e. OpenAI APIs instead of OPEA wrapper services) can be done in a separate PR.
However, needing the external endpoint to be the wrapper service instead of the LLM itself means that this PR needs several changes:
- Example host names: "llm-server" -> "llm-uservice"
- Comments: "OpenAI LLM" -> "OPEA DocSum LLM wrapper (llm-uservice)"
- Drop KubeAI example as it provides just standard OpenAI LLM API, not DocSum LLM wrapper one
- Drop the OPENAI_API_KEY variable as it's not relevant for this
=> it may be easier to change the DocSum llm-uservice service to use an external OpenAI LLM, but you need to test that it works.
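Purely as an illustration of that last suggestion, a rough values sketch, assuming the llm-uservice subchart accepts these keys and that the bundled inference service can be switched off this way (the exact key names would need to be checked against the chart):

```yaml
# Sketch of a variant values file pointing the DocSum LLM wrapper (llm-uservice)
# at an external OpenAI-compatible endpoint; key names and the toggle for the
# bundled inference service are assumptions to verify against the chart.
llm-uservice:
  LLM_ENDPOINT: "https://api.openai.com/v1"   # external OpenAI-compatible endpoint
  LLM_MODEL_ID: "gpt-4-turbo"                 # model served at that endpoint
  env:
    OPENAI_API_KEY: "your-api-key"            # only if the endpoint requires auth
vllm:
  enabled: false                              # assumed toggle for the bundled LLM service
```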
@louie-tsai needs to review it.
@eero-t You're absolutely right, and I apologize for the oversight. I initially thought DocSum could work like ChatQnA (which calls /v1/chat/completions directly), but I now understand that DocSum specifically requires the llm-uservice wrapper. The original implementation was misleading: it suggested users could connect directly to external LLMs when they really needed another llm-uservice instance. I've updated the PR to follow your suggested approach of configuring the llm-uservice to connect to the external LLM endpoint.
This provides the external LLM functionality users want while maintaining the proper OPEA architecture. Thank you for the detailed feedback; it helped clarify the distinction between DocSum's requirements and ChatQnA's endpoint usage.
helm-charts/docsum/README.md
Outdated
# To use with external OpenAI-compatible LLM endpoints (OpenAI, vLLM, TGI, etc.)
# This configures the llm-uservice to connect to external LLM providers while maintaining DocSum compatibility
# helm install docsum docsum --set global.HF_TOKEN=${HFTOKEN} --values docsum/variant_external-llm-values.yaml --set llm-uservice.env.OPENAI_API_KEY="your-api-key" --set llm-uservice.env.LLM_MODEL_ID="gpt-4-turbo"
Shouldn't this also set a suitable (OpenAI) LLM_ENDPOINT?
@eero-t Thanks for pointing that out. This made me realize I wasn't passing the env variables correctly and that we needed a new env variable in the configmap for OPENAI_API_KEY. Please review.
LLM_ENDPOINT: "https://api.openai.com/v1" # External LLM API endpoint (OpenAI, vLLM, TGI, etc.)
OPENAI_API_KEY: "${OPENAI_API_KEY}" # API key for authentication
LLM_MODEL_ID: "gpt-4-turbo" # Model to use
TEXTGEN_BACKEND: "openai" # Backend type for OpenAI-compatible endpoints
This is the DocSum service, not TextGen.
Also, it looks like giving anything other than TGI or vLLM to the *_BACKEND variables is going to fail the Helm install: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/llm-uservice/templates/configmap.yaml
(In the TEXTGEN_BACKEND case there seems to also be a BEDROCK option, but that does not really help here.)
Looking at the llm-uservice backend code for the DOCSUM_BACKEND="vLLM" option, it seems like it would be OpenAI API compatible, but unfortunately it uses a hard-coded openai_api_key value (an empty string): https://github.com/opea-project/GenAIComps/blob/main/comps/llms/src/doc-summarization/integrations/vllm.py#L58
The value is hard-coded to an empty string, but the access_token is passed as an authentication header, and that token is taken from an environment variable: https://github.com/opea-project/GenAIComps/blob/2444e6984e27dd34aed5f0b341690e046356940d/comps/llms/src/doc-summarization/integrations/vllm.py#L52
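In other words, supplying the key through the wrapper's environment should be enough for it to reach the endpoint as the auth header; a minimal values sketch following the key names used in the README example above (assumed, not verified against the chart):

```yaml
# Assumed wiring, following the README example in this PR: the key is injected
# into the wrapper's environment and forwarded by the backend as the auth header.
llm-uservice:
  env:
    OPENAI_API_KEY: "your-api-key"   # picked up from the environment at runtime
```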
@eero-t Based on Sri's comment I have fallen back to the DocSum backend with vLLM, please review.
Good that it got cleared up, I had started to wonder whether I had grossly misunderstood something! :-) Note that when the ChatQnA application is run as a FaqGen service, it will also require the wrapper service.
The OpenAI key missing with the vLLM OpenAI-API backend should not be a problem for using DocSum / FaqGen with KubeAI... I don't have time to test it, though.
@devpramod does this PR still target v1.4? Please address the remaining comments.
eero-t left a comment
Approved, seems OK now.
# To use with external OpenAI-compatible LLM endpoints (OpenAI, vLLM, TGI, etc.)
# This configures the llm-uservice to connect to external LLM providers while maintaining DocSum compatibility
# For OpenAI:
# helm install docsum docsum --values docsum/variant_external-llm-values.yaml --set llm-uservice.OPENAI_API_KEY="your-api-key" --set llm-uservice.LLM_ENDPOINT="https://api.openai.com" --set llm-uservice.LLM_MODEL_ID="gpt-4-turbo"
# For vLLM/TGI or other OpenAI-compatible endpoints:
# helm install docsum docsum --values docsum/variant_external-llm-values.yaml --set llm-uservice.LLM_ENDPOINT="http://your-server-url" --set llm-uservice.LLM_MODEL_ID="your-model"
Could you do a separate PR adding similar stuff for FaqGen in the ChatQnA Helm chart?
(Separate variant file, README instructions, etc.)
@@ -0,0 +1,17 @@
# Copyright (C) 2024 Intel Corporation
Suggested change:
- # Copyright (C) 2024 Intel Corporation
+ # Copyright (C) 2025 Intel Corporation
Please note that the CI/CD machines will be taken back on 8/25, and it is not yet clear when the new machines will be assigned.
Description
This PR implements support for using external OpenAI-compatible LLM endpoints with DocSum Helm chart while maintaining backward compatibility with existing configurations.
Motivation
Users need the flexibility to connect to external LLM services (like OpenAI or self-hosted OpenAI-compatible endpoints) instead of always relying on the included LLM components. This adds versatility to our Helm charts without disrupting existing functionality.
Key Changes
- external-llm-values.yaml configuration file
- templates/external-llm-configmap.yaml
- templates/deployment.yaml
- Chart.yaml
- README.md
Issues
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
'n/a'
Tests
Tested backward compatibility to ensure existing Helm charts with the default values.yaml are still working.