ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm TGI with Helm #949
Merged: lianhao merged 34 commits into opea-project:main from chyundunovDatamonsters:feature/ChatQnA_k8s on May 16, 2025.
Commits (34):
- cb9687d: ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T…
- 20db4a2: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 808788e: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- efe2356: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 4db2445: Merge branch 'main' of https://github.com/opea-project/GenAIInfra int… (chyundunovDatamonsters)
- 39f730e: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 9511da4: Merge branch 'main' of https://github.com/opea-project/GenAIInfra int… (chyundunovDatamonsters)
- 92d02d2: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 180f16f: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 1298c18: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 70e2f6d: Merge branch 'main' into feature/ChatQnA_k8s (chyundunovDatamonsters)
- 76d47c2: ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T…
- 7a6380d: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- fe61582: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 50d466d: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 26fad2f: Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur… (chyundunovDatamonsters)
- 4faa135: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 7e4c5f4: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- e6a5c7f: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- ce469f3: Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur… (chyundunovDatamonsters)
- 80dc84c: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 5055b00: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 5e3d4e8: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 7876371: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 2bf5991: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- efb97b4: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- d112b67: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 73ed118: Merge branch 'main' into feature/ChatQnA_k8s (chyundunovDatamonsters)
- 9dc514e: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 14a4e7a: Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur… (chyundunovDatamonsters)
- 99fdebc: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- ed4d6a6: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- bce2798: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
- 52bc8a7: Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters)
New file (+66 lines): Helm values override for the ChatQnA FaqGen variant running TGI on ROCm.

```yaml
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

CHATQNA_TYPE: "CHATQNA_FAQGEN"
llm-uservice:
  enabled: true
  image:
    repository: opea/llm-faqgen
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  FAQGEN_BACKEND: "TGI"
  service:
    port: 80
tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "3.0.0-rocm"
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"
  USE_FLASH_ATTENTION: "true"
  FLASH_ATTENTION_RECOMPUTE: "false"
  PYTORCH_TUNABLEOP_ENABLED: "0"
  HIP_VISIBLE_DEVICES: "0,1"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: ["--num-shard", "2"]
  resources:
    limits:
      amd.com/gpu: "2"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
vllm:
  enabled: false

# Reranking: second largest bottleneck when reranking is in use
# (i.e. query context docs have been uploaded with data-prep)
#
# TODO: could vLLM be used also for reranking / embedding?
teirerank:
  accelDevice: "cpu"
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference
    tag: cpu-1.5
  # securityContext:
  #   readOnlyRootFilesystem: false
  readinessProbe:
    timeoutSeconds: 1
```
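As a rough illustration of what these probe settings imply (this helper is not part of the chart), the time Kubernetes tolerates an unready TGI container is bounded by roughly `initialDelaySeconds + periodSeconds * failureThreshold`:

```python
# Approximate upper bound on how long a probe tolerates an unready
# container: the initial delay, then up to failureThreshold check periods.
def probe_window_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    return initial_delay + period * failure_threshold

# Values from the TGI startupProbe above: 60 + 5 * 120 = 660 seconds,
# i.e. about 11 minutes for the ROCm TGI server to load the model.
print(probe_window_seconds(60, 5, 120))  # 660
```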
New file (+59 lines): Helm values override for the ChatQnA FaqGen variant running vLLM on ROCm.

```yaml
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

CHATQNA_TYPE: "CHATQNA_FAQGEN"
llm-uservice:
  enabled: true
  image:
    repository: opea/llm-faqgen
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  FAQGEN_BACKEND: "vLLM"
  service:
    port: 80
tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: ["python3", "/workspace/api_server.py"]
  extraCmdArgs: ["--swap-space", "16",
                 "--disable-log-requests",
                 "--dtype", "float16",
                 "--num-scheduler-steps", "1",
                 "--distributed-executor-backend", "mp"]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

# Reranking: second largest bottleneck when reranking is in use
# (i.e. query context docs have been uploaded with data-prep)
#
# TODO: could vLLM be used also for reranking / embedding?
teirerank:
  accelDevice: "cpu"
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference
    tag: cpu-1.5
  # securityContext:
  #   readOnlyRootFilesystem: false
  readinessProbe:
    timeoutSeconds: 1
```
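Assuming the chart template splices `extraCmd.command` and `extraCmdArgs` together into the container command (an assumption about the template, which this diff does not show), the resulting vLLM invocation would look roughly like:

```python
# Hypothetical reconstruction of the container command from the values above.
command = ["python3", "/workspace/api_server.py"]
extra_args = ["--swap-space", "16",
              "--disable-log-requests",
              "--dtype", "float16",
              "--num-scheduler-steps", "1",
              "--distributed-executor-backend", "mp"]
full_cmd = command + extra_args
print(" ".join(full_cmd))
```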
New file (+61 lines): Helm values override for ChatQnA running TGI on ROCm.

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "3.0.0-rocm"
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"
  PYTORCH_TUNABLEOP_ENABLED: "0"
  USE_FLASH_ATTENTION: "true"
  FLASH_ATTENTION_RECOMPUTE: "true"
  HIP_VISIBLE_DEVICES: "0,1"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: ["--num-shard", "2"]
  resources:
    limits:
      amd.com/gpu: "2"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120

vllm:
  enabled: false

# Reranking: second largest bottleneck when reranking is in use
# (i.e. query context docs have been uploaded with data-prep)
#
# TODO: could vLLM be used also for reranking / embedding?
teirerank:
  accelDevice: "cpu"
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference
    tag: cpu-1.5
  securityContext:
    readOnlyRootFilesystem: false
  readinessProbe:
    timeoutSeconds: 1
```
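One easy sanity check for an override like this is that the GPU limit, the `HIP_VISIBLE_DEVICES` list, and the TGI `--num-shard` value all agree, since TGI runs one shard per GPU. A small sketch (hypothetical helper, not part of the chart):

```python
def gpus_consistent(hip_visible_devices: str, gpu_limit: str, num_shard: str) -> bool:
    # TGI shards one process per GPU, so all three counts should match.
    n_visible = len(hip_visible_devices.split(","))
    return n_visible == int(gpu_limit) == int(num_shard)

# Values from the TGI override above: two visible devices, a limit of
# two amd.com/gpu, and --num-shard 2.
print(gpus_consistent("0,1", "2", "2"))  # True
```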
New file (+53 lines): Helm values override for ChatQnA running vLLM on ROCm.

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: ["python3", "/workspace/api_server.py"]
  extraCmdArgs: ["--swap-space", "16",
                 "--disable-log-requests",
                 "--dtype", "float16",
                 "--num-scheduler-steps", "1",
                 "--distributed-executor-backend", "mp"]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

# Reranking: second largest bottleneck when reranking is in use
# (i.e. query context docs have been uploaded with data-prep)
#
# TODO: could vLLM be used also for reranking / embedding?
teirerank:
  accelDevice: "cpu"
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference
    tag: cpu-1.5
  securityContext:
    readOnlyRootFilesystem: false
  readinessProbe:
    timeoutSeconds: 1
```
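The four overrides differ mainly in which inference subchart is enabled; exactly one of `tgi` and `vllm` should be on in any given values file. Selecting the active backend from a parsed values mapping can be sketched as follows (plain dicts standing in for the YAML; the real chart resolves this in its templates):

```python
def active_backend(values: dict) -> str:
    # Collect the inference subcharts that are switched on.
    enabled = [name for name in ("tgi", "vllm")
               if values.get(name, {}).get("enabled")]
    if len(enabled) != 1:
        raise ValueError("exactly one of tgi/vllm should be enabled")
    return enabled[0]

# Mirrors the vLLM-on-ROCm override above.
print(active_backend({"tgi": {"enabled": False}, "vllm": {"enabled": True}}))  # vllm
```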