Adapting AgentQnA applications for deployment in the K8S environment using AMD GPU using Helm #975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged: chensuyue merged 53 commits into opea-project:main from chyundunovDatamonsters:feature/AgentQnA_k8s on May 19, 2025.
Commits (53), authored by chyundunovDatamonsters, with auto-fix commits from pre-commit-ci[bot]; messages are truncated as rendered on the page:

- cb9687d ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T…
- 60cd5ee Adapting AgentQnA applications for deployment in the K8S environment …
- b7e16ab Adapting AgentQnA applications for deployment in the K8S environment …
- ea9d079 Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
- 6a22c66 Adapting AgentQnA applications for deployment in the K8S environment …
- 983e592 [pre-commit.ci] auto fixes from pre-commit.com hooks
- 72693c3 Adapting AgentQnA applications for deployment in the K8S environment …
- fdbb4d9 Merge remote-tracking branch 'origin/feature/AgentQnA_k8s' into featu…
- b5a7cd5 [pre-commit.ci] auto fixes from pre-commit.com hooks
- 20db4a2 Adapting ChatQnA applications for deployment in the K8S environment u…
- 808788e Adapting ChatQnA applications for deployment in the K8S environment u…
- efe2356 Adapting ChatQnA applications for deployment in the K8S environment u…
- 4db2445 Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
- 39f730e Adapting ChatQnA applications for deployment in the K8S environment u…
- 9511da4 Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
- 92d02d2 Adapting ChatQnA applications for deployment in the K8S environment u…
- 180f16f Adapting ChatQnA applications for deployment in the K8S environment u…
- 1298c18 Adapting ChatQnA applications for deployment in the K8S environment u…
- 70e2f6d Merge branch 'main' into feature/ChatQnA_k8s
- 76d47c2 ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T…
- 7a6380d Adapting ChatQnA applications for deployment in the K8S environment u…
- fe61582 [pre-commit.ci] auto fixes from pre-commit.com hooks
- 50d466d Adapting ChatQnA applications for deployment in the K8S environment u…
- 26fad2f Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur…
- 4faa135 Adapting ChatQnA applications for deployment in the K8S environment u…
- 7e4c5f4 [pre-commit.ci] auto fixes from pre-commit.com hooks
- e6a5c7f Adapting ChatQnA applications for deployment in the K8S environment u…
- ce469f3 Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur…
- 80dc84c Adapting ChatQnA applications for deployment in the K8S environment u…
- 5055b00 Adapting ChatQnA applications for deployment in the K8S environment u…
- 5e3d4e8 Adapting ChatQnA applications for deployment in the K8S environment u…
- 7876371 Adapting ChatQnA applications for deployment in the K8S environment u…
- 2bf5991 Adapting ChatQnA applications for deployment in the K8S environment u…
- efb97b4 Adapting ChatQnA applications for deployment in the K8S environment u…
- d112b67 Adapting ChatQnA applications for deployment in the K8S environment u…
- 73ed118 Merge branch 'main' into feature/ChatQnA_k8s
- 9dc514e Adapting ChatQnA applications for deployment in the K8S environment u…
- 14a4e7a Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur…
- 99fdebc [pre-commit.ci] auto fixes from pre-commit.com hooks
- 7dffc9b Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
- af9238c Merge branch 'feature/ChatQnA_k8s' of https://github.com/chyundunovDa…
- b6af376 Adapting AgentQnA applications for deployment in the K8S environment …
- 7e57ac4 Adapting ChatQnA applications for deployment in the K8S environment u…
- e8a0553 Adapting ChatQnA applications for deployment in the K8S environment u…
- e9bcf29 Adapting AgentQnA applications for deployment in the K8S environment …
- ddc79c3 Adapting AgentQnA applications for deployment in the K8S environment …
- 774f273 Adapting AgentQnA applications for deployment in the K8S environment …
- 7719383 [pre-commit.ci] auto fixes from pre-commit.com hooks
- 6f4d0a1 Adapting AgentQnA applications for deployment in the K8S environment …
- 1641f76 Merge remote-tracking branch 'origin/feature/AgentQnA_k8s' into featu…
- 33d93de Adapting AgentQnA applications for deployment in the K8S environment …
- 66652c7 Adapting AgentQnA applications for deployment in the K8S environment …
- 327c1d3 Adapting AgentQnA applications for deployment in the K8S environment …
The first values file added by the diff (56 lines) overrides the chart's subchart values to serve the agents with TGI on AMD ROCm GPUs:

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values
vllm:
  enabled: false
tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "3.0.0-rocm"
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  MAX_INPUT_LENGTH: "1024"
  MAX_TOTAL_TOKENS: "2048"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0,1"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: ["--num-shard", "2"]
  resources:
    limits:
      amd.com/gpu: "2"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Llama-3.3-70B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Llama-3.3-70B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Llama-3.3-70B-Instruct"
```
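The generous probe settings matter here: a 70B model can take minutes to load onto two GPUs, and Kubernetes allows roughly `initialDelaySeconds + periodSeconds * failureThreshold` seconds before the probe is declared failed. A minimal sketch of that arithmetic (plain Python, no Kubernetes dependency; the helper name is ours, not part of the chart):

```python
# Rough worst-case time Kubernetes waits on a probe before giving up:
# initialDelaySeconds + periodSeconds * failureThreshold.
def probe_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate upper bound, ignoring per-attempt timeoutSeconds overlap."""
    return initial_delay + period * failure_threshold

# Values from the startupProbe block above: 60 + 5 * 120 = 660 seconds,
# i.e. the TGI pod gets about 11 minutes to come up.
print(probe_budget_seconds(60, 5, 120))
```

If model load regularly exceeds this budget, raising `failureThreshold` (rather than `initialDelaySeconds`) keeps the pod responsive once it is actually ready.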
The second values file added by the diff (52 lines) enables vLLM on AMD ROCm GPUs instead:

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  env:
    HIP_VISIBLE_DEVICES: "0,1"
    TENSOR_PARALLEL_SIZE: "2"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: ["python3", "/workspace/api_server.py"]
  extraCmdArgs: ["--swap-space", "16",
    "--disable-log-requests",
    "--dtype", "float16",
    "--num-scheduler-steps", "1",
    "--distributed-executor-backend", "mp"]
  resources:
    limits:
      amd.com/gpu: "2"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Llama-3.3-70B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Llama-3.3-70B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Llama-3.3-70B-Instruct"
```

Note that `TENSOR_PARALLEL_SIZE: "2"` is kept consistent with the two devices in `HIP_VISIBLE_DEVICES` and the `amd.com/gpu: "2"` resource limit.
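All three agent services (supervisor, ragagent, sqlagent) point at the same in-cluster vLLM Service via `llm_endpoint_url`, which Helm renders as `http://<release-name>-vllm`. As a hedged sketch of how that wiring resolves, the snippet below builds the OpenAI-style request body such an agent would send; the release name `agentqna` and the `/v1/chat/completions` path of vLLM's OpenAI-compatible server are illustrative assumptions, not taken from this PR:

```python
import json

def build_chat_request(release: str, model: str, prompt: str) -> tuple[str, str]:
    # Mirrors the values above: llm_endpoint_url = http://{{ .Release.Name }}-vllm
    url = f"http://{release}-vllm/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return url, json.dumps(body)

url, body = build_chat_request(
    "agentqna", "meta-llama/Llama-3.3-70B-Instruct", "ping"
)
print(url)  # http://agentqna-vllm/v1/chat/completions
```

Because the URL is derived from the release name, installing the chart under a different release automatically repoints all three agents without editing the values file.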