Merged
53 commits
cb9687d
ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T…
Apr 5, 2025
60cd5ee
Adapting AgentQnA applications for deployment in the K8S environment …
Apr 10, 2025
b7e16ab
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters Apr 25, 2025
ea9d079
Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
chyundunovDatamonsters Apr 25, 2025
6a22c66
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters Apr 25, 2025
983e592
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 25, 2025
72693c3
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters Apr 25, 2025
fdbb4d9
Merge remote-tracking branch 'origin/feature/AgentQnA_k8s' into featu…
chyundunovDatamonsters Apr 25, 2025
b5a7cd5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 25, 2025
20db4a2
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters Apr 25, 2025
808788e
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters Apr 25, 2025
efe2356
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters Apr 25, 2025
4db2445
Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
chyundunovDatamonsters Apr 25, 2025
39f730e
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters Apr 28, 2025
9511da4
Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
chyundunovDatamonsters May 6, 2025
92d02d2
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 6, 2025
180f16f
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 6, 2025
1298c18
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 13, 2025
70e2f6d
Merge branch 'main' into feature/ChatQnA_k8s
chyundunovDatamonsters May 13, 2025
76d47c2
ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T…
Apr 5, 2025
7a6380d
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
fe61582
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2025
50d466d
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
26fad2f
Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur…
chyundunovDatamonsters May 15, 2025
4faa135
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
7e4c5f4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2025
e6a5c7f
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
ce469f3
Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur…
chyundunovDatamonsters May 15, 2025
80dc84c
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
5055b00
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
5e3d4e8
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
7876371
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
2bf5991
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
efb97b4
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
d112b67
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
73ed118
Merge branch 'main' into feature/ChatQnA_k8s
chyundunovDatamonsters May 15, 2025
9dc514e
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
14a4e7a
Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur…
chyundunovDatamonsters May 15, 2025
99fdebc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2025
7dffc9b
Merge branch 'main' of https://github.com/opea-project/GenAIInfra int…
chyundunovDatamonsters May 15, 2025
af9238c
Merge branch 'feature/ChatQnA_k8s' of https://github.com/chyundunovDa…
chyundunovDatamonsters May 15, 2025
b6af376
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 15, 2025
7e57ac4
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
e8a0553
Adapting ChatQnA applications for deployment in the K8S environment u…
chyundunovDatamonsters May 15, 2025
e9bcf29
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 15, 2025
ddc79c3
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 15, 2025
774f273
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 15, 2025
7719383
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2025
6f4d0a1
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 15, 2025
1641f76
Merge remote-tracking branch 'origin/feature/AgentQnA_k8s' into featu…
chyundunovDatamonsters May 15, 2025
33d93de
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 15, 2025
66652c7
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 16, 2025
327c1d3
Adapting AgentQnA applications for deployment in the K8S environment …
chyundunovDatamonsters May 16, 2025
14 changes: 13 additions & 1 deletion helm-charts/agentqna/README.md
@@ -54,6 +54,18 @@ If you want to try with latest version, use `helm pull oci://ghcr.io/opea-projec
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
helm install agentqna agentqna -f agentqna/gaudi-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}

# To use AMD ROCm device
cd GenAIInfra/helm-charts/
./update_dependency.sh
helm dependency update agentqna
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
# with vLLM
helm upgrade --install agentqna agentqna -f agentqna/rocm-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN}

# with TGI
helm upgrade --install agentqna agentqna -f agentqna/rocm-tgi-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN}
```

## Verify
@@ -81,5 +93,5 @@ Open another terminal and run the following command to verify the service if wor
 curl http://localhost:9090/v1/chat/completions \
     -X POST \
     -H "Content-Type: application/json" \
-    -d '{"messages": "How many albums does Iron Maiden have?"}'
+    -d '{"model": "meta-llama/Llama-3.3-70B-Instruct","messages": "How many albums does Iron Maiden have?"}'
```
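
The updated request now names the model explicitly. A minimal sketch of how the body could be assembled so the `model` field stays in step with the model set in the values file: `build_payload` is a hypothetical helper, not part of the chart, and it does no JSON escaping, so it assumes neither argument contains double quotes or backslashes. The `curl` call is left commented out because it assumes the service is reachable at `localhost:9090`, as in the command above.

```shell
# Hypothetical helper (not part of the chart): build the JSON body for the
# verification request. No JSON escaping is done, so the arguments must not
# contain double quotes or backslashes.
build_payload() {
  printf '{"model": "%s","messages": "%s"}' "$1" "$2"
}

MODEL="meta-llama/Llama-3.3-70B-Instruct"
payload="$(build_payload "$MODEL" "How many albums does Iron Maiden have?")"
echo "$payload"

# With the service reachable at localhost:9090, the request would then be:
# curl http://localhost:9090/v1/chat/completions \
#   -X POST \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```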
57 changes: 57 additions & 0 deletions helm-charts/agentqna/rocm-tgi-values.yaml
@@ -0,0 +1,57 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values
vllm:
  enabled: false
tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "3.0.0-rocm"
  LLM_MODEL_ID: meta-llama/Llama-3.3-70B-Instruct
  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"
  PYTORCH_TUNABLEOP_ENABLED: "0"
  USE_FLASH_ATTENTION: "true"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0,1"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: [ "--num-shard","2" ]
  resources:
    limits:
      amd.com/gpu: "2"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Llama-3.3-70B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Llama-3.3-70B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-tgi
  llm_engine: tgi
  model: "meta-llama/Llama-3.3-70B-Instruct"
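
In this file `HIP_VISIBLE_DEVICES` lists two devices, `--num-shard` is `2`, and the `amd.com/gpu` limit is `"2"`, so the three settings appear intended to agree. A small sketch of a pre-deployment sanity check under that assumption: `count_devices` is a hypothetical helper, not part of the chart, and the values are copied by hand from the file above rather than parsed from it.

```shell
# Hypothetical helper (not part of the chart): count the entries in a
# comma-separated device list such as HIP_VISIBLE_DEVICES.
# The body runs in a subshell so the IFS change does not leak to the caller.
count_devices() (
  [ -n "$1" ] || { echo 0; exit 0; }
  IFS=','
  set -- $1          # split the list on commas into positional parameters
  echo "$#"
)

# Values copied by hand from rocm-tgi-values.yaml above.
DEVICES="0,1"
NUM_SHARD=2
GPU_LIMIT=2

n="$(count_devices "$DEVICES")"
if [ "$n" -eq "$NUM_SHARD" ] && [ "$n" -eq "$GPU_LIMIT" ]; then
  echo "consistent: $n GPU(s)"
else
  echo "mismatch: devices=$n num-shard=$NUM_SHARD gpu-limit=$GPU_LIMIT" >&2
fi
```

When changing the GPU count for a different node, the same check can be rerun with the new values before calling `helm upgrade`.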
52 changes: 52 additions & 0 deletions helm-charts/agentqna/rocm-values.yaml
@@ -0,0 +1,52 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    LLM_MODEL_ID: meta-llama/Llama-3.3-70B-Instruct
    HIP_VISIBLE_DEVICES: "0,1"
    TENSOR_PARALLEL_SIZE: "2"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: [ "python3", "/workspace/api_server.py" ]
  extraCmdArgs: [ "--swap-space", "16",
                  "--disable-log-requests",
                  "--dtype", "float16",
                  "--num-scheduler-steps", "1",
                  "--distributed-executor-backend", "mp" ]
  resources:
    limits:
      amd.com/gpu: "2"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Llama-3.3-70B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Llama-3.3-70B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  llm_engine: vllm
  model: "meta-llama/Llama-3.3-70B-Instruct"
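
The three agents all point at `http://{{ .Release.Name }}-vllm`. A minimal sketch of what that value would resolve to, assuming the chart passes `llm_endpoint_url` through Helm templating and the release is named `agentqna` as in the README's `helm upgrade --install agentqna ...` command. `endpoint_url` is a hypothetical helper that only mimics the string substitution; it does not call Helm.

```shell
# Hypothetical helper: mimic how "http://{{ .Release.Name }}-<engine>" would
# render for a given release name. Assumes the chart evaluates the value as a
# Helm template; this only performs the string substitution.
endpoint_url() {
  printf 'http://%s-%s\n' "$1" "$2"
}

endpoint_url agentqna vllm   # endpoint used with rocm-values.yaml
endpoint_url agentqna tgi    # endpoint used with rocm-tgi-values.yaml
```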
7 changes: 4 additions & 3 deletions helm-charts/common/tgi/rocm-values.yaml
@@ -3,9 +3,10 @@
 accelDevice: "rocm"
 image:
   repository: ghcr.io/huggingface/text-generation-inference
-  tag: "2.4.1-rocm"
-MAX_INPUT_LENGTH: "1024"
-MAX_TOTAL_TOKENS: "2048"
+  tag: "3.0.0-rocm"
+MAX_INPUT_LENGTH: "2048"
+MAX_TOTAL_TOKENS: "4096"
+PYTORCH_TUNABLEOP_ENABLED: "0"
 USE_FLASH_ATTENTION: "false"
 FLASH_ATTENTION_RECOMPUTE: "false"
 HIP_VISIBLE_DEVICES: "0"
3 changes: 3 additions & 0 deletions helm-charts/common/tgi/templates/configmap.yaml
@@ -70,3 +70,6 @@ data:
   {{- if .Values.MAX_BATCH_SIZE }}
   MAX_BATCH_SIZE: {{ .Values.MAX_BATCH_SIZE | quote }}
   {{- end }}
+  {{- if .Values.PYTORCH_TUNABLEOP_ENABLED }}
+  PYTORCH_TUNABLEOP_ENABLED: {{ .Values.PYTORCH_TUNABLEOP_ENABLED | quote }}
+  {{- end }}
2 changes: 2 additions & 0 deletions helm-charts/valuefiles.yaml
@@ -32,6 +32,8 @@ agentqna:
   values:
     - cpu-values.yaml
     - gaudi-values.yaml
+    - rocm-values.yaml
+    - rocm-tgi-values.yaml
 audioqna:
   src_repo: GenAIInfra
   src_dir: helm-charts/audioqna