Merged (34 commits)

Commits:
cb9687d  ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T… (Apr 5, 2025)
20db4a2  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, Apr 25, 2025)
808788e  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, Apr 25, 2025)
efe2356  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, Apr 25, 2025)
4db2445  Merge branch 'main' of https://github.com/opea-project/GenAIInfra int… (chyundunovDatamonsters, Apr 25, 2025)
39f730e  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, Apr 28, 2025)
9511da4  Merge branch 'main' of https://github.com/opea-project/GenAIInfra int… (chyundunovDatamonsters, May 6, 2025)
92d02d2  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 6, 2025)
180f16f  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 6, 2025)
1298c18  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 13, 2025)
70e2f6d  Merge branch 'main' into feature/ChatQnA_k8s (chyundunovDatamonsters, May 13, 2025)
76d47c2  ChatQnA - Adding files for deploy application on ROCm vLLM and ROCm T… (Apr 5, 2025)
7a6380d  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
fe61582  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 15, 2025)
50d466d  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
26fad2f  Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur… (chyundunovDatamonsters, May 15, 2025)
4faa135  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
7e4c5f4  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 15, 2025)
e6a5c7f  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
ce469f3  Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur… (chyundunovDatamonsters, May 15, 2025)
80dc84c  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
5055b00  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
5e3d4e8  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
7876371  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
2bf5991  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
efb97b4  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
d112b67  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
73ed118  Merge branch 'main' into feature/ChatQnA_k8s (chyundunovDatamonsters, May 15, 2025)
9dc514e  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
14a4e7a  Merge remote-tracking branch 'origin/feature/ChatQnA_k8s' into featur… (chyundunovDatamonsters, May 15, 2025)
99fdebc  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 15, 2025)
ed4d6a6  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 15, 2025)
bce2798  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 16, 2025)
52bc8a7  Adapting ChatQnA applications for deployment in the K8S environment u… (chyundunovDatamonsters, May 16, 2025)
12 changes: 10 additions & 2 deletions helm-charts/chatqna/README.md
@@ -43,11 +43,19 @@
helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --

# To use CPU with vLLM with Qdrant DB
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set vllm.LLM_MODEL_ID=${MODELNAME} -f chatqna/cpu-qdrant-values.yaml
# To use CPU with vLLM with Milvus DB
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set vllm.LLM_MODEL_ID=${MODELNAME} -f chatqna/cpu-milvus-values.yaml
# To use AMD ROCm device with vLLM
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set vllm.LLM_MODEL_ID=${MODELNAME} -f chatqna/rocm-values.yaml
# To use AMD ROCm device with TGI
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set vllm.LLM_MODEL_ID=${MODELNAME} -f chatqna/rocm-tgi-values.yaml

# To deploy FaqGen
#helm install faqgen chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} -f chatqna/faqgen-cpu-values.yaml

# To deploy FaqGen based application on AMD ROCm device with vLLM
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set vllm.LLM_MODEL_ID=${MODELNAME} -f chatqna/faqgen-rocm-values.yaml
# To deploy FaqGen based application on AMD ROCm device with TGI
#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set vllm.LLM_MODEL_ID=${MODELNAME} -f chatqna/faqgen-rocm-tgi-values.yaml

```
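The commented one-liners above all follow the same shape. A small sketch that assembles the FaqGen-on-ROCm (vLLM) variant and echoes it for review before running; the token, model directory, model name, and release name below are illustrative placeholders, not values taken from the chart:

```shell
# Placeholder values -- substitute your own before actually installing.
HFTOKEN="hf_XXXXXXXX"                               # HuggingFace Hub token (placeholder)
MODELDIR="/mnt/opea-models"                         # host path for the model cache (assumption)
MODELNAME="meta-llama/Meta-Llama-3-8B-Instruct"

# Assemble the install command from the README above; the release name
# "faqgen" is an arbitrary choice.
CMD="helm install faqgen chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set global.modelUseHostPath=${MODELDIR} \
  --set vllm.LLM_MODEL_ID=${MODELNAME} \
  -f chatqna/faqgen-rocm-values.yaml"

# Print the fully expanded command for review; uncomment eval to run it.
echo "${CMD}"
# eval "${CMD}"
```

Echo-then-eval keeps the expanded `--set` flags visible before anything touches the cluster.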

### IMPORTANT NOTE
50 changes: 50 additions & 0 deletions helm-charts/chatqna/faqgen-rocm-tgi-values.yaml
@@ -0,0 +1,50 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

CHATQNA_TYPE: "CHATQNA_FAQGEN"
llm-uservice:
  enabled: true
  image:
    repository: opea/llm-faqgen
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  FAQGEN_BACKEND: "TGI"
  service:
    port: 80
tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  MAX_INPUT_LENGTH: "2048"
  MAX_TOTAL_TOKENS: "4096"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: ["--num-shard", "1"]
  resources:
    limits:
      amd.com/gpu: "1"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    runAsUser: 1000
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
vllm:
  enabled: false
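The TGI context limits in faqgen-rocm-tgi-values.yaml (2048 input, 4096 total tokens) can be raised without editing the chart by layering one more values file, since later `-f` files win. A hedged sketch; the file name and the larger limits are illustrative and assume the GPU has memory to spare for the longer KV cache:

```yaml
# my-tgi-overrides.yaml (hypothetical): pass it after
# faqgen-rocm-tgi-values.yaml with a second -f flag.
# The larger limits are illustrative, not tested maxima.
tgi:
  MAX_INPUT_LENGTH: "4096"
  MAX_TOTAL_TOKENS: "8192"
```

This would be installed as `helm install ... -f chatqna/faqgen-rocm-tgi-values.yaml -f my-tgi-overrides.yaml`.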
45 changes: 45 additions & 0 deletions helm-charts/chatqna/faqgen-rocm-values.yaml
@@ -0,0 +1,45 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

CHATQNA_TYPE: "CHATQNA_FAQGEN"
llm-uservice:
  enabled: true
  image:
    repository: opea/llm-faqgen
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  FAQGEN_BACKEND: "vLLM"
  service:
    port: 80
tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: ["python3", "/workspace/api_server.py"]
  extraCmdArgs: ["--swap-space", "16",
    "--disable-log-requests",
    "--dtype", "float16",
    "--num-scheduler-steps", "1",
    "--distributed-executor-backend", "mp"]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
51 changes: 51 additions & 0 deletions helm-charts/chatqna/rocm-tgi-values.yaml
@@ -0,0 +1,51 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  MAX_INPUT_LENGTH: "1024"
  MAX_TOTAL_TOKENS: "2048"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: ["--num-shard", "1"]
  resources:
    limits:
      amd.com/gpu: "1"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    runAsNonRoot: true
    runAsUser: 1000
    capabilities:
      add:
        - SYS_PTRACE
      drop:
        - ALL
    seccompProfile:
      type: RuntimeDefault
    runAsGroup: 0
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120

vllm:
  enabled: false
39 changes: 39 additions & 0 deletions helm-charts/chatqna/rocm-values.yaml
@@ -0,0 +1,39 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: ["python3", "/workspace/api_server.py"]
  extraCmdArgs: ["--swap-space", "16",
    "--disable-log-requests",
    "--dtype", "float16",
    "--num-scheduler-steps", "1",
    "--distributed-executor-backend", "mp"]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
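rocm-values.yaml pins vLLM to a single device (`HIP_VISIBLE_DEVICES: "0"`, `amd.com/gpu: "1"`). On a node exposing two ROCm GPUs the same knobs extend naturally; a hedged sketch, assuming two visible devices and that the model shards cleanly across both:

```yaml
# Hypothetical two-GPU override: layer it after rocm-values.yaml with a
# second -f flag. Values are illustrative, not tested.
vllm:
  env:
    HIP_VISIBLE_DEVICES: "0,1"
    TENSOR_PARALLEL_SIZE: "2"
  resources:
    limits:
      amd.com/gpu: "2"
```

`TENSOR_PARALLEL_SIZE` and the `amd.com/gpu` limit should move together, so the shards vLLM creates match the devices Kubernetes actually grants the pod.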
10 changes: 5 additions & 5 deletions helm-charts/common/tgi/rocm-values.yaml
@@ -2,25 +2,25 @@

 accelDevice: "rocm"
 image:
-  repository: ghcr.io/huggingface/text-generation-inference
-  tag: "2.4.1-rocm"
+  repository: opea/tgi-rocm
+  tag: latest
 MAX_INPUT_LENGTH: "1024"
 MAX_TOTAL_TOKENS: "2048"
 USE_FLASH_ATTENTION: "false"
 FLASH_ATTENTION_RECOMPUTE: "false"
 HIP_VISIBLE_DEVICES: "0"
 MAX_BATCH_SIZE: "4"
 extraCmdArgs: ["--num-shard","1"]
+tokenizerMountPath: /usr/src/out
 resources:
   limits:
     amd.com/gpu: "1"
   requests:
     cpu: 1
     memory: 16Gi
 securityContext:
   readOnlyRootFilesystem: false
   runAsNonRoot: false
-  runAsUser: 0
+  runAsUser: 2000
+  runAsGroup: 2000
   capabilities:
     add:
       - SYS_PTRACE
2 changes: 1 addition & 1 deletion helm-charts/common/tgi/templates/deployment.yaml
@@ -98,7 +98,7 @@ spec:
           name: shm
         - mountPath: /tmp
           name: tmp
-        - mountPath: /usr/src/out
+        - mountPath: {{ .Values.tokenizerMountPath | default "/usr/src/out" }}
           name: tokenizer
       ports:
         - name: http
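The templated mount relies on Helm's `default` function, which falls back to the literal when the value is unset or empty, the same behavior as shell parameter expansion with `:-`. A quick pure-shell illustration of the override semantics:

```shell
# Shell analogue of `{{ .Values.tokenizerMountPath | default "/usr/src/out" }}`:
# the fallback applies when the variable is empty or unset.
TOKENIZER_MOUNT_PATH=""
echo "${TOKENIZER_MOUNT_PATH:-/usr/src/out}"    # falls back to the chart default

TOKENIZER_MOUNT_PATH="/models/tokenizer"
echo "${TOKENIZER_MOUNT_PATH:-/usr/src/out}"    # explicit override wins
```

In chart terms, `--set tgi.tokenizerMountPath=/models/tokenizer` at install time would play the role of the second assignment.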
2 changes: 2 additions & 0 deletions helm-charts/common/tgi/values.yaml
@@ -107,6 +107,8 @@ startupProbe:
 #   periodSeconds: 5
 #   failureThreshold: 120

+tokenizerMountPath: /usr/src/out
+
 nodeSelector: {}

 tolerations: []