CodeGen/CodeTrans - Adding files to deploy an application in the K8S environment using Helm #1792
Changes from 35 commits
@@ -0,0 +1,45 @@

```yaml
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
  MAX_INPUT_LENGTH: "1024"
  MAX_TOTAL_TOKENS: "2048"
  USE_FLASH_ATTENTION: "false"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0"
  MAX_BATCH_SIZE: "4"
  extraCmdArgs: [ "--num-shard", "1" ]
  resources:
    limits:
      amd.com/gpu: "1"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
```
**Collaborator:** So here it's OK to keep running as root? Why is ChatQnA special? https://github.com/opea-project/GenAIInfra/pull/949/files/180f16fb65570968a44663d0490c42ed539862b0#diff-f93551169c7cda08f51cb91abe0a36eb96356b53ace54c5fd940d24d5d4264acR29
**Collaborator:** Do you have a PR to GenAIInfra?
**Collaborator:** The CodeGen/CodeTrans PRs for GenAIInfra have been merged: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/codetrans/rocm-values.yaml
**Contributor (Author):** The switch to launching as an unprivileged user will be made after this PR is completed: opea-project/GenAIComps#1638
```yaml
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120

vllm:
  enabled: false

llm-uservice:
  TEXTGEN_BACKEND: TGI
  LLM_MODEL_ID: "Qwen/Qwen2.5-Coder-7B-Instruct"
```
@@ -0,0 +1,41 @@

```yaml
# Copyright (c) 2025 Advanced Micro Devices, Inc.

tgi:
  enabled: false

vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: [ "python3", "/workspace/api_server.py" ]
  extraCmdArgs: [ "--swap-space", "16",
                  "--disable-log-requests",
                  "--dtype", "float16",
                  "--num-scheduler-steps", "1",
                  "--distributed-executor-backend", "mp" ]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

llm-uservice:
  TEXTGEN_BACKEND: vLLM
  retryTimeoutSeconds: 720
```
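Only one inference backend should be enabled at a time (`tgi.enabled` and `vllm.enabled` above are mutually exclusive). A quick sanity check can be sketched with plain POSIX tools against a trimmed copy of the values; the temp-file path and the two-space indentation match used here are assumptions for illustration.

```shell
# Sanity check: exactly one backend should be enabled in a values file.
# Writes a trimmed copy of the values above, then counts enabled backends.
cat > /tmp/rocm-values-check.yaml <<'EOF'
tgi:
  enabled: false
vllm:
  enabled: true
EOF
enabled_count=$(grep -c '^  enabled: true$' /tmp/rocm-values-check.yaml)
if [ "${enabled_count}" -eq 1 ]; then
  echo "OK: exactly one backend enabled"
else
  echo "ERROR: ${enabled_count} backend(s) enabled"
fi
```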
@@ -16,3 +16,150 @@

```bash
export HFTOKEN="insert-your-huggingface-token-here"
helm install codetrans oci://ghcr.io/opea-project/charts/codetrans --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
```
## Deploy on AMD ROCm using Helm charts from the binary Helm repository

```bash
mkdir ~/codetrans-k8s-install && cd ~/codetrans-k8s-install
```
### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```
### Go to the installation directory

```bash
cd GenAIExamples/CodeTrans/kubernetes/helm
```
### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"
```
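Before running the install, it can help to fail fast if the token was never exported. A minimal guard using standard shell parameter expansion (the error message wording is my own, not from this PR):

```shell
# Values from the "Setting system variables" step above.
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"

# Abort with a message if HFTOKEN is unset or empty before helm install runs.
: "${HFTOKEN:?Set HFTOKEN to your Hugging Face token before installing}"
echo "Model: ${MODELNAME} (cache: ${MODELDIR})"
```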
### Setting variables in Values files

#### If using ROCm vLLM

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. You can specify either a single ID or several, comma-separated: `"0"` or `"0,1,2,3"`.
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace `"1"` with the number of GPUs used.
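The three settings above must stay consistent with each other. One way to keep them in step, sketched in plain bash (the variable names are local to this snippet):

```shell
# Derive the GPU count from the device list so TENSOR_PARALLEL_SIZE and
# the amd.com/gpu limit always match HIP_VISIBLE_DEVICES.
HIP_VISIBLE_DEVICES="0,1,2,3"
IFS=',' read -ra GPU_IDS <<< "${HIP_VISIBLE_DEVICES}"
GPU_COUNT="${#GPU_IDS[@]}"
echo "TENSOR_PARALLEL_SIZE=${GPU_COUNT}"      # prints TENSOR_PARALLEL_SIZE=4
echo "amd.com/gpu: \"${GPU_COUNT}\""
```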
#### If using ROCm TGI

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. You can specify either a single ID or several, comma-separated: `"0"` or `"0,1,2,3"`.
- `extraCmdArgs: [ "--num-shard", "1" ]`: replace `"1"` with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace `"1"` with the number of GPUs used.
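As a worked example of the three TGI settings above staying in step, a hypothetical two-GPU variant (the IDs and counts are chosen purely for illustration) could look like:

```yaml
# Illustrative two-GPU example; adjust IDs and counts to your hardware.
HIP_VISIBLE_DEVICES: "0,1"
extraCmdArgs: [ "--num-shard", "2" ]
resources:
  limits:
    amd.com/gpu: "2"
```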
### Installing the Helm Chart

#### If using ROCm vLLM

```bash
helm upgrade --install codetrans oci://ghcr.io/opea-project/charts/codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-values.yaml
```

#### If using ROCm TGI

```bash
helm upgrade --install codetrans oci://ghcr.io/opea-project/charts/codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-tgi-values.yaml
```
## Deploy on AMD ROCm using Helm charts from Git repositories

### Creating working dirs

```bash
mkdir ~/codetrans-k8s-install && cd ~/codetrans-k8s-install
```

**Collaborator:** Once you have created the directory, cd into it. All the other paths then get shorter; there is no need to reference ~/codetrans-k8s-install everywhere.
### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git
```
### Go to the installation directory

```bash
cd GenAIExamples/CodeTrans/kubernetes/helm
```
### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"
```
### Setting variables in Values files

#### If using ROCm vLLM

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. You can specify either a single ID or several, comma-separated: `"0"` or `"0,1,2,3"`.
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace `"1"` with the number of GPUs used.
#### If using ROCm TGI

```bash
nano ~/codetrans-k8s-install/GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```

- `HIP_VISIBLE_DEVICES`: the ID(s) of the GPU(s) you want to use. You can specify either a single ID or several, comma-separated: `"0"` or `"0,1,2,3"`.
- `extraCmdArgs: [ "--num-shard", "1" ]`: replace `"1"` with the number of GPUs used.
- `resources.limits."amd.com/gpu"`: replace `"1"` with the number of GPUs used.
### Installing the Helm Chart

#### If using ROCm vLLM

```bash
cd ~/codetrans-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codetrans
helm upgrade --install codetrans codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/CodeTrans/kubernetes/helm/rocm-values.yaml
```

#### If using ROCm TGI

```bash
cd ~/codetrans-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update codetrans
helm upgrade --install codetrans codetrans \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/CodeTrans/kubernetes/helm/rocm-tgi-values.yaml
```