1 change: 1 addition & 0 deletions helm-charts/audioqna/cpu-multilang-values.yaml
@@ -5,6 +5,7 @@ tgi:
   enabled: false
 vllm:
   enabled: true
+  VLLM_CPU_OMP_THREADS_BIND: all

 speecht5:
   enabled: false
1 change: 1 addition & 0 deletions helm-charts/audioqna/cpu-values.yaml
@@ -5,6 +5,7 @@ tgi:
   enabled: false
 vllm:
   enabled: true
+  VLLM_CPU_OMP_THREADS_BIND: all

 speecht5:
   enabled: true
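The same binding can also be enabled at install time instead of editing the values files; a minimal sketch, following the install pattern from the codetrans README below (release and chart names are illustrative):

```bash
# Sketch: set the new vLLM subchart value via --set rather than in cpu-values.yaml.
helm install audioqna audioqna -f audioqna/cpu-values.yaml \
  --set vllm.VLLM_CPU_OMP_THREADS_BIND=all
```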
16 changes: 8 additions & 8 deletions helm-charts/codetrans/README.md
@@ -14,7 +14,7 @@ scripts/update_dependency.sh
 helm dependency update codetrans
 export HFTOKEN="insert-your-huggingface-token-here"
 export MODELDIR="/mnt/opea-models"
-export MODELNAME="mistralai/Mistral-7B-Instruct-v0.3"
+export MODELNAME="Qwen/Qwen2.5-Coder-7B-Instruct"
 # To use CPU with vLLM
 helm install codetrans codetrans --set global.HF_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set llm-uservice.LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME} -f codetrans/cpu-values.yaml
 # To use CPU with TGI
@@ -31,7 +31,7 @@ helm install codetrans codetrans --set global.HF_TOKEN=${HFTOKEN} --set global.m

### IMPORTANT NOTE

-1. To use model `mistralai/Mistral-7B-Instruct-v0.3`, you should first goto the [huggingface model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) to apply for the model access first. You need to make sure your huggingface token has at least read access to that model.
+1. To use model `Qwen/Qwen2.5-Coder-7B-Instruct`, you should first go to the [huggingface model card](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) to apply for model access. You need to make sure your huggingface token has at least read access to that model.

2. Make sure your `MODELDIR` exists on the node where your workload is scheduled so the downloaded model can be cached for later use. Otherwise, set `global.modelUseHostPath` to `null` if you don't want to cache the model.
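For example, combining the two notes above, an install that skips the host-side cache entirely might look like this (a sketch; the token is a placeholder):

```bash
# Sketch: no host-side model cache; the model is re-downloaded on pod start.
export HFTOKEN="insert-your-huggingface-token-here"
helm install codetrans codetrans --set global.HF_TOKEN=${HFTOKEN} \
  --set global.modelUseHostPath=null -f codetrans/cpu-values.yaml
```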

@@ -66,9 +66,9 @@ Open a browser to access `http://<k8s-node-ip-address>:${port}` to play with the

## Values

 | Key               | Type   | Default                            | Description                                                                             |
 | ----------------- | ------ | ---------------------------------- | --------------------------------------------------------------------------------------- |
 | image.repository  | string | `"opea/codetrans"`                 |                                                                                         |
 | service.port      | string | `"7777"`                           |                                                                                         |
-| tgi.LLM_MODEL_ID  | string | `"mistralai/Mistral-7B-Instruct-v0.3"` | Models id from https://huggingface.co/, or predownloaded model directory           |
+| tgi.LLM_MODEL_ID  | string | `"Qwen/Qwen2.5-Coder-7B-Instruct"` | Model ID from https://huggingface.co/, or a predownloaded model directory               |
 | global.monitoring | bool   | `false`                            | Enable usage metrics for the service components. See ../monitoring.md before enabling! |
6 changes: 3 additions & 3 deletions helm-charts/codetrans/values.yaml
@@ -60,15 +60,15 @@ affinity: {}
 # To override values in subchart tgi
 tgi:
   enabled: false
-  LLM_MODEL_ID: mistralai/Mistral-7B-Instruct-v0.3
+  LLM_MODEL_ID: Qwen/Qwen2.5-Coder-7B-Instruct

 vllm:
   enabled: true
-  LLM_MODEL_ID: mistralai/Mistral-7B-Instruct-v0.3
+  LLM_MODEL_ID: Qwen/Qwen2.5-Coder-7B-Instruct

 llm-uservice:
   TEXTGEN_BACKEND: vLLM
-  LLM_MODEL_ID: mistralai/Mistral-7B-Instruct-v0.3
+  LLM_MODEL_ID: Qwen/Qwen2.5-Coder-7B-Instruct

 nginx:
   service:
3 changes: 3 additions & 0 deletions helm-charts/common/vllm/templates/configmap.yaml
@@ -27,6 +27,9 @@ data:
   {{- if .Values.VLLM_CPU_KVCACHE_SPACE }}
   VLLM_CPU_KVCACHE_SPACE: {{ .Values.VLLM_CPU_KVCACHE_SPACE | quote}}
   {{- end }}
+  {{- if .Values.VLLM_CPU_OMP_THREADS_BIND }}
+  VLLM_CPU_OMP_THREADS_BIND: {{ .Values.VLLM_CPU_OMP_THREADS_BIND | quote}}
+  {{- end }}
   {{- if .Values.VLLM_SKIP_WARMUP }}
   VLLM_SKIP_WARMUP: {{ .Values.VLLM_SKIP_WARMUP | quote }}
   {{- end }}
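With a value set (e.g. `VLLM_CPU_OMP_THREADS_BIND: all`, as in the audioqna values changes above), this template would render roughly as follows; a sketch, with the ConfigMap name being a hypothetical release-derived value:

```yaml
# Approximate rendered output; only the relevant data key is shown.
apiVersion: v1
kind: ConfigMap
metadata:
  name: audioqna-vllm-config   # hypothetical name
data:
  VLLM_CPU_OMP_THREADS_BIND: "all"   # quoted by the "| quote" pipeline above
```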
3 changes: 2 additions & 1 deletion helm-charts/common/vllm/values.yaml
@@ -55,7 +55,7 @@ podSecurityContext: {}
 # Workaround for https://github.com/opea-project/GenAIComps/issues/1549
 # Need to run as root until upstream fixed and released.
 securityContext:
-  readOnlyRootFilesystem: true
+  readOnlyRootFilesystem: false
@ftian1 (Collaborator) commented on Aug 19, 2025:

is this change mandatory?

@chensuyue (Collaborator, Author) commented on Aug 19, 2025:

To fix this issue:

[pod/chatqna-1755584254-vllm-7f44887799-jjtsj/model-downloader] chmod: /data/models--meta-llama--Meta-Llama-3-8B-Instruct: Operation not permitted.

Without this update the test is not able to execute chmod for the data path.

Member commented:

Why is this an issue now? It was not before.

Collaborator commented:

> To fix this issue: [pod/chatqna-1755584254-vllm-7f44887799-jjtsj/model-downloader] chmod: /data/models--meta-llama--Meta-Llama-3-8B-Instruct: Operation not permitted. Without this update the test is not able to execute chmod for the data path.

That's clearly a wrong thing to do. This is the security context for the vLLM container itself, and that should not be modifying anything model related. All model related updates are done by the downloader init container:
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L33

And that initContainer has a hard-coded security context (not one coming from the values file).

Additionally, models are on a separate volume from the root file system, and the init container has the necessary capabilities to chmod etc. the model files there, in case extra (vLLM) writes are necessary with some of the models.

Collaborator commented:

Potential reasons why things might fail now (see the sketch after this list):

  • vLLM container is configured with a different user/group => user/group should be updated
  • vLLM is configured to read model data from a different path => path should be fixed
  • vLLM needs to download additional files => the initContainer downloader should be asked to download those too, or if this is due to a too-old downloader, the HF downloader image should be updated
  • vLLM now writes extra files to some other path => redirect those to a suitable path, or mount something appropriate there
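For the last case, a common Kubernetes remedy is to keep `readOnlyRootFilesystem: true` and mount a writable emptyDir over the path being written to; a minimal sketch, with the `/tmp` mount path and volume name being illustrative assumptions rather than values taken from the chart:

```yaml
# Sketch: keep the container's root filesystem read-only and provide a
# dedicated scratch volume instead of relaxing the securityContext.
# (Container-level fields and the pod-level "volumes" list are shown together
# here for brevity; they live in different places in the deployment template.)
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
volumeMounts:
  - name: tmp          # hypothetical volume name
    mountPath: /tmp    # assumption: the extra writes land under /tmp
volumes:
  - name: tmp
    emptyDir: {}
```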

@chensuyue (Collaborator, Author) commented on Aug 19, 2025:

Several tests block on this line:

chmod -R g+w /data/models--{{ replace "/" "--" .Values.LLM_MODEL_ID }};

If the code change here is not correct, we need to find a proper way to make this work.

@chensuyue (Collaborator, Author) commented:

You will see that only 2 of the chatqna vllm related tests failed with this issue; I don't know why.

And if you search the helm charts files, there are 30+ `readOnlyRootFilesystem: false` settings; please also check whether those are reasonable.

[screenshot of the search results]

Collaborator commented:

> And if you search the helm charts files, there are 30+ `readOnlyRootFilesystem: false` settings; please also check whether those are reasonable.

Ouch. That's a clear regression from when they were last fixed by Lianhao, see: #815 (comment)

@eero-t (Collaborator) commented on Aug 19, 2025:

> What permission should the model path have? I gave all the models chmod -R 777, but still got this issue. What should the user/group be? Should it be the user deploying the test, or root?

Looking at the error log: https://github.com/opea-project/GenAIExamples/actions/runs/17060819842/job/48367160723#step:6:381

The error is from the initContainer. It can download data, but cannot change access rights for the downloaded data:

[pod/chatqna-1755584254-vllm-7f44887799-jjtsj/model-downloader] chmod: /data/models--meta-llama--Meta-Llama-3-8B-Instruct: Operation not permitted

with the `chmod -R g+w /data/models--$LLM_MODEL_ID` command in: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L60

although the (hard-coded) initContainer securityContext should have all the necessary capabilities to do that: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L38

as it has been working earlier...

The initContainer's /data path is at the root of the model-volume volume: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L108

which according to the error log is at:

   model-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /data2/hf_model
    HostPathType:  Directory

=> @chensuyue please provide output of `ls -la /data2/hf_model` for all the Gaudi hosts where CI could currently run these pods.

(Do those host directory access rights differ from what was used on CI Gaudi hosts earlier?)
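For instance, something along these lines on each Gaudi host would show who owns the cache and with what mode bits (a sketch; the `/data2/hf_model` path is taken from the pod spec above):

```bash
# Sketch: inspect ownership and permissions of the host-side model cache.
ls -la /data2/hf_model
# Numeric owner:group and mode for each cached model directory:
stat -c '%u:%g %a %n' /data2/hf_model/models--*
```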

@chensuyue (Collaborator, Author) commented:

> => @chensuyue please provide output of `ls -la /data2/hf_model` for all the Gaudi hosts where CI could currently run these pods.
>
> (Do those host directory access rights differ from what was used on CI Gaudi hosts earlier?)

I have given the current data folder the most lenient permissions. I didn't apply any special settings to those data paths earlier besides chmod 777; maybe the cloud team did.

[screenshots of the directory listings]

   allowPrivilegeEscalation: false
   runAsNonRoot: false
   runAsUser: 0
@@ -107,6 +107,7 @@ LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
 OMPI_MCA_btl_vader_single_copy_mechanism: ""
 PT_HPU_ENABLE_LAZY_COLLECTIVES: ""
 VLLM_CPU_KVCACHE_SPACE: ""
+VLLM_CPU_OMP_THREADS_BIND: ""

 global:
   http_proxy: ""
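As a usage note, vLLM's CPU backend accepts either `all` or explicit core lists for this variable; a hedged example override file (the file name and core range are illustrative, and the exact accepted syntax should be checked against the vLLM version in use):

```yaml
# my-cpu-values.yaml (hypothetical override file)
vllm:
  enabled: true
  # Pin vLLM's OpenMP worker threads to cores 0-31 instead of binding to all cores.
  VLLM_CPU_OMP_THREADS_BIND: "0-31"
```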