Align helm charts values with compose yaml & release bug fix #1189
Closed
4 commits:
- aacdee7 Align helm charts values with compose yaml (chensuyue)
- b25276c Update CodeTrans default model align with docker compose (chensuyue)
- 33abaae Fix the model-downloader pods Operation not permitted issue (chensuyue)
- 9c340f4 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
@@ -5,6 +5,7 @@ tgi:
   enabled: false
 vllm:
   enabled: true
+  VLLM_CPU_OMP_THREADS_BIND: all

 speecht5:
   enabled: false
@@ -5,6 +5,7 @@ tgi:
   enabled: false
 vllm:
   enabled: true
+  VLLM_CPU_OMP_THREADS_BIND: all

 speecht5:
   enabled: true
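Both hunks add the same `VLLM_CPU_OMP_THREADS_BIND: all` key under `vllm`. A values file passed to Helm is layered on top of the chart's own defaults. The Python sketch below approximates that layering: nested maps are merged key by key, and any other value replaces the default. It simplifies Helm's behavior (for example, Helm also treats an explicit `null` as deleting a key), and the `chart_defaults` dict, including its `LLM_MODEL_ID` entry, is hypothetical; only the keys in the hunks come from this PR.

```python
from copy import deepcopy
from typing import Any


def merge_values(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Overlay overrides onto defaults without mutating either input.

    Nested mappings are merged key by key; any other override value
    (scalar or list) replaces the default outright. Helm's extra rules,
    such as an explicit null deleting a key, are not modelled.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults, for illustration only.
chart_defaults = {
    "tgi": {"enabled": True},
    "vllm": {"enabled": False, "LLM_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct"},
    "speecht5": {"enabled": False},
}

# The keys from the second values hunk in this PR.
values_file = {
    "tgi": {"enabled": False},
    "vllm": {"enabled": True, "VLLM_CPU_OMP_THREADS_BIND": "all"},
    "speecht5": {"enabled": True},
}

effective = merge_values(chart_defaults, values_file)
print(effective["vllm"])
# {'enabled': True, 'LLM_MODEL_ID': 'meta-llama/Meta-Llama-3-8B-Instruct', 'VLLM_CPU_OMP_THREADS_BIND': 'all'}
```

The default `LLM_MODEL_ID` survives because only the keys present in the override are replaced inside the `vllm` map.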
Is this change mandatory?
To fix this issue:
`[pod/chatqna-1755584254-vllm-7f44887799-jjtsj/model-downloader] chmod: /data/models--meta-llama--Meta-Llama-3-8B-Instruct: Operation not permitted`
Without this update, the test is not able to execute chmod on the data path.
Why is this an issue now? It was not before.
That's clearly the wrong thing to do. This is the security context for the vLLM container itself, which should not be modifying anything model-related. All model-related updates are done by the downloader init container:
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L33
And that initContainer has a hard-coded security context (not one coming from the values file).
Additionally, models are on a separate volume from the root file system, and the init container has the necessary capabilities to chmod (etc.) the model files there, in case extra (vLLM) writes are needed for some of the models.
Potential reasons why things might fail now:
Several tests are blocked at this line:
GenAIInfra/helm-charts/common/vllm/templates/deployment.yaml, line 60 in 9c340f4
Only 2 of the ChatQnA vLLM-related tests failed with this issue; I don't know why.
Also, if you search the Helm chart files, there are 30+ `readOnlyRootFilesystem: false` settings; please also check whether they are reasonable.
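To reproduce that count, here is a minimal Python sketch that scans a chart directory for YAML lines setting `readOnlyRootFilesystem` to `false`. The regex is an assumption about how the setting is spelled (plain or quoted `false`, optional trailing comment); values produced by template expressions are not matched. The default `helm-charts` directory is taken from the GenAIInfra links in this thread.

```python
import re
from pathlib import Path

# Assumed spelling of the setting: optional indentation, plain or quoted
# `false`, optional trailing comment, LF or CRLF line endings.
PATTERN = re.compile(
    r"""^[ \t]*readOnlyRootFilesystem:[ \t]*["']?false["']?[ \t]*(?:#.*)?\r?$""",
    re.MULTILINE,
)


def count_matches(text: str) -> int:
    """Count `readOnlyRootFilesystem: false` lines in one YAML document."""
    return len(PATTERN.findall(text))


def find_writable_rootfs(chart_root: str) -> dict[str, int]:
    """Map each YAML file under chart_root (relative path) to its match count."""
    root = Path(chart_root)
    hits: dict[str, int] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in {".yaml", ".yml"}:
            count = count_matches(path.read_text(encoding="utf-8"))
            if count:
                hits[str(path.relative_to(root))] = count
    return hits


if __name__ == "__main__":
    import sys

    hits = find_writable_rootfs(sys.argv[1] if len(sys.argv) > 1 else "helm-charts")
    for name, count in hits.items():
        print(f"{count:3d}  {name}")
    print(f"total: {sum(hits.values())}")
```

Keeping the matching in `count_matches` separate from the directory walk makes the pattern easy to adjust if the charts spell the setting differently.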
Ouch. That's a clear regression from when they were last fixed by Lianhao, see: #815 (comment)
Looking at the error log: https://github.com/opea-project/GenAIExamples/actions/runs/17060819842/job/48367160723#step:6:381
The error is for the init container. It can download data, but cannot change access rights for the downloaded data:
`[pod/chatqna-1755584254-vllm-7f44887799-jjtsj/model-downloader] chmod: /data/models--meta-llama--Meta-Llama-3-8B-Instruct: Operation not permitted`
This comes from the `chmod -R g+w /data/models--$LLM_MODEL_ID` command in: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L60
However, the (hard-coded) initContainer `securityContext` should have all the necessary capabilities to do that: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L38, as it has been working earlier...
The initContainer's `/data` path is at the root of the `model-volume` volume: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L108
Which, according to the error log, is in:
=> @chensuyue please provide the output of `ls -la /data2/hf_model` for all the Gaudi hosts where CI could currently run these pods. (Do those host directory access rights differ from what was used on CI Gaudi hosts earlier?)
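To see exactly which entries the init container's `chmod -R g+w /data/models--$LLM_MODEL_ID` step trips over, here is a rough Python equivalent that adds group write permission recursively and records every path where `os.chmod` fails instead of stopping. It is a hypothetical diagnostic helper, not part of the chart; it only reflects the init container's situation when run under the same UID and capabilities.

```python
import errno
import os
import stat


def add_group_write(root: str) -> list[tuple[str, int]]:
    """Approximate `chmod -R g+w root`, collecting failures instead of raising.

    Returns (path, errno) pairs for entries whose mode could not be changed;
    errno.EPERM is the "Operation not permitted" case from the CI log.
    Symlinks are skipped rather than followed.
    """
    failures: list[tuple[str, int]] = []

    def add_one(path: str) -> None:
        try:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                return
            # Keep the existing permission bits and add group write.
            os.chmod(path, stat.S_IMODE(mode) | stat.S_IWGRP)
        except OSError as exc:
            failures.append((path, exc.errno))

    add_one(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            add_one(os.path.join(dirpath, name))
    return failures


if __name__ == "__main__":
    import sys

    for path, err in add_group_write(sys.argv[1]):
        print(f"{path}: {errno.errorcode.get(err, err)} ({os.strerror(err)})")
```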
I have given the current data folder the most lenient permissions. I didn't apply any special settings to those data paths earlier besides `chmod 777`; maybe the cloud team did.
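One point that may be relevant here, based on standard POSIX `chmod()` semantics rather than anything confirmed in this thread: mode bits such as `777` let any user read, write and traverse the directory, but only the file's owner, or on Linux a process holding `CAP_FOWNER`, may change its mode. A `chmod` on a host-path directory owned by a different UID can therefore still fail with EPERM ("Operation not permitted"). A minimal Python sketch of that rule, plus a Linux-only helper that reads the effective capability set from `/proc/self/status`:

```python
import os

# Linux capability number of CAP_FOWNER (see linux/capability.h).
CAP_FOWNER = 3


def chmod_allowed(file_uid: int, caller_uid: int, has_cap_fowner: bool = False) -> bool:
    """Return True if chmod() on a file owned by file_uid would be permitted.

    The caller must own the file or be privileged (CAP_FOWNER on Linux).
    The file's current mode bits, e.g. 0o777, are not consulted at all.
    """
    return file_uid == caller_uid or has_cap_fowner


def effective_has_cap_fowner() -> bool:
    """Linux only: test the CAP_FOWNER bit in this process's effective set."""
    with open("/proc/self/status", encoding="utf-8", errors="replace") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return bool(int(line.split()[1], 16) & (1 << CAP_FOWNER))
    return False


def explain(path: str) -> str:
    """Summarize owner, mode and the expected chmod outcome for path."""
    st = os.stat(path)
    euid = os.geteuid()  # Linux checks the filesystem UID, normally equal to this
    ok = chmod_allowed(st.st_uid, euid, effective_has_cap_fowner())
    verdict = "allowed" if ok else "EPERM (Operation not permitted)"
    return f"{path}: owner uid={st.st_uid}, mode={oct(st.st_mode & 0o7777)}, euid={euid}: chmod {verdict}"


if __name__ == "__main__":
    import sys

    for target in sys.argv[1:]:
        print(explain(target))
```

Run inside the init container, `explain("/data/models--meta-llama--Meta-Llama-3-8B-Instruct")` would show whether the owner UID or missing capabilities, rather than the mode bits, explain the failure.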