[KubeAI][Models] Added 2 new model files and updated model parameters #1150
* Added mistral and mixtral model yaml files
* Updated model parameters to more optimal values, based on benchmark-serving testing with default parameters: request-rate=800, Input/Output tokens=200/200

Signed-off-by: Rantala <[email protected]>
Also, add a short README.md to the models directory stating that the vLLM benchmark-serving script was used to find the parameters for the models, with the arguments request-rate=800, Input/Output tokens=200/200, concurrency = xxx.
Signed-off-by: Rantala <[email protected]>
Added README.md file
eero-t
left a comment
Few minor changes still needed.
CI also seems to require adding the README to some "toctree". @poussa Any idea what / where that is?
When you add a README file, you need to add a reference to it from some other README. In this case, just add a link to it in kubeai/README.md, for example:
Signed-off-by: Rantala <[email protected]>
for more information, see https://pre-commit.ci
Added link to kubeai/models/README.md
Approved.
Main README update did not fix the toctree CI check though, so apparently the README needs to be linked also somewhere else?
I think this can be merged despite
    maxReplicas: 8
    # Equals to max-num-seqs (batch-size)
    targetRequests: 512
    resourceProfile: gaudi-for-text-generation::1
Missed the double-colon typo ("::" instead of ":") in resourceProfile...
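A typo like this could be caught before review with a small check. The following is only a sketch: it assumes the resourceProfile value is meant to follow a `<profile-name>:<count>` pattern, and the helper names `is_valid_resource_profile` and `find_bad_profiles` are hypothetical, not part of KubeAI or this repository.

```python
import re

# Assumed format (not taken from this PR): "<profile-name>:<count>",
# for example "gaudi-for-text-generation:1".
PROFILE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*:[1-9][0-9]*")


def is_valid_resource_profile(value: str) -> bool:
    """Return True if value looks like '<profile-name>:<count>'."""
    return PROFILE_RE.fullmatch(value) is not None


def find_bad_profiles(yaml_text: str) -> list[str]:
    """Collect resourceProfile values that do not match the assumed format.

    This is a plain line scan, not a YAML parser, so it only handles
    simple 'resourceProfile: value' lines.
    """
    bad = []
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("resourceProfile:"):
            # Split only on the first colon so the value keeps its own colons.
            value = stripped.split(":", 1)[1].strip().strip("\"'")
            if not is_valid_resource_profile(value):
                bad.append(value)
    return bad


if __name__ == "__main__":
    snippet = (
        "maxReplicas: 8\n"
        "targetRequests: 512\n"
        "resourceProfile: gaudi-for-text-generation::1\n"
    )
    print(find_bad_profiles(snippet))
```

Run against the lines quoted above, the check flags the value with the doubled colon; a value such as gaudi-for-text-generation:1 passes.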
This PR was merged without passing CI, which will block the IO build CI in all the other PRs: https://github.com/opea-project/GenAIExamples/actions/runs/16105452982/job/45440218033?pr=1922

Description
Added mistral-7b-instruct-v0.3 and mixtral-8x7b-instruct-v0.1 model yaml files. Updated model parameters to more optimal values, based on benchmark-serving testing with default parameters: request-rate=800, Input/Output tokens=200/200.
Issues
n/a.
Type of change
Dependencies
n/a.
Tests
Ran KubeAI's benchmark-serving tests on all models.