@vrantala
Contributor

@vrantala vrantala commented Jul 4, 2025

Description

Added mistral-7b-instruct-v0.3 and mixtral-8x7b-instruct-v0.1 model yaml files. Updated the model parameters to more optimal values, based on benchmark-serving testing with the default parameters: request-rate=800, input/output tokens=200/200.

Issues

n/a.

Type of change

  • [x] New feature (non-breaking change which adds new functionality)

Dependencies

n/a.

Tests

Ran KubeAI's benchmark-serving tests against all models.
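For reference, the benchmark settings named in the description (request-rate=800, input/output tokens=200/200) could be assembled into a command line roughly as follows. This is a minimal sketch, not the invocation actually used for this PR: the flag names follow vLLM's `benchmark_serving.py` as commonly documented and should be checked against the version in use, and the function name, model name, and URL are hypothetical placeholders. The concurrency value is left optional because it is not stated here.

```python
from typing import List, Optional


def build_benchmark_args(
    model: str,
    base_url: str,
    request_rate: float = 800,
    input_len: int = 200,
    output_len: int = 200,
    max_concurrency: Optional[int] = None,
) -> List[str]:
    """Build an argument list for a vLLM-style serving benchmark run.

    Assumption: flag names match vLLM's benchmark_serving.py; verify them
    against the script version you actually run.
    """
    args = [
        "python", "benchmark_serving.py",
        "--backend", "vllm",
        "--base-url", base_url,
        "--model", model,
        # Synthetic prompts with fixed input/output token lengths.
        "--dataset-name", "random",
        "--random-input-len", str(input_len),
        "--random-output-len", str(output_len),
        "--request-rate", str(request_rate),
    ]
    # Only cap concurrency when a value is explicitly given.
    if max_concurrency is not None:
        args += ["--max-concurrency", str(max_concurrency)]
    return args
```

Calling `build_benchmark_args("mistral-7b-instruct-v0.3", "http://localhost:8000")` returns a list that can be passed to `subprocess.run`; both arguments are placeholders for the deployed model name and endpoint.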

@vrantala vrantala requested review from mkbhanda and poussa as code owners July 4, 2025 07:11
* Added mistral and mixtral model yaml files
* Updated model parameters to more optimal values based on
  benchmark-serving testing with default parameters:
  request-rate=800, Input/Output tokens=200/200

Signed-off-by: Rantala <[email protected]>
@poussa
Member

poussa commented Jul 4, 2025

Also, add a short README.md to the models directory stating that the vLLM benchmark serving script was used to find the parameters for the models, with the arguments request-rate=800, input/output tokens=200/200, concurrency=xxx.

@vrantala
Contributor Author

vrantala commented Jul 4, 2025

> Also, add a short README.md to the models directory stating that the vLLM benchmark serving script was used to find the parameters for the models, with the arguments request-rate=800, input/output tokens=200/200, concurrency=xxx.

Added the README.md file.

@vrantala vrantala requested review from eero-t and poussa July 4, 2025 08:46
Collaborator

@eero-t eero-t left a comment

A few minor changes are still needed.

@eero-t
Collaborator

eero-t commented Jul 4, 2025

CI also seems to require adding README to some "toctree":
kubeai/models/README.md: WARNING: document isn't included in any toctree

@poussa Any idea what / where that is?

@poussa
Member

poussa commented Jul 4, 2025

> CI also seems to require adding README to some "toctree":
> kubeai/models/README.md: WARNING: document isn't included in any toctree

When you add a README file, you need to add a reference to it from some other README. In this case, just add a link to it in kubeai/README.md, for example:

The following [models](models/README.md) are included.
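The rule described above (every README must be referenced from some other README) can be approximated locally before pushing. The following is a minimal sketch under stated assumptions: it only scans inline markdown links of the form `[text](path)`, the function name `unlinked_markdown` is hypothetical, and it is not the Sphinx toctree check that CI runs, which may apply stricter rules.

```python
import re
from pathlib import Path
from typing import List

# Matches the target of an inline markdown link, without any #fragment.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def unlinked_markdown(root: Path) -> List[Path]:
    """Return markdown files under root that no other markdown file links to."""
    docs = sorted(p.resolve() for p in root.rglob("*.md"))
    linked = set()
    for doc in docs:
        for target in LINK_RE.findall(doc.read_text(encoding="utf-8")):
            if "://" in target:
                continue  # skip external URLs
            resolved = (doc.parent / target).resolve()
            if resolved != doc:  # a self-link does not count
                linked.add(resolved)
    return [d for d in docs if d not in linked]
```

The top-level README of the scanned tree is always reported, since nothing inside the tree links to it; any other file in the result is a candidate for a missing link.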

@vrantala
Contributor Author

vrantala commented Jul 4, 2025

> > CI also seems to require adding README to some "toctree":
> > kubeai/models/README.md: WARNING: document isn't included in any toctree
>
> When you add a README file, you need to add a reference to it from some other README. In this case, just add a link to it in kubeai/README.md, for example:
>
> The following [models](models/README.md) are included.

Added a link to kubeai/models/README.md.

Collaborator

@eero-t eero-t left a comment

Approved.

The main README update did not fix the toctree CI check though, so apparently the README also needs to be linked somewhere else?

@eero-t
Collaborator

eero-t commented Jul 4, 2025

I think this can be merged despite the toctree error. The CI test should tell where the link should be added...

@poussa poussa merged commit 4e70d8b into opea-project:main Jul 4, 2025
6 of 7 checks passed
maxReplicas: 8
# Equals to max-num-seqs (batch-size)
targetRequests: 512
resourceProfile: gaudi-for-text-generation::1
Collaborator

Missed the doubled colon (`::`) typo in resourceProfile...
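A typo like this could be caught with a small format check before review. This is a minimal sketch: the `<profile-name>:<count>` shape is an assumption inferred from this review comment and from how resourceProfile values are usually written, and the function name and the exact character set allowed in the profile name are hypothetical.

```python
import re

# Assumed shape: a profile name, exactly one colon, then a positive integer count.
PROFILE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*:[1-9][0-9]*")


def is_valid_resource_profile(value: str) -> bool:
    """Return True if value looks like '<profile-name>:<count>'."""
    return PROFILE_RE.fullmatch(value) is not None
```

With this check, `gaudi-for-text-generation::1` is rejected because of the second colon, while `gaudi-for-text-generation:1` is accepted.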

@chensuyue
Collaborator

This PR was merged without passing CI, which will block the IO build CI in all the other PRs: https://github.com/opea-project/GenAIExamples/actions/runs/16105452982/job/45440218033?pr=1922


4 participants