Single Model multiple resource profiles #428

Open
samos123 opened this issue Feb 28, 2025 · 0 comments

It should be possible to talk to a single model endpoint whose replicas are backed by different GPU types. In addition, users may want to mix spot, on-demand, and reserved capacity, each of which may have to be expressed by a different resource profile.

One idea is a higher-level custom resource, ModelAlias, that serves as the single endpoint.

Another idea from @liaddrori1:
Provide flexibility in deploying LLMs across various GPU configurations, such as 1x L4, 2x L4, or 1x A100. This would facilitate efficient resource utilization and scalability.

However, instead of introducing an additional CRD for ModelAlias, you might want to consider extending the existing resourceProfile field to accept multiple configurations. By allowing resourceProfile to be an array with assigned priorities or weights, the KubeAI controller could attempt to schedule models based on the specified preferences. This approach streamlines the configuration and leverages the current architecture.
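The weights variant is not spelled out in this proposal. One possible reading is that weights split a model's replicas proportionally across profiles. A minimal Python sketch under that assumption follows; `split_replicas` and the weight values are hypothetical and not part of KubeAI.

```python
def split_replicas(weights: dict[str, int], replicas: int) -> dict[str, int]:
    """Split `replicas` across profiles in proportion to their weights.

    Uses the largest-remainder method so the counts always sum to `replicas`.
    This is an assumed interpretation of "weights", not KubeAI behavior.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Exact (possibly fractional) share for each profile.
    shares = {name: replicas * w / total for name, w in weights.items()}
    # Start from the whole-number part of each share.
    counts = {name: int(share) for name, share in shares.items()}
    leftover = replicas - sum(counts.values())
    # Give the remaining replicas to the largest fractional parts.
    by_remainder = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical weights: prefer H100 capacity 2:1 over the 2x L4 profile.
print(split_replicas({"h100-1gpu": 2, "l4-2gpu": 1}, 4))
# -> {'h100-1gpu': 3, 'l4-2gpu': 1}
```

Unlike strict priorities, a split like this keeps some replicas on every profile, which may matter when mixing spot and on-demand capacity.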

Proposed Configuration Example:

resourceProfile:
  h100-1gpu:
    priority: 1
  l4-2gpu:
    priority: 2
    # args override to use 2 GPUs
    args: ["--tensor-parallel-size", "2"]
  l4-1gpu:
    priority: 3
In this setup, the controller would first try to schedule on h100-1gpu. If that capacity is unavailable, it would fall back to l4-2gpu and then to l4-1gpu, so the preferred hardware is used whenever it is available and no new CRD is required.
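The fallback order described here could be sketched roughly as below. This is an illustration only, not KubeAI controller code: `select_profile`, the `has_capacity` callback, and the choice to append per-profile args to the model's base args are all assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical in-memory form of the proposed resourceProfile mapping.
PROFILES = {
    "h100-1gpu": {"priority": 1},
    "l4-2gpu": {"priority": 2, "args": ["--tensor-parallel-size", "2"]},
    "l4-1gpu": {"priority": 3},
}


def select_profile(
    profiles: dict[str, dict],
    has_capacity: Callable[[str], bool],
    base_args: Optional[list[str]] = None,
) -> Optional[tuple[str, list[str]]]:
    """Return (profile name, final args) for the first schedulable profile.

    Profiles are tried in ascending priority (1 is tried first).
    Returns None when no profile has capacity.
    """
    ordered = sorted(profiles.items(), key=lambda item: item[1]["priority"])
    for name, cfg in ordered:
        if has_capacity(name):
            # Assumption: per-profile args are appended to the base args.
            return name, list(base_args or []) + list(cfg.get("args", []))
    return None


# Example: pretend only L4 nodes are free (assumed capacity check).
free = {"l4-2gpu", "l4-1gpu"}
choice = select_profile(PROFILES, lambda name: name in free, ["--max-model-len", "8192"])
print(choice)
# -> ('l4-2gpu', ['--max-model-len', '8192', '--tensor-parallel-size', '2'])
```

A real controller would also need to decide what happens when higher-priority capacity frees up later, for example whether running replicas are moved back; the proposal leaves that open.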
