Add resource.sharing-strategy labels #503
Conversation
"nvidia.com/gpu.count": "1", | ||
"nvidia.com/gpu.replicas": "2", | ||
"nvidia.com/gpu.sharing-strategy": "mps", | ||
"nvidia.com/gpu.memory": "300", |
One question I just had was whether sharing using MPS should affect the memory label?
Or maybe have two separate labels, one representing total memory and another representing memory per replica?
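The two-label idea could be sketched as follows. This is a minimal illustration only, assuming a known total memory and replica count; the `nvidia.com/gpu.memory.per-replica` label name is hypothetical and not part of this PR:

```go
package main

import "fmt"

// buildMemoryLabels sketches the two-label proposal discussed above:
// one label for total device memory and one for memory per replica.
// The per-replica label name is illustrative only.
func buildMemoryLabels(memoryMiB, replicas int) map[string]string {
	labels := map[string]string{
		"nvidia.com/gpu.memory": fmt.Sprintf("%d", memoryMiB),
	}
	if replicas > 1 {
		// Each replica of a shared GPU sees an equal share of the total.
		labels["nvidia.com/gpu.memory.per-replica"] = fmt.Sprintf("%d", memoryMiB/replicas)
	}
	return labels
}

func main() {
	for k, v := range buildMemoryLabels(300, 2) {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```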
I don't know about a separate label. What would this label signify when time-slicing is selected, for example?
Let's do this as a follow-up.
Force-pushed from 5c00efc to 51dbc89.
"nvidia.com/mig.capable": "[true|false]", | ||
"nvidia.com/gpu.compute.major": "[0-9]+", | ||
"nvidia.com/gpu.compute.minor": "[0-9]+", | ||
"nvidia.com/sharing.mps.enabled": "[true|false]", |
Question -- what component is adding this label to the node?
Okay I see this was added to GFD in a previous commit: https://github.com/NVIDIA/k8s-device-plugin/blob/main/internal/lm/nvml.go#L149
Question -- do we still need `nvidia.com/sharing.mps.enabled`? AFAIK we use this as the nodeSelector for the MPS control daemon. Is it sufficient to just use `nvidia.com/gpu.sharing-strategy=mps`?
The issue is that the `sharing-strategy` label is per-resource. This means that if we enable resource renaming or start allowing MPS with mixed MIG mode, the label to trigger off is not known. I would say that `nvidia.com/sharing.mps.enabled` is akin to `nvidia.com/mig.capable` in this respect.
@klueska I know that you expressed some concerns over the exact label. Do you have any other suggestions for it? Should we just make it `nvidia.com/mps.capable` to mirror MIG and to indicate that the final decision depends on whether a replica count greater than 1 is actually selected?
`mig.capable` is different from `nvidia.com/sharing.mps.enabled` in that it says the GPUs on a node are capable of being split into MIG devices -- it doesn't imply that they are in MIG mode or that any MIG devices are currently configured. However, it is similar in that it is primarily used to signal that the MIG manager should be deployed on the node (whereas we want something to signal that an MPS daemon should be started for this resource type).
I think dropping `nvidia.com/sharing.mps.enabled` and adding a `nvidia.com/mps.capable` seems reasonable.
Just to make sure we are on the same page -- this would get set if there is a `sharing.mps` section in the config, and that would trigger the MPS daemonset to be deployed (similar to the MIG manager) and then start one MPS control daemon per resource type that has replicas > 1. Is that correct?
To start: it could also be `nvidia.com/mps.enabled` instead of `nvidia.com/mps.capable`.
We currently set the `nvidia.com/sharing.mps.enabled` label if the `sharing.mps` section is present and at least one resource pattern has replicas > 1. This controls the deployment of the MPS daemonset.
In this daemonset the pattern is evaluated against actual resources, and if any of these have replicas > 1 then a `nvidia-cuda-mps-control` daemon is started for that resource. A further note: we currently only support a single resource name and as such only start a single daemon. The plan is to extend this once we have more user feedback / experience.
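The labeling rule described here can be sketched as a small predicate. This is an illustration only, not the plugin's actual code; the `ReplicatedResource` type is a simplified stand-in for an entry under the `sharing.mps` section of the config:

```go
package main

import "fmt"

// ReplicatedResource loosely mirrors one entry under sharing.mps in the
// plugin config: a resource-name pattern and a replica count.
// The field names are simplified for illustration.
type ReplicatedResource struct {
	Name     string
	Replicas int
}

// mpsEnabled sketches the rule described above: the label is set when a
// sharing.mps section is present (non-nil) and at least one resource
// pattern has replicas > 1.
func mpsEnabled(mpsResources []ReplicatedResource) bool {
	for _, r := range mpsResources {
		if r.Replicas > 1 {
			return true
		}
	}
	return false
}

func main() {
	cfg := []ReplicatedResource{{Name: "nvidia.com/gpu", Replicas: 2}}
	fmt.Println(mpsEnabled(cfg)) // one pattern with replicas > 1
	fmt.Println(mpsEnabled(nil)) // no sharing.mps section
}
```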
I think we should either make it `nvidia.com/mps.capable` to mimic what we have for MIG, or just stick with `nvidia.com/sharing.mps.enabled`. Having `nvidia.com/mig.capable` but `nvidia.com/mps.enabled` would be confusing, especially since MPS can be layered on top of MIG.
Speaking of which, are there any plans for an equivalent label for `timeSlicing`? I know there is no daemon that needs to be started in this case, but does it make sense to add one for symmetry?
Fair point on using `nvidia.com/mps.capable`.
There are no current plans for `timeSlicing`, but we can add it. There are, for example, some devices that don't support time-slicing, and adding this label may make sense there (see NVIDIA/k8s-dra-driver#58).
I can add an issue to create this label as a follow-up.
Created #533 to do the rename.
Force-pushed from 51dbc89 to c883a59.
internal/lm/resource.go (outdated)

    rawLabels := map[string]interface{}{
        "product":  rl.getProductName(parts...),
        "count":    count,
        "replicas": rl.getReplicas(),

Suggested change:

    -    "replicas": rl.getReplicas(),
    +    "replicas": replicas,
This change adds sharing-strategy labels per resource. The label can have the value none, mps, or time-slicing, depending on the sharing configuration. For invalid configurations, the label is empty. Signed-off-by: Evan Lezar <[email protected]>
Force-pushed from c883a59 to 40e6089.
This change adds sharing-strategy labels per resource.
The label can have the value none, mps, or time-slicing, depending on the sharing configuration. For invalid configurations, the label is empty.
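The label values described here can be sketched as a small decision function. This is an illustration under simplified assumptions (a boolean validity flag and a replica count), not the plugin's actual config-handling code:

```go
package main

import "fmt"

// sharingStrategy sketches how the per-resource gpu.sharing-strategy label
// value described above could be derived. Inputs are simplified: the real
// logic lives in the plugin's sharing-configuration handling.
func sharingStrategy(replicas int, mps bool, valid bool) string {
	switch {
	case !valid:
		return "" // invalid configurations yield an empty label
	case replicas <= 1:
		return "none" // no sharing without multiple replicas
	case mps:
		return "mps"
	default:
		return "time-slicing"
	}
}

func main() {
	fmt.Println(sharingStrategy(2, true, true))  // mps
	fmt.Println(sharingStrategy(2, false, true)) // time-slicing
	fmt.Println(sharingStrategy(1, false, true)) // none
}
```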
Running against the in-tree kind cluster:
Deploy using the config:
Get the labels:
Deploy:
Get the labels: