feat: add nvidia MIG Settings #63

Draft
wants to merge 1 commit into
base: develop
@piyush-jena piyush-jena commented Nov 5, 2024

Issue #, if available:

Description of changes:

  • Settings SDK changes to support the following settings in Bottlerocket: settings.kubelet-device-plugins.nvidia.device-partitioning-strategy, settings.kubelet-device-plugins.nvidia.mig.profile-a100, settings.kubelet-device-plugins.nvidia.mig.profile-h100, and settings.kubelet-device-plugins.nvidia.mig.profile-h200.

Testing:

  1. Model Default:
bash-5.1# apiclient get settings.kubelet-device-plugin
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "volume-mounts",
        "device-partitioning-strategy": "none",
        "device-sharing-strategy": "none",
        "pass-device-specs": true
      }
    }
  }
}
  2. Model Updates:
bash-5.1# apiclient set settings.kubelet-device-plugins.nvidia.device-partitioning-strategy="mig"
bash-5.1# apiclient set settings.kubelet-device-plugins.nvidia.mig.profile-a100="1g.5gb"
bash-5.1# apiclient get settings.kubelet-device-plugin
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "volume-mounts",
        "device-partitioning-strategy": "mig",
        "device-sharing-strategy": "none",
        "mig": {
          "profile-a100": "1g.5gb"
        },
        "pass-device-specs": true
      }
    }
  }
}
  3. Check (an invalid profile is rejected):
bash-5.1# apiclient set settings.kubelet-device-plugins.nvidia.mig.profile-a100="1g.7gb"
Failed to change settings: Failed PATCH request to '/settings/keypair?tx=apiclient-set-IOIe8VsE9pO9Jmkl': Status 400 when PATCHing /settings/keypair?tx=apiclient-set-IOIe8VsE9pO9Jmkl: Unable to match your input to the data model.  We may not have enough type information.  Please try the --json input form.  Cause: Error during deserialization: Unable to deserialize into MIGA100Profile: Invalid MIG Profile value '1g.7gb' at line 1 column 68
  4. Files generated:
bash-5.1# cat /etc/nvidia-migmanager/nvidia-migmanager.toml
device-partitioning-strategy = "mig"
profile-a100 = "1g.5gb"
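
For reviewers, the rejection of `1g.7gb` in the check step above can be pictured with a minimal sketch like the one below. It is not the code in this PR: `MigA100Profile`, `InvalidMigProfile`, and `A100_PROFILES` are hypothetical names, and the allow-list is an assumption. Only `1g.5gb` (accepted) and `1g.7gb` (rejected) are confirmed by the test output; the other entries are the A100 40GB profiles from NVIDIA's MIG documentation and may not match what the PR accepts.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for the `MIGA100Profile` type named in the error message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MigA100Profile(String);

/// Assumed allow-list. Only "1g.5gb" is confirmed by the test log;
/// the rest are illustrative A100 40GB profile names.
const A100_PROFILES: &[&str] = &["1g.5gb", "2g.10gb", "3g.20gb", "4g.20gb", "7g.40gb"];

/// Error returned when the input is not in the allow-list.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidMigProfile(String);

impl fmt::Display for InvalidMigProfile {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Mirrors the wording seen in the apiclient error above.
        write!(f, "Invalid MIG Profile value '{}'", self.0)
    }
}

impl std::error::Error for InvalidMigProfile {}

impl FromStr for MigA100Profile {
    type Err = InvalidMigProfile;

    /// Accepts the string only if it exactly matches an allowed profile name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if A100_PROFILES.contains(&s) {
            Ok(MigA100Profile(s.to_string()))
        } else {
            Err(InvalidMigProfile(s.to_string()))
        }
    }
}
```

With this sketch, `"1g.5gb".parse::<MigA100Profile>()` succeeds, while `"1g.7gb".parse::<MigA100Profile>()` returns an error whose message matches the tail of the failure shown in the check step.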

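The mapping from settings to the generated `/etc/nvidia-migmanager/nvidia-migmanager.toml` shown in the last testing step can be sketched as follows. How the file is actually produced is not shown in this PR description, so this is only an illustration: `NvidiaMigSettings` and `render_migmanager_toml` are hypothetical names, and the assumption that unset profiles are simply omitted is inferred from the sample output, which lists only `profile-a100`.

```rust
/// Hypothetical, simplified view of the settings added in this PR.
pub struct NvidiaMigSettings {
    pub device_partitioning_strategy: String,
    pub profile_a100: Option<String>,
    pub profile_h100: Option<String>,
    pub profile_h200: Option<String>,
}

/// Renders the settings as flat TOML key/value lines.
/// Assumes values were already validated and contain no quotes or
/// control characters, so no TOML escaping is performed here.
pub fn render_migmanager_toml(s: &NvidiaMigSettings) -> String {
    let mut out = format!(
        "device-partitioning-strategy = \"{}\"\n",
        s.device_partitioning_strategy
    );
    let profiles = [
        ("profile-a100", &s.profile_a100),
        ("profile-h100", &s.profile_h100),
        ("profile-h200", &s.profile_h200),
    ];
    for (key, value) in profiles {
        // Unset profiles are skipped (assumed from the sample output).
        if let Some(v) = value {
            out.push_str(&format!("{} = \"{}\"\n", key, v));
        }
    }
    out
}
```

Given `device_partitioning_strategy = "mig"` and only `profile_a100 = Some("1g.5gb")`, this sketch yields the same two lines as the generated file above.
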
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
