I have a single NVidia GPU on a target Talos v1.8.3 node that I'd like to always move to EXCLUSIVE_PROCESS compute mode on boot, with the MPS control daemon running:

```bash
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
nvidia-cuda-mps-control -d
```

I'd typically add the commands above as a shell script that runs at boot. To do that on Talos, I built a custom system extension and tried to install it with the following machine config patch:

```yaml
# my-patch.yaml
machine:
  install:
    extensions:
      - image: asymingt/nvidia-compute-mode-service:v1.0.1
```

Unfortunately, when I applied the patch it seemed to work, but the extension never showed up (also, as you can see below, this mechanism is deprecated, so I don't know how wise it is to depend on it):

```
$ talosctl -n 192.168.4.154 -e 192.168.4.154 --talosconfig=./talosconfig patch mc --patch @my-patch.yaml -m reboot
patched MachineConfigs.config.talos.dev/v1alpha1 at the node 192.168.4.154
WARNING: .machine.install.extensions is deprecated, please see https://www.talos.dev/latest/talos-guides/install/boot-assets/
Applied configuration with a reboot.
```
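As an aside for anyone hitting the same warning: the non-deprecated path it points to is the boot assets flow, where extensions are baked into the installer image rather than listed under `.machine.install.extensions`. A rough sketch using the Talos imager follows; the registry and installer tag are placeholders, and the exact flags should be checked against the linked boot-assets guide:

```bash
# Build a custom Talos installer image that includes the extension (boot assets flow).
docker run --rm -t -v $PWD/_out:/out ghcr.io/siderolabs/imager:v1.8.3 installer \
  --arch amd64 \
  --system-extension-image asymingt/nvidia-compute-mode-service:v1.0.1

# Push the resulting installer image to a registry you control, then point the node at it:
talosctl -n 192.168.4.154 -e 192.168.4.154 --talosconfig=./talosconfig \
  upgrade --image registry.example.com/talos-installer:v1.8.3-nvidia-mps
```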
```
$ talosctl -n 192.168.4.154 -e 192.168.4.154 --talosconfig=./talosconfig get extensions
NODE            NAMESPACE   TYPE              ID            VERSION   NAME                                        VERSION
192.168.4.154   runtime     ExtensionStatus   0             1         amd-ucode                                   20241110
192.168.4.154   runtime     ExtensionStatus   1             1         amdgpu-firmware                             20241110
192.168.4.154   runtime     ExtensionStatus   2             1         i915-ucode                                  20241110
192.168.4.154   runtime     ExtensionStatus   3             1         nvidia-container-toolkit-production         550.90.07-v1.16.1
192.168.4.154   runtime     ExtensionStatus   4             1         nvidia-open-gpu-kernel-modules-production   550.90.07-v1.8.3
192.168.4.154   runtime     ExtensionStatus   5             1         schematic                                   3efeb200f226e383f39b24073904fb1f776649189a791df2d54b9c321c3343c9
192.168.4.154   runtime     ExtensionStatus   modules.dep   1         modules.dep                                 6.6.60-talos
```

My gut says that something as trivial as a simple command on boot should not require this level of complexity. Perhaps I am wrong, though. Any suggestions would be very helpful. My major concern at this point is that I'm going to have to write a different extension for each node in my cluster depending on how many GPUs are attached and how I want them configured. It would be much easier to just have a sequence of init commands in a machine configuration or patch.

**Instructions for repeatability**

After following the Talos NVidia install instructions, it is possible to get MPS working by enabling privileged execution of a container, and then running the two required commands above inside the container as follows:

```
kubectl label --overwrite ns default pod-security.kubernetes.io/enforce=privileged
kubectl run nvidia-test --restart=Never -ti --rm \
--image nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu20.04 \
--overrides '{"spec": {"runtimeClassName": "nvidia"}}' --privileged bash
> nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
> nvidia-cuda-mps-control -d
```

Then, assuming that you have installed the NVIDIA device plugin with the following YAML configurations:

```yaml
# rtx4060.yaml
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```

```yaml
# default.yaml
version: v1
```

And you label your target node to mark it as having a single GPU that we want to split with MPS, and make sure, of course, that the device plugin actually picks up that config (one way to wire this up is sketched below).
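For reference, here is a rough sketch of that wiring using the NVIDIA device plugin Helm chart; the release name, namespace, and node name below are placeholders, and the chart values (`config.default`, `config.map.*`) and the `nvidia.com/device-plugin.config` node label should be double-checked against the device plugin documentation for your version:

```bash
# Register the device plugin chart repo and install it with both configs,
# using "default" for unlabelled nodes and "rtx4060" for the MPS node.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set config.default=default \
  --set-file config.map.default=default.yaml \
  --set-file config.map.rtx4060=rtx4060.yaml

# Select the MPS config on the target node; the plugin pod there picks it up
# and starts advertising 4 replicas of nvidia.com/gpu.
kubectl label node target-node nvidia.com/device-plugin.config=rtx4060
```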
After a bit of time you will see the single 8GB GPU split into four 2GB GPUs:

```
$ kubectl describe node target-node
...
Capacity:
  cpu:                192
  ephemeral-storage:  1951051424Ki
  hugepages-2Mi:      0
  memory:             528144832Ki
  nvidia.com/gpu:     4            <---- Yay!
  pods:               110
Allocatable:
  cpu:                191950m
  ephemeral-storage:  1797820553926
  hugepages-2Mi:      0
  memory:             527518144Ki
  nvidia.com/gpu:     4            <---- Yay!
  pods:               110
```
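To sanity-check the split, a minimal test pod along these lines (the pod name is arbitrary; the image and runtime class are the ones used above) should land on one of the four advertised replicas:

```yaml
# mps-test.yaml -- requests one of the four MPS-backed GPU replicas
apiVersion: v1
kind: Pod
metadata:
  name: mps-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
```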
Replies: 1 comment 1 reply
You can use a Kubernetes DaemonSet as a way to run something on boot. Just run the command, and sleep forever in the shell script. This way, on node reboot, the container/pod will be recreated and it will run the command once again.
Longer term, it might be nice to make it part of the nvidia extension.
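A minimal sketch of that DaemonSet approach, assuming the `nvidia` runtime class and CUDA image from the question above (the namespace, labels, and node selector are illustrative):

```yaml
# gpu-mps-init.yaml -- runs the compute-mode/MPS setup once per node boot,
# then sleeps forever so the pod is not restarted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-mps-init
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: gpu-mps-init
  template:
    metadata:
      labels:
        app: gpu-mps-init
    spec:
      runtimeClassName: nvidia
      nodeSelector:
        example.com/gpu-mps: "true"   # hypothetical label for MPS-enabled nodes
      containers:
      - name: init
        image: nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu20.04
        securityContext:
          privileged: true
        command:
        - /bin/bash
        - -c
        - |
          nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
          nvidia-cuda-mps-control -d
          sleep infinity
```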