Enhance documentation for the repo, now with GFD and NFD as subcharts

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
NVIDIA · Apr 16, 2024 · e53056c
1 parent 787ab06

Showing 7 changed files with 1,071 additions and 36 deletions.
diff --git a/README.md b/README.md
@@ -48,7 +48,6 @@ Please note that:
 - The NVIDIA device plugin is currently lacking
     - Comprehensive GPU health checking features
     - GPU cleanup features
-    - ...
 - Support will only be provided for the official NVIDIA device plugin (and not
   for forks or other variants of this plugin).
 
@@ -1016,38 +1015,29 @@ See the [changelog](CHANGELOG.md)
 * You can report a bug by [filing a new issue](https://github.com/NVIDIA/k8s-device-plugin/issues/new)
 * You can contribute by opening a [pull request](https://help.github.com/articles/using-pull-requests/)
 
-### Versioning
-
-Before v1.10 the versioning scheme of the device plugin had to match exactly the version of Kubernetes.
-After the promotion of device plugins to beta this condition was no longer required.
-We quickly noticed that this versioning scheme was very confusing for users as they still expected to see
-a version of the device plugin for each version of Kubernetes.
-
-This versioning scheme applies to the tags `v1.8`, `v1.9`, `v1.10`, `v1.11`, `v1.12`.
-
-We have now changed the versioning to follow [SEMVER](https://semver.org/). The
-first version following this scheme has been tagged `v0.0.0`.
-
-Going forward, the major version of the device plugin will only change
-following a change in the device plugin API itself. For example, version
-`v1beta1` of the device plugin API corresponds to version `v0.x.x` of the
-device plugin. If a new `v2beta2` version of the device plugin API comes out,
-then the device plugin will increase its major version to `1.x.x`.
-
-As of now, the device plugin API for Kubernetes >= v1.10 is `v1beta1`.  If you
-have a version of Kubernetes >= 1.10 you can deploy any device plugin version >
-`v0.0.0`.
-
-### Upgrading Kubernetes with the Device Plugin
-
-Upgrading Kubernetes when you have a device plugin deployed doesn't require you
-to make any particular changes to your workflow.  The API is versioned and is
-pretty stable (though it is not guaranteed to be non breaking). Starting with
-Kubernetes version 1.10, you can use `v0.3.0` of the device plugin to perform
-upgrades, and Kubernetes won't require you to deploy a different version of the
-device plugin. Once a node comes back online after the upgrade, you will see
-GPUs re-registering themselves automatically.
-
-Upgrading the device plugin itself is a more complex task. It is recommended to
-drain GPU tasks as we cannot guarantee that GPU tasks will survive a rolling
-upgrade. However we make best efforts to preserve GPU tasks during an upgrade.
+## Documentation
+
+- [Quick Start](docs/quick_start.md)
+  * [Prerequisites](docs/quick_start.md#prerequisites)
+  * [Preparing your GPU Nodes](docs/quick_start.md#preparing-your-gpu-nodes)
+  * [Node Feature Discovery (NFD)](docs/quick_start.md#node-feature-discovery-nfd)
+  * [Enabling GPU Support in Kubernetes](docs/quick_start.md#enabling-gpu-support-in-kubernetes)
+  * [Running GPU Jobs](docs/quick_start.md#running-gpu-jobs)
+- [Configuring the NVIDIA device plugin binary](docs/customizing.md)
+  * [As command line flags or envvars](docs/customizing.md#as-command-line-flags-or-envvars)
+  * [As a configuration file](docs/customizing.md#as-a-configuration-file)
+  * [Configuration Option Details](docs/customizing.md#configuration-option-details)
+  * [Shared Access to GPUs with CUDA Time-Slicing](docs/customizing.md#shared-access-to-gpus-with-cuda-time-slicing)
+- [Deployment via `helm`](docs/deployment_via_helm.md)
+  * [Configuring the device plugin's `helm` chart](docs/deployment_via_helm.md#configuring-the-device-plugins-helm-chart)
+    + [Passing configuration to the plugin via a `ConfigMap`.](docs/deployment_via_helm.md#passing-configuration-to-the-plugin-via-a-configmap)
+      - [Single Config File Example](docs/deployment_via_helm.md#single-config-file-example)
+      - [Multiple Config File Example](docs/deployment_via_helm.md#multiple-config-file-example)
+      - [Updating Per-Node Configuration With a Node Label](docs/deployment_via_helm.md#updating-per-node-configuration-with-a-node-label)
+    + [Setting other helm chart values](docs/deployment_via_helm.md#setting-other-helm-chart-values)
+    + [Deploying with gpu-feature-discovery for automatic node labels](docs/deployment_via_helm.md#deploying-with-gpu-feature-discovery-for-automatic-node-labels)
+  * [Deploying via `helm install` with a direct URL to the `helm` package](docs/deployment_via_helm.md#deploying-via-helm-install-with-a-direct-url-to-the-helm-package)
+- [Building and Running Locally](docs/building_and_running.md)
+- [GPU Feature Discovery CMD](docs/gfd_cmd.md)
+- [GPU Feature Discovery Labels](docs/gfd_labels.md)
+- [Changelog](CHANGELOG.md)
diff --git a/docs/building_and_running.md b/docs/building_and_running.md
@@ -0,0 +1,79 @@
+## Building and Running Locally
+
+The following sections focus on building the device plugin locally and running it.
+They are intended purely for development and testing and are not required by most users.
+They assume you are pinning to a release tag (`v0.14.0` in the examples below), but
+the commands can easily be modified to work with any available tag or branch.
+
+### With Docker
+
+#### Build
+Option 1, pull the prebuilt image from the NVIDIA container registry (`nvcr.io`):
+
+```shell
+$ docker pull nvcr.io/nvidia/k8s-device-plugin:v0.14.0
+$ docker tag nvcr.io/nvidia/k8s-device-plugin:v0.14.0 nvcr.io/nvidia/k8s-device-plugin:devel
+```
+
+Option 2, build without cloning the repository:
+
+```shell
+$ docker build \
+    -t nvcr.io/nvidia/k8s-device-plugin:devel \
+    -f deployments/container/Dockerfile.ubuntu \
+    https://github.com/NVIDIA/k8s-device-plugin.git#v0.14.0
+```
+
+Option 3, if you want to modify the code:
+
+```shell
+$ git clone https://github.com/NVIDIA/k8s-device-plugin.git && cd k8s-device-plugin
+$ make -f deployments/container/Makefile build-ubuntu20.04
+```
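The commands above pin `v0.14.0`. One way to retarget them is to keep the ref in a single variable. The sketch below is a hypothetical helper (`print_build_commands` is not part of the repository) that only prints the Option 1 and Option 2 commands for a given ref. Note that a prebuilt image (Option 1) can only be pulled for a published release tag, while the Option 2 build also accepts a branch name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print, but do not run,
# the Option 1 and Option 2 commands for a given tag or branch.
print_build_commands() {
  # Default to the tag used throughout this document.
  local ref="${1:-v0.14.0}"
  local image="nvcr.io/nvidia/k8s-device-plugin"

  # Option 1: pull a prebuilt image (release tags only) and retag it as :devel.
  echo "docker pull ${image}:${ref}"
  echo "docker tag ${image}:${ref} ${image}:devel"

  # Option 2: build straight from the Git remote at the chosen tag or branch.
  echo "docker build -t ${image}:devel -f deployments/container/Dockerfile.ubuntu https://github.com/NVIDIA/k8s-device-plugin.git#${ref}"
}
```

Calling it with no argument reproduces the `v0.14.0` commands shown above; review the printed commands before running them.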
+
+#### Run
+Without compatibility for the `CPUManager` static policy:
+
+```shell
+$ docker run \
+    -it \
+    --security-opt=no-new-privileges \
+    --cap-drop=ALL \
+    --network=none \
+    -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
+    nvcr.io/nvidia/k8s-device-plugin:devel
+```
+
+With compatibility for the `CPUManager` static policy:
+
+```shell
+$ docker run \
+    -it \
+    --privileged \
+    --network=none \
+    -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
+    nvcr.io/nvidia/k8s-device-plugin:devel --pass-device-specs
+```
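The two `docker run` invocations above differ only in how privileges are granted (`--security-opt=no-new-privileges --cap-drop=ALL` versus `--privileged`) and in whether `--pass-device-specs` is passed to the plugin. A minimal sketch that makes this difference explicit; `device_plugin_run_cmd` is a hypothetical helper that prints the command instead of running it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker run command for the local :devel image.
# Pass "static" for the variant compatible with the CPUManager static policy.
device_plugin_run_cmd() {
  local mode="${1:-default}"
  local -a args=(docker run -it)

  if [[ "$mode" == "static" ]]; then
    # Static-policy variant: run the container privileged.
    args+=(--privileged)
  else
    # Default variant: forbid privilege escalation and drop all capabilities.
    args+=(--security-opt=no-new-privileges --cap-drop=ALL)
  fi

  # Shared by both variants: no networking, and the kubelet device-plugin directory mounted.
  args+=(--network=none
         -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins
         nvcr.io/nvidia/k8s-device-plugin:devel)

  if [[ "$mode" == "static" ]]; then
    # Plugin argument used by the static-policy example above.
    args+=(--pass-device-specs)
  fi

  echo "${args[*]}"
}
```

Calling it with `static` prints the second command above; calling it with no argument prints the first.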
+
+### Without Docker
+
+#### Build
+
+
+```shell
+$ make cmds
+```
+
+#### Run
+Without compatibility for the `CPUManager` static policy (the first command also runs GPU feature discovery, writing its labels to `$(pwd)/gfd`):
+
+```shell
+$ ./gpu-feature-discovery --output=$(pwd)/gfd
+$ ./k8s-device-plugin
+```
+
+With compatibility for the `CPUManager` static policy:
+
+```shell
+$ ./k8s-device-plugin --pass-device-specs
+```
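The `gpu-feature-discovery --output=$(pwd)/gfd` command above writes the discovered labels to a local file. Assuming that file follows the Node Feature Discovery feature-file layout of one `key=value` pair per line, a minimal sketch for reading a single label back could look like this; `gfd_label` is a hypothetical helper and `nvidia.com/gpu.count` is only an example key:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one label from a GFD output file.
# Assumes one "key=value" pair per line; returns non-zero if the key is absent.
gfd_label() {
  local file="$1" key="$2"
  # Split each line on the first "=" only, so values may themselves contain "=".
  awk -v k="$key" '{
    i = index($0, "=")
    if (i > 0 && substr($0, 1, i - 1) == k) { print substr($0, i + 1); found = 1 }
  } END { exit found ? 0 : 1 }' "$file"
}
```

For example, `gfd_label ./gfd nvidia.com/gpu.count` prints the value of that label if GFD wrote it.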