NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

JunAr7112/gpu-operator

This branch is 3 commits ahead of, 631 commits behind NVIDIA/gpu-operator:main.

Latest commit

651ed3a · Jul 3, 2024

NVIDIA GPU Operator

Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, InfiniBand adapters, and other devices through the device plugin framework. However, configuring and managing nodes with these hardware resources requires setting up multiple software components, such as drivers, container runtimes, and other libraries, which is difficult and error-prone. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others.
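
As a rough illustration of what the device plugin framework enables, the sketch below builds a minimal Pod manifest whose container asks for one GPU through its resource limits. It assumes the GPU is advertised under the extended resource name `nvidia.com/gpu`, which is the name the NVIDIA device plugin conventionally uses. The pod name and container image are hypothetical placeholders, and the small struct types are stand-ins for the real Kubernetes API types so the example only needs the Go standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// gpuResource is the extended resource name assumed to be advertised
// by the NVIDIA device plugin on GPU nodes.
const gpuResource = "nvidia.com/gpu"

// The types below mirror only the few Pod fields this sketch needs.
// They are simplified stand-ins, not the official Kubernetes API types.
type resources struct {
	Limits map[string]string `json:"limits"`
}

type container struct {
	Name      string    `json:"name"`
	Image     string    `json:"image"`
	Resources resources `json:"resources"`
}

type podSpec struct {
	RestartPolicy string      `json:"restartPolicy"`
	Containers    []container `json:"containers"`
}

type metadata struct {
	Name string `json:"name"`
}

type pod struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   metadata `json:"metadata"`
	Spec       podSpec  `json:"spec"`
}

// newGPUPod returns a Pod manifest with a single container that
// requests the given number of GPUs via its resource limits.
func newGPUPod(name, image string, gpus int) pod {
	return pod{
		APIVersion: "v1",
		Kind:       "Pod",
		Metadata:   metadata{Name: name},
		Spec: podSpec{
			RestartPolicy: "Never",
			Containers: []container{{
				Name:  name,
				Image: image,
				Resources: resources{
					Limits: map[string]string{gpuResource: strconv.Itoa(gpus)},
				},
			}},
		},
	}
}

func main() {
	// Hypothetical pod name and image, used only for illustration.
	p := newGPUPod("gpu-smoke-test", "example.com/cuda-sample:latest", 1)
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The program prints the manifest as JSON, which `kubectl apply -f` accepts in the same way as YAML. On a cluster where the GPU software stack is in place, the scheduler should only place such a pod on a node that still has an unallocated GPU.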

Audience and Use-Cases

The GPU Operator allows administrators of Kubernetes clusters to manage GPU nodes just like CPU nodes in the cluster. Instead of provisioning a special OS image for GPU nodes, administrators can use a standard OS image for both CPU and GPU nodes and rely on the GPU Operator to provision the required software components for GPUs.

The GPU Operator is particularly useful when the Kubernetes cluster needs to scale quickly, for example when provisioning additional GPU nodes in the cloud or on-prem and managing the lifecycle of the underlying software components. Because the GPU Operator runs everything as containers, including the NVIDIA drivers, administrators can easily swap components simply by starting or stopping containers.

Product Documentation

For information on platform support and getting started, visit the official documentation repository.

Webinar

How to easily use GPUs on Kubernetes

Contributions

Read the document on contributions. You can contribute by opening a pull request.

Support and Getting Help

Please open an issue on the GitHub project for any questions. Your feedback is appreciated.

Languages

  • Go 90.1%
  • Shell 6.0%
  • Makefile 2.3%
  • Other 1.6%