Using nvidia runtime with buildkit #122

Open

rmax opened this issue Sep 23, 2021 · 16 comments

rmax commented Sep 23, 2021

I'm trying to have the NVIDIA driver available during the build. This works with the default docker build command, but not when using BuildKit.

I have this minimal Dockerfile:

FROM nvidia/cuda:11.1-base
RUN ls /dev/nvidia*
RUN nvidia-smi

I can build it as follows:

❯ docker build -f Dockerfile . --no-cache
Sending build context to Docker daemon  5.167kB
Step 1/3 : FROM nvidia/cuda:11.1-base
 ---> 287475453634
Step 2/3 : RUN ls /dev/nvidia*
 ---> Running in e8c12b8a398b
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia0
/dev/nvidiactl
Removing intermediate container e8c12b8a398b
 ---> 17884a1f0b6a
Step 3/3 : RUN nvidia-smi
 ---> Running in 4a5bbd2337c0
Thu Sep 23 14:08:59 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Removing intermediate container 4a5bbd2337c0
 ---> 308dfa443901
Successfully built 308dfa443901

But when using BuildKit I get:

❯ DOCKER_BUILDKIT=1 docker build -f Dockerfile .
[+] Building 0.4s (5/6)
 => [internal] load build definition from Dockerfile                                                                                                                                     0.0s
 => => transferring dockerfile: 116B                                                                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.1-base                                                                                                                                       0.0s
 => CACHED [1/3] FROM docker.io/nvidia/cuda:11.1-base                                                                                                                                                  0.0s
 => ERROR [2/3] RUN ls /dev/nvidia*                                                                                                                                                                    0.3s
------
 > [2/3] RUN ls /dev/nvidia*:
#5 0.299 ls: cannot access '/dev/nvidia*': No such file or directory
------
executor failed running [/bin/sh -c ls /dev/nvidia*]: exit code: 2

Then I figured out that I have to use RUN --security=insecure and build with docker buildx as follows.

# Dockerfile.buildkit
# syntax = docker/dockerfile:experimental
FROM nvidia/cuda:11.1-base
RUN --security=insecure nvidia-smi

First I create the builder:

❯ docker buildx create --driver docker-container --name local --buildkitd-flags '--allow-insecure-entitlement security.insecure' --use
local

Then I build the image as follows:

❯ docker buildx build -f Dockerfile.buildkit . --allow security.insecure
WARN[0000] No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 0.7s (9/9) FINISHED
 => [internal] load build definition from Dockerfile.buildkit                                                                                                                                          0.0s
 => => transferring dockerfile: 150B                                                                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                        0.0s
 => resolve image config for docker.io/docker/dockerfile:experimental                                                                                                                                  0.1s
 => CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e78496ff5                                                             0.0s
 => => resolve docker.io/docker/dockerfile:experimental@sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e78496ff5                                                                        0.0s
 => [internal] load .dockerignore                                                                                                                                                                      0.0s
 => [internal] load build definition from Dockerfile.buildkit                                                                                                                                          0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.1-base                                                                                                                                       0.1s
 => CACHED [1/2] FROM docker.io/nvidia/cuda:11.1-base@sha256:c6bb47a62ad020638aeaf66443de9c53c6dc8a0376e97b2d053ac774560bd0fa                                                                          0.0s
 => => resolve docker.io/nvidia/cuda:11.1-base@sha256:c6bb47a62ad020638aeaf66443de9c53c6dc8a0376e97b2d053ac774560bd0fa                                                                                 0.0s
 => ERROR [2/2] RUN --security=insecure nvidia-smi                                                                                                                                                     0.1s
------
 > [2/2] RUN --security=insecure nvidia-smi:
#8 0.074 /bin/sh: 1: nvidia-smi: not found
------
Dockerfile.buildkit:3
--------------------
   1 |     # syntax = docker/dockerfile:experimental
   2 |     FROM nvidia/cuda:11.1-base
   3 | >>> RUN --security=insecure nvidia-smi
   4 |
--------------------
error: failed to solve: process "/bin/sh -c nvidia-smi" did not complete successfully: exit code: 127

I know the insecure flag works, because it succeeds when I use another command that requires privileges (e.g. mount --bind /dev /tmp).

This is my daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

The output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 10
  Running: 1
  Paused: 0
  Stopped: 9
 Images: 191
 Server Version: 20.10.8
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version: e25210fe30a0a703442421b0f60afac609f950a3
 runc version: v1.0.1-0-g4144b63
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-1056-aws
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 59.86GiB
 Name: ip-172-31-13-189
 ID: JSBF:DURT:RVBM:P7XL:YIWL:IKJU:3WIS:25N6:UH72:ALJA:XDRO:R35Q
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

The output of nvidia-container-cli -k -d /dev/tty info:


-- WARNING, the following logs are for debugging purposes only --

I0923 14:17:11.705485 3265 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0923 14:17:11.705537 3265 nvc.c:346] using root /
I0923 14:17:11.705556 3265 nvc.c:347] using ldcache /etc/ld.so.cache
I0923 14:17:11.705571 3265 nvc.c:348] using unprivileged user 1000:1000
I0923 14:17:11.705600 3265 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0923 14:17:11.705791 3265 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0923 14:17:11.711421 3266 nvc.c:269] failed to set inheritable capabilities
W0923 14:17:11.711475 3266 nvc.c:270] skipping kernel modules load due to failure
I0923 14:17:11.711714 3267 driver.c:101] starting driver service
I0923 14:17:11.715296 3265 nvc_info.c:676] requesting driver information with ''
I0923 14:17:11.717376 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.450.119.03
I0923 14:17:11.717652 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.450.119.03
I0923 14:17:11.717748 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.119.03
I0923 14:17:11.717819 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.119.03
I0923 14:17:11.717897 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.119.03
I0923 14:17:11.718008 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.450.119.03
I0923 14:17:11.718111 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.450.119.03
I0923 14:17:11.718180 3265 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-nscq-dcgm.so.450.51.06
I0923 14:17:11.718256 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.450.119.03
I0923 14:17:11.718322 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.119.03
I0923 14:17:11.718430 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.450.119.03
I0923 14:17:11.718548 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.119.03
I0923 14:17:11.718625 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.119.03
I0923 14:17:11.718713 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.119.03
I0923 14:17:11.718794 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.450.119.03
I0923 14:17:11.718902 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.450.119.03
I0923 14:17:11.719021 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.119.03
I0923 14:17:11.719108 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.450.119.03
I0923 14:17:11.719184 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.450.119.03
I0923 14:17:11.719293 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.450.119.03
I0923 14:17:11.719381 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.450.119.03
I0923 14:17:11.719472 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.450.119.03
I0923 14:17:11.719897 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.450.119.03
I0923 14:17:11.720133 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.450.119.03
I0923 14:17:11.720217 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.450.119.03
I0923 14:17:11.720300 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.450.119.03
I0923 14:17:11.720380 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.450.119.03
W0923 14:17:11.720453 3265 nvc_info.c:350] missing library libnvidia-nscq.so
W0923 14:17:11.720469 3265 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0923 14:17:11.720491 3265 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W0923 14:17:11.720501 3265 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0923 14:17:11.720516 3265 nvc_info.c:354] missing compat32 library libnvidia-nscq.so
W0923 14:17:11.720534 3265 nvc_info.c:354] missing compat32 library libcuda.so
W0923 14:17:11.720540 3265 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W0923 14:17:11.720547 3265 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W0923 14:17:11.720561 3265 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0923 14:17:11.720567 3265 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0923 14:17:11.720582 3265 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W0923 14:17:11.720600 3265 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0923 14:17:11.720612 3265 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W0923 14:17:11.720628 3265 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W0923 14:17:11.720642 3265 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W0923 14:17:11.720658 3265 nvc_info.c:354] missing compat32 library libnvcuvid.so
W0923 14:17:11.720667 3265 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W0923 14:17:11.720674 3265 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W0923 14:17:11.720686 3265 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W0923 14:17:11.720693 3265 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W0923 14:17:11.720710 3265 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W0923 14:17:11.720725 3265 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W0923 14:17:11.720740 3265 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0923 14:17:11.720755 3265 nvc_info.c:354] missing compat32 library libnvoptix.so
W0923 14:17:11.720774 3265 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W0923 14:17:11.720792 3265 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W0923 14:17:11.720799 3265 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W0923 14:17:11.720807 3265 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W0923 14:17:11.720818 3265 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W0923 14:17:11.720823 3265 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0923 14:17:11.721761 3265 nvc_info.c:276] selecting /usr/bin/nvidia-smi
I0923 14:17:11.721799 3265 nvc_info.c:276] selecting /usr/bin/nvidia-debugdump
I0923 14:17:11.721837 3265 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
I0923 14:17:11.721887 3265 nvc_info.c:276] selecting /usr/bin/nv-fabricmanager
I0923 14:17:11.721930 3265 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-control
I0923 14:17:11.721972 3265 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-server
I0923 14:17:11.722021 3265 nvc_info.c:438] listing device /dev/nvidiactl
I0923 14:17:11.722043 3265 nvc_info.c:438] listing device /dev/nvidia-uvm
I0923 14:17:11.722049 3265 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0923 14:17:11.722056 3265 nvc_info.c:438] listing device /dev/nvidia-modeset
W0923 14:17:11.722103 3265 nvc_info.c:321] missing ipc /var/run/nvidia-persistenced/socket
W0923 14:17:11.722155 3265 nvc_info.c:321] missing ipc /var/run/nvidia-fabricmanager/socket
W0923 14:17:11.722193 3265 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0923 14:17:11.722213 3265 nvc_info.c:733] requesting device information with ''
I0923 14:17:11.729066 3265 nvc_info.c:623] listing device /dev/nvidia0 (GPU-f82fe76f-d403-d34b-8b80-9a9316b19b18 at 00000000:00:1e.0)
NVRM version:   450.119.03
CUDA version:   11.0

Device Index:   0
Device Minor:   0
Model:          Tesla K80
Brand:          Tesla
GPU UUID:       GPU-f82fe76f-d403-d34b-8b80-9a9316b19b18
Bus Location:   00000000:00:1e.0
Architecture:   3.7
I0923 14:17:11.729156 3265 nvc.c:423] shutting down library context
I0923 14:17:11.729661 3267 driver.c:163] terminating driver service
I0923 14:17:11.730042 3265 driver.c:203] driver service terminated successfully

Looks like the builder is not using the nvidia runtime. What am I missing?

klueska (Contributor) commented Sep 23, 2021

It's because a normal docker build uses whatever default-runtime you have set in your /etc/docker/daemon.json during the build steps, and buildkit does not. I'm assuming you have the default-runtime set to nvidia in your case, which makes sure to dynamically inject the /dev/nvidia* devices as well as various library files and the nvidia-smi utility (these are not (and cannot be) part of the image itself).

In general, it is not recommended to build containers with nvidia set as the default-runtime, for exactly this reason: you end up with files injected into your container during the build process that may or may not be present when you actually run the container in the future (especially if you run on a different version of the nvidia driver, because some of the libs that get dynamically injected are driver-specific).
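
To see this injection concretely, you can compare a container started with the nvidia runtime against one started with plain runc (a quick sketch using the image from this thread; the driver-suffixed library names depend on your host):

# With the nvidia runtime: devices and driver libraries are injected at
# container start (e.g. libcuda.so.450.119.03, matching the host driver).
docker run --rm --runtime=nvidia nvidia/cuda:11.1-base \
    sh -c 'ls /dev/nvidia* /usr/lib/x86_64-linux-gnu/libcuda.so*'

# With plain runc: nothing is injected, so both listings fail.
docker run --rm --runtime=runc nvidia/cuda:11.1-base \
    sh -c 'ls /dev/nvidia* /usr/lib/x86_64-linux-gnu/libcuda.so*'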

rmax (Author) commented Sep 23, 2021

@klueska indeed, I have the default-runtime set to nvidia, and my use case is building a TensorFlow op, testing it, and then using it in the final image (a multi-stage Dockerfile).
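
Roughly like this (an illustrative sketch only; the op, paths, and test script are made up):

# Compile and test a custom TensorFlow op in a builder stage (this is the
# part that needs CUDA and the GPU at build time), then copy only the
# compiled artifact into the final image.
FROM nvidia/cuda:11.1-devel AS builder
COPY my_op/ /src/my_op/
RUN make -C /src/my_op              # compiles against the CUDA toolkit
RUN /src/my_op/run_tests.sh         # exercises the op on the GPU

FROM nvidia/cuda:11.1-base
COPY --from=builder /src/my_op/my_op.so /opt/ops/my_op.so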

Is it possible to set default-runtime for buildkit builders?

KernelA commented Jan 13, 2022

I have the same issue. In some cases it is useful to have the nvidia runtime at the build stage.

elezar (Member) commented Feb 10, 2022

Would the following be useful? moby/buildkit#1283

chris-volley commented

Anyone have a workaround or an example of this? I am running into the same issue.

KernelA commented Jul 25, 2022

@chris-volley, the only workaround is to disable BuildKit.
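
That is, for a single build (the same variable used earlier in this thread, just set to 0):

# Force the legacy builder, which honors the default-runtime setting:
DOCKER_BUILDKIT=0 docker build -f Dockerfile .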

unphased commented Sep 27, 2022

I need to use BuildKit to leverage cache mounts for an NVIDIA-based docker build! Is there no way to do this? That can't be!?

Lucas-Steinmann commented

That is my current state of work. I wrote an in-house build system around docker which selectively turns BuildKit on and off. Parts which have to be compiled against CUDA libraries and tested with GPU capabilities live in separate Dockerfiles. These Dockerfiles are built with BuildKit turned off.

Other parts have to authenticate against our servers to download some data or code and want to mount secrets. These Dockerfiles are built with BuildKit turned on.

The dependency information, which specifies the build order and whether to turn BuildKit on or not, is stored in a central metadata index file.
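
For illustration, such an index file might look something like this (purely illustrative; not the actual format):

{
    "targets": {
        "cuda-ops": { "dockerfile": "Dockerfile.cuda", "buildkit": false, "depends_on": [] },
        "app": { "dockerfile": "Dockerfile.app", "buildkit": true, "depends_on": ["cuda-ops"] }
    }
}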

Janky as hell (the number of Dockerfiles grows quickly), but I don't know any other solution. It's basically a step backward from multi-stage Dockerfiles.

If someone got a better solution, please let me know.

@elezar I was not able to follow the explanations in the pull request and transfer them to this use case.
If someone can explain how to use --oci-worker-binary to replace the default runtime with the nvidia runtime, I'd be very thankful.
Anyway, I think the pull request has been reverted by now, so I'm not sure whether it is still relevant or whether there is a replacement.

elezar (Member) commented Sep 28, 2022

As a general question (before I spend more time familiarizing myself with buildkit), would buildx be an option? I see that the config supports a binary option per worker:

  # alternate OCI worker binary name(example 'crun'), by default either 
  # buildkit-runc or runc binary is used
  binary = ""

One could set up two different workers here -- one for the non-GPU builds and one for the cases where the drivers are required at build time.
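
Something along these lines, as an untested sketch (whether the nvidia runtime binary is actually present inside the builder container is an open question — see the follow-up below):

# buildkitd config pointing the OCI worker at the NVIDIA runtime:
cat > buildkitd.toml <<'EOF'
[worker.oci]
  binary = "/usr/bin/nvidia-container-runtime"
EOF

# Builder for the GPU builds; keep the default builder for everything else:
docker buildx create --name gpu-builder --config buildkitd.toml --use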

unphased commented

@Lucas-Steinmann Thanks for explaining.

My situation involves a CMake project that absolutely needs CUDA for anything nontrivial it builds.

So far, a practical way to work around the buildkit/nvidia blockage while still leveraging ccache seems to be this rsync trick.

azmathmoosa commented

Was anyone able to get this working? I'm unable to find a single example of how to set --oci-worker-binary to /usr/bin/nvidia-container-runtime through buildx.

Lucas-Steinmann commented

As a general question (before I spend more time familiarizing myself with buildkit), would buildx be an option? I see that the config supports a binary option per worker:

  # alternate OCI worker binary name(example 'crun'), by default either 
  # buildkit-runc or runc binary is used
  binary = ""

One could set up two different workers here -- one for the non-GPU builds and one for the cases where the drivers are required at build time.

I tried this. The obvious problem is that nvidia-container-runtime is not present in the builder container which buildx creates.
For our problem there is the added complexity that we use Jetson edge devices and, according to NVIDIA support in the developer forum, the container must be based on the l4t-base container.

If I understand correctly, one would have to create a version of moby/buildkit:buildx-stable-1 which is based on l4t-base. But I don't understand buildx / l4t etc. well enough to do this. It already took me long enough to understand that nvidia-container-runtime only accepts l4t-base images and simply executes other images without mounting host files (and probably without the compute capabilities).

elezar transferred this issue from NVIDIA/nvidia-container-runtime on Oct 20, 2023
zhanwenchen commented

I figured out a workaround for this: https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime/77348905#77348905. I had to disable/remove buildkit.

Liquidmasl commented Nov 9, 2023

Just want to throw in another opinion here: sometimes nvidia is absolutely needed to build dependencies.
Yes, that might mean the container won't run on some machines, but that is fine. Some images need specific hardware and drivers; in this case, an NVIDIA GPU and its drivers!
Currently the workaround is to switch off BuildKit via the environment variable, because that's the only functional way right now (the feature flag in daemon.json does nothing).
But this already produces a deprecation warning. If the legacy builder is phased out without BuildKit being able to build with the nvidia runtime, a lot of people will have a very bad time.

Also, generally, I think it's not great that BuildKit silently ignores the runtime settings.
I spent days troubleshooting to find out why all my containers suddenly failed to run on the GPU.

Edit: I just noticed this is the nvidia toolkit repo and not docker's, so this was a rant against docker, not nvidia. Oopsie.

pktiuk commented Feb 1, 2024

Is there any related issue in the Docker issue tracker?

elezar (Member) commented Feb 1, 2024

@pktiuk there is moby/buildkit#4056, which adds CDI support to BuildKit.
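
With CDI, the toolkit generates a specification describing the GPUs on the host, and a CDI-aware consumer can then inject the corresponding devices. As a sketch of the host side (the builder-side wiring is what the linked PR adds):

# Generate a CDI specification for the GPUs on this host; nvidia-ctk
# ships with the NVIDIA Container Toolkit.
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml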
