
nvidia-container-cli: device error: unknown device id: no-gpu-has-256MiB-to-run\\\\n\\\"\"": unknown #23

Open
zhaogaolong opened this issue Apr 30, 2020 · 3 comments

Comments


zhaogaolong commented Apr 30, 2020

Version info:

k8s: 1.17
gpushare-device-plugin: v2-1.11-aff8a23
nvidia-smi: 440.36

kubectl describe pod <pod name> -n zhaogaolong
Pod error events:

Events:
  Type     Reason     Age                From                      Message
  ----     ------     ----               ----                      -------
  Normal   Scheduled  <unknown>          default-scheduler         Successfully assigned zhaogaolong/gpu-demo-gpushare-659fd6cbb7-6fc8v to gpu-node
  Normal   Pulling    32s (x4 over 70s)  kubelet, gpu-node  Pulling image "hub.xxxx.com/zhaogaolong/gpu-demo.build.build:bccfcbe43f43280d-1584070500-dac37f2c12024544a6cc2871440dc94a577a7ff3"
  Normal   Pulled     32s (x4 over 70s)  kubelet, gpu-node  Successfully pulled image "hub.xxx.com/zhaogaolong/gpu-demo.build.build:bccfcbe43f43280d-1584070500-dac37f2c12024544a6cc2871440dc94a577a7ff3"
  Normal   Created    31s (x4 over 70s)  kubelet, gpu-node  Created container gpu
  Warning  Failed     31s (x4 over 70s)  kubelet, gpu-node  Error: failed to start container "gpu": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: unknown device id: no-gpu-has-256MiB-to-run\\\\n\\\"\"": unknown
  Warning  BackOff    10s (x5 over 68s)  kubelet, gpu-node  Back-off restarting failed container

Same issue:

NVIDIA/nvidia-docker#1042
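
For reference, no-gpu-has-256MiB-to-run is not a real GPU ID: it appears to be the placeholder device that the gpushare device plugin returns when it cannot find a GPU assignment from the gpushare scheduler extender for the pod, and nvidia-container-cli then rejects it as an unknown device. Below is a minimal sketch of the kind of pod spec involved, assuming the default gpushare resource name aliyun.com/gpu-mem and a device plugin configured to count memory in MiB (the 256 comes from the error message; the pod/namespace names mirror the events above, the image tag is elided, and the exact manifest is an assumption):

  apiVersion: v1
  kind: Pod
  metadata:
    name: gpu-demo
    namespace: zhaogaolong
  spec:
    containers:
    - name: gpu
      image: hub.xxxx.com/zhaogaolong/gpu-demo.build.build:<tag>   # tag elided
      resources:
        limits:
          aliyun.com/gpu-mem: 256   # shared GPU memory handled by the gpushare extender, not nvidia.com/gpu

If the scheduler extender never filters/binds such a pod (for example because kube-scheduler is not configured to call it), the device plugin has nothing to map the request to, and the placeholder ID ends up being passed to the NVIDIA runtime hook, which fails as shown above.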

@cheyang

@Joseph516

Has anybody fixed this? I have the same problem here. AliyunContainerService/gpushare-scheduler-extender#120 (comment)


vio-f commented Jun 23, 2022

I encountered the same issue today. Can anybody help please?


Lanyujiex commented Aug 9, 2022

Update your scheduler config to include the gpushare-sch-extender and restart the scheduler; that should fix it. @vio-f
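
For completeness, on Kubernetes 1.17 that means adding the extender to the kube-scheduler policy config, roughly as in the install guide of AliyunContainerService/gpushare-scheduler-extender. The extender URL and port below are the defaults from that guide and may differ in your cluster, so treat this as a sketch:

  # scheduler-policy-config.json (sketch)
  {
    "kind": "Policy",
    "apiVersion": "v1",
    "extenders": [
      {
        "urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
        "filterVerb": "filter",
        "bindVerb": "bind",
        "enableHttps": false,
        "nodeCacheCapable": true,
        "managedResources": [
          { "name": "aliyun.com/gpu-mem", "ignoredByScheduler": false }
        ],
        "ignorable": false
      }
    ]
  }

kube-scheduler then has to be started with --policy-config-file pointing at this file and restarted, and the failing pod recreated so it gets scheduled through the extender and bound to a concrete GPU.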
