Hello. My GPU server has 4 GPU cards (7611 MiB each). Three containers currently run on gpu0 and use 7601 MiB in total. Then I ran a new container and expected it to be scheduled on gpu1, gpu2, or gpu3, but it does not run on gpu1/gpu2/gpu3 at all! It actually fails with **CrashLoopBackOff**:

```
root@server:~# kubectl get po
NAME                         READY   STATUS             RESTARTS   AGE
binpack-1-5cb847f945-7dp5g   1/1     Running            0          3h33m
binpack-2-7fb6b969f-s2fmh    1/1     Running            0          64m
binpack-3-84d8979f89-d6929   1/1     Running            0          59m
binpack-4-669844dd5f-q9wvm   0/1     CrashLoopBackOff   15         56m
ngx-dep1-69c964c4b5-9d7cp    1/1     Running            0          102m
```
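For reference, the failure reason and the aliyun.com/gpu-mem capacity the device plugin advertises on the node can be checked with standard kubectl commands like the ones below (output omitted here; `<node-name>` is a placeholder):

```
kubectl describe po binpack-4-669844dd5f-q9wvm        # pod events show why it keeps restarting
kubectl logs binpack-4-669844dd5f-q9wvm --previous    # logs of the last failed container run
kubectl describe node <node-name> | grep aliyun.com/gpu-mem   # per-node gpu-mem capacity/allocatable
```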
My GPU server info:
```
root@server:~# nvidia-smi
Wed May 20 18:18:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:18:00.0 Off |                    0 |
| N/A   65C    P0    25W /  75W |   7601MiB /  7611MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   35C    P8     6W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P4            Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   32C    P8     6W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P4            Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P8     7W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     24689      C   python                                      7227MiB |
|    0     45236      C   python                                       151MiB |
|    0     47646      C   python                                       213MiB |
+-----------------------------------------------------------------------------+
root@server:~#
```
And my binpack-4.yaml is below:
```
root@server:/home/guobin/gpu-repo# cat binpack-4.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-4
  labels:
    app: binpack-4
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-4
  template: # define the pods specifications
    metadata:
      labels:
        app: binpack-4
```
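The containers part of the template did not make it into the paste above. As a minimal sketch of what that section looks like: only the `aliyun.com/gpu-mem: 200` limit is the real value from my deployment; the container name and image below are placeholders.

```
    spec:
      containers:
      - name: binpack-4                # placeholder container name
        image: <my-gpu-image>          # placeholder image
        resources:
          limits:
            aliyun.com/gpu-mem: 200    # request 200 MiB of shared GPU memory
```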
As you can see, the aliyun.com/gpu-mem limit is 200 MiB.
OK, that is all the important info. Why can this plugin not automatically allocate a GPU card?
Or is there something I need to modify?
Thanks for your help!