Some problem with automatic GPU card allocation #25

Open

guobingithub opened this issue May 20, 2020 · 1 comment

@guobingithub
Hello, my GPU server has 4 GPU cards (7611 MiB each).
Three containers are currently running on gpu0, and together they use 7601 MiB.
I then started a new container and expected it to be scheduled on gpu1, gpu2, or gpu3.
But it does not run on gpu1/gpu2/gpu3 at all! It actually fails to start (CrashLoopBackOff):
```
root@server:~# kubectl get po
NAME                         READY   STATUS             RESTARTS   AGE
binpack-1-5cb847f945-7dp5g   1/1     Running            0          3h33m
binpack-2-7fb6b969f-s2fmh    1/1     Running            0          64m
binpack-3-84d8979f89-d6929   1/1     Running            0          59m
binpack-4-669844dd5f-q9wvm   0/1     CrashLoopBackOff   15         56m
ngx-dep1-69c964c4b5-9d7cp    1/1     Running            0          102m
root@server:~#
```
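For reference, more detail on why binpack-4 keeps restarting should be visible in the pod events and the previous container's logs, e.g. (pod name taken from the listing above):

```
# Events at the bottom show the scheduling decision and the crash reason
kubectl describe pod binpack-4-669844dd5f-q9wvm

# Logs of the last failed container instance
kubectl logs binpack-4-669844dd5f-q9wvm --previous
```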

My GPU server info:
```
root@server:~# nvidia-smi
Wed May 20 18:18:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:18:00.0 Off | 0 |
| N/A 65C P0 25W / 75W | 7601MiB / 7611MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P4 Off | 00000000:3B:00.0 Off | 0 |
| N/A 35C P8 6W / 75W | 0MiB / 7611MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P4 Off | 00000000:5E:00.0 Off | 0 |
| N/A 32C P8 6W / 75W | 0MiB / 7611MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P4 Off | 00000000:86:00.0 Off | 0 |
| N/A 38C P8 7W / 75W | 0MiB / 7611MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 24689 C python 7227MiB |
| 0 45236 C python 151MiB |
| 0 47646 C python 213MiB |
+-----------------------------------------------------------------------------+
root@server:~#
```
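For reference, the same per-card memory picture can be printed in a compact form with nvidia-smi's query mode:

```
# One CSV line per GPU: index, used memory, total memory
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```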

And my binpack-4.yaml is below:
```
root@server:/home/guobin/gpu-repo# cat binpack-4.yaml
apiVersion: apps/v1
kind: Deployment

metadata:
  name: binpack-4
  labels:
    app: binpack-4

spec:
  replicas: 1

  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-4

  template: # define the pods specifications
    metadata:
      labels:
        app: binpack-4

    spec:
      containers:
      - name: binpack-4
        image: cheyang/gpu-player:v2
        resources:
          limits:
            # MiB
            aliyun.com/gpu-mem: 200
```

As you can see, aliyun.com/gpu-mem is set to 200 (MiB).
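For reference, the aliyun.com/gpu-mem totals that the scheduler extender sees on the node can be checked roughly like this (`<node-name>` is a placeholder; the `kubectl inspect gpushare` subcommand is only available if the gpushare kubectl plugin from the gpushare-scheduler-extender install is present):

```
# Extended resource advertised by the gpushare device plugin
kubectl describe node <node-name> | grep -i gpu-mem

# Per-GPU allocation view, if the gpushare kubectl plugin is installed
kubectl inspect gpushare
```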

OK, that is all the relevant information. Why can't this plugin automatically allocate a GPU card?
Or is there something I need to modify?

Thanks for your help!

@guobingithub (Author)

@cheyang can you help me with this? Thanks very much.
