
The problem when to run experiments using the Docker container #91

Open
Aaricis opened this issue Dec 18, 2020 · 8 comments

Comments

@Aaricis

Aaricis commented Dec 18, 2020

I hit this problem when running `bash run.sh $GPU python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z`:

```
Launching container named 'zkg_pymarl_GPU_python3_XIdE' on GPU 'python3'
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: python3: unknown device: unknown.
ERRO[0001] error waiting for container: context canceled
```

I don't know how to solve this error.

@mw9385

mw9385 commented Jan 14, 2021

same problem here

@reubenwong97

Hello, you can refer to this issue: #89. I posted my solution in the comments.

@mw9385

mw9385 commented Jan 21, 2021

@reubenwong97 many thanks :)

@4ever-Rain

4ever-Rain commented Feb 23, 2021

This issue is caused by '$GPU'; passing a number (like '0' or '1') instead makes the shell script work.
However, CUDA still does not work inside Docker (nvidia-docker is installed).
Any ideas about '$GPU'?
Thanks!

btw, CUDA works fine when I run `python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z` directly (without Docker)
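For reference, a minimal invocation with an explicit GPU index might look like the sketch below. The index `0` is an assumption; pick whichever index `nvidia-smi` reports as free on your machine:

```shell
# run.sh expects a GPU index as its first argument; when $GPU is unset,
# the next word ("python3") is passed to nvidia-container-cli as the
# device name, which produces the "unknown device" error above.
GPU=0   # a free GPU index from `nvidia-smi` (assumption: device 0 is free)
bash run.sh $GPU python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z
```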

@reubenwong97

@4ever-Rain you may wanna run nvidia-smi first to check the IDs of your GPUs, which you can then use in place of $GPU. I had experienced problems when the GPU was low on memory due to other tasks running. You can check whether it has available memory with nvidia-smi.
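The check above can be sketched as a one-liner using nvidia-smi's documented query interface:

```shell
# Show each GPU's index and memory usage so you can pick a free one
# and pass its index in place of $GPU.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```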

@4ever-Rain

@reubenwong97 Thanks for your advice. I'm sure my GPU is available and free, and I have used a GPU ID ('0') instead of '$GPU', but CUDA still does not work inside Docker.
Meanwhile, I'm sure `torch.cuda.is_available()` returns True in the container.
Maybe there is something wrong with 'run.sh'?

@FanScy

FanScy commented Mar 15, 2021

@4ever-Rain I encountered the same problem. When I call cuda() in the container, it gets stuck and cannot work. Have you solved the problem?

@4ever-Rain

> @4ever-Rain I encounter the same problem. When I use the cuda() in the container, it will get stuck and can not work. Do you have solved the problem?

Yep, it works for me now.
I finally ran the code outside the container by installing all the necessary packages into a conda virtual environment.
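For anyone following the same route, a sketch of this no-Docker workaround might look like the following. The Python version and the presence of a requirements.txt at the repo root are assumptions; match them to the pymarl repo you are using:

```shell
# Create an isolated environment instead of using the Docker container.
conda create -n pymarl python=3.6 -y
conda activate pymarl
# Install the project's dependencies (assumption: a requirements.txt
# exists at the repo root; run this from there).
pip install -r requirements.txt
# Then run the experiment directly on the host, where CUDA already works.
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z
```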
