1080Ti is out of memory for testing 1024P pretrained model #19
Comments
A 1080 Ti should be able to run the inference perfectly fine; it should only take about 4 GB of memory. Are you sure the GPU is not running something else at the same time?
I am sure there are no other jobs running at the same time.
docker-compose.yml
Error information:
That's weird!
I met a similar problem. I solved it by adding the proper options. You may need to read the README carefully.
@tcwang0509 I ran the inference code bash ./scripts/test_1024p.sh on my server, but it shows an error:
I ran with a Titan XP and used an empty GPU for the inference:
@tcwang0509 @ArthurQiuu Could you provide any solutions to these problems? Thanks so much!!
The problem was solved when I updated the torch version from 0.3.0 to 0.3.1.post2.
I am running ToT (top-of-tree) PyTorch, and 1024p does not fit in 16 GB by default for inference (test.py). I have added an FP16 option (see my PR) to make it fit.
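For context, a minimal sketch of what FP16 inference looks like in PyTorch — this is an illustration, not the actual PR; the Generator class and tensor shapes here are placeholders for pix2pixHD's netG and inputs:

```python
import torch
import torch.nn as nn

# Stand-in for pix2pixHD's netG; the real generator is much larger.
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x):
        return self.net(x)

netG = Generator().cuda().half()                          # FP16 weights
label_map = torch.randn(1, 3, 1024, 2048).cuda().half()   # FP16 input

with torch.no_grad():                                     # also skip gradient buffers
    fake_image = netG(label_map).float()                  # back to FP32 for saving
```

Casting weights and activations to half precision roughly halves activation memory, which is why it can make the 1024p model fit on a 16 GB card.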
I met the same problem when using a Titan X GPU to test the pre-trained 1024p model. Did anyone solve the out-of-memory problem? @tcwang0509 Is it possible to provide the 512p pre-trained model for testing? Thank you!
I met the same problem on a 1080 Ti. I ran the program on an empty GPU and it failed, but I could still get two pictures. Therefore I tried to train my own models using ./scripts/train_512p.sh; actually, all the other training scripts produce the same issue. The datasets are organized as follows.
@tcwang0509 I tried different combinations of parameters in test_1024p.sh and found that --ngf strongly affects memory use. I also watched memory consumption while running: training at 512p may only use about 4 GB, but testing eats much more. Reducing --ngf to 20 lets the testing run, but the quality of the images is very strange. I tested on both a 1080 Ti and a Titan X.
@ouyangkid Are you using PyTorch 0.4? It seems the problem is due to volatile not being supported anymore, so inference costs a lot more memory than it should. Please pull the latest version and see if it works.
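In other words, pre-0.4 code relied on volatile Variables to keep inference memory down; on 0.4+ that flag is ignored, and torch.no_grad() is the replacement. A minimal sketch (netG here is just a stand-in module, not the real generator):

```python
import torch
import torch.nn as nn

netG = nn.Conv2d(3, 3, 3, padding=1).cuda()        # stand-in for the generator
x = torch.randn(1, 3, 1024, 2048, device='cuda')

# Pre-0.4 idiom: Variable(x, volatile=True) told autograd not to keep
# intermediate buffers. On 0.4+ the flag is ignored, so every activation
# is retained as if for backprop and inference memory balloons.

# 0.4+ replacement: disable gradient tracking explicitly.
with torch.no_grad():
    fake = netG(x)
```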
@tcwang0509 Yes, thanks for your response. It seems the next version will be 1.0, but it is not publicly available yet. I will wait and try again after they publish the official version.
@ouyangkid I got the same error as you: "... AttributeError: 'Namespace' object has no attribute 'data_type'". Did you only change the --ngf parameter? I have already tried that and it did not work.
@marioft According to @tcwang0509, the problem is caused by the versions of the different software. From what I tried, reducing the --ngf parameter is one way to decrease GPU memory consumption, but the outputs are weird. I suggest you wait for the new versions of PyTorch 1.0 / TensorRT. As you can see, NVIDIA currently has only one person supporting this project; I have also given up testing for now.
Thanks for your reply, I'll update the software then and hope it works. I'm working with CUDA 7.5, cuDNN 7.1.3, TensorRT 4.0.1, and PyTorch 0.4.0.
I ran the code with the defaults. Any insight? Thanks in advance.
Hi @nejyeah, I am trying to run pix2pixHD in a Docker container. I used your Dockerfile, but this line
raises an error:
Can you help me dockerize pix2pixHD?
@fabio-C Sorry, I did not keep the Dockerfile or the Docker image.
If you're using PyTorch 1.0.0, you'll also get a CUDA out-of-memory error. You'll want to find line 214 in pix2pixHD_model.py and comment out
and replace it with just
or your own, improved PyTorch version-detecting code.
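The exact lines around 214 weren't preserved in this thread, so treat the following only as a sketch of the pitfall being described: a version gate written for 0.4.x evaluates False on 1.0.0, which silently skips the no_grad() inference path shown earlier.

```python
import torch

# The pitfall: a gate such as torch.__version__.startswith('0.4')
# is False on "1.0.0", so the branch that disables gradients during
# inference is skipped and activations pile up until OOM.
print(torch.__version__.startswith('0.4'))      # False on PyTorch 1.0.0

# If you prefer version detection over unconditionally using torch.no_grad(),
# compare parsed version numbers instead of string prefixes:
major, minor = (int(p) for p in torch.__version__.split('+')[0].split('.')[:2])
print((major, minor) >= (0, 4))                 # True on 0.4.x and 1.x alike
```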
@9of9's solution worked for me (thanks!). I noted one interesting thing, though: if I pass --resize_or_crop none, then I don't get out of memory (although the output images don't make sense). OOM occurs only when --resize_or_crop is scale_width.
So could you offer a 512p pretrained model for testing?