CuTorch out of memory issue, how to allow system ram usage #730

Closed
imerad opened this issue Mar 19, 2017 · 10 comments

@imerad

imerad commented Mar 19, 2017

Hello, I'm trying to run style transfer as implemented in this repository https://github.com/leongatys/NeuralImageSynthesis on my GPU using cuTorch, but I can't go beyond a 500x500 image size with my 2GB of GPU memory, and running on the CPU is way too slow. So I was thinking I could get around this by letting the GPU use some system RAM, which it does not do on its own.
I know that would be slower, but that's still better than nothing.
Does anyone know how I can let the GPU use system RAM? I googled a bit and found this https://developer.blender.org/D2056 , but I don't know whether it can help, or how.

Thanks

@hphong591992

One approach that worked for me was to install Lua 5.2 instead of LuaJIT.

@imerad
Author

imerad commented Apr 3, 2017

Hello, thanks for the answer.
Sorry, I only saw it today.
I tried installing Torch with Lua 5.2 as you recommended, but now I'm having issues using hdf5. I get the following error:

/home/michael/distro/install/bin/lua: /home/michael/distro/install/share/lua/5.2/trepl/init.lua:389: /home/michael/distro/install/share/lua/5.2/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
[C]: in function 'error'
/home/michael/distro/install/share/lua/5.2/trepl/init.lua:389: in function 'require'
ComputeActivations.lua:4: in main chunk
[C]: in function 'dofile'
...ael/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?

It seems others have run into this bug: google-deepmind/torch-hdf5#79
I don't know how to fix it. Maybe using a previous version of gcc would solve this, but I'm afraid it might bring up a whole bunch of other bugs.
If you can give me a tip on this, that would be great.
Thanks

@imerad
Author

imerad commented Apr 4, 2017

Hello again
I solved the previous issue by uninstalling and reinstalling Torch and Lua (radical, huh?), but unfortunately I still get the out of memory error when running image synthesis on slightly bigger images.
Here's the error again:

THCudaCheck FAIL file=/home/michael/distro/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/michael/distro/install/bin/luajit: /home/michael/distro/install/share/lua/5.1/optim/lbfgs.lua:85: cuda runtime error (2) : out of memory at /home/michael/distro/extra/cutorch/lib/THC/generic/THCStorage.cu:66

Notice it ran the luajit binary. It shouldn't, should it? I had cleaned and then installed Torch again with TORCH_LUA_VERSION=52, though...
I would be thankful for any clues.

@hphong591992

I think you should remove the whole torch folder before reinstalling it.

Did you install Lua somewhere else as well?
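
A minimal sketch of one way to answer that question: walk `PATH` and print every copy of each interpreter, in lookup order. The helper name `list_interpreters` is made up for this example, and the names it checks (`lua`, `luajit`, `th`) are simply the binaries that show up in the logs in this thread.

```shell
#!/bin/sh
# List every executable copy of the given program names found on PATH,
# in the order the shell would look them up.
list_interpreters() {
    for name in "$@"; do
        found=0
        old_ifs=$IFS
        IFS=:
        for dir in $PATH; do
            # An empty PATH entry means the current directory.
            [ -n "$dir" ] || dir=.
            if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
                echo "$name: $dir/$name"
                found=1
            fi
        done
        IFS=$old_ifs
        [ "$found" -eq 1 ] || echo "$name: not found on PATH"
    done
}

list_interpreters lua luajit th
```

If a name is printed more than once, the first line is the copy the shell actually runs; any later copies are shadowed by it.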

@imerad
Author

imerad commented Apr 5, 2017

I removed the whole torch folder and tried typing "lua" in a terminal. I found another binary in /usr/bin by typing "which lua" and deleted it.
Then I cloned the repository again and ran the installation with TORCH_LUA_VERSION=LUA52 (as opposed to TORCH_LUA_VERSION=52; I don't know if that was the problem).
After the installation was done I tried running the image synthesis and got an error: couldn't find luarocks package for bit.
So I typed "luarocks install luabitop" in a terminal and that was solved.
But now I'm back to the previous error when I try to run synthesis:

/home/michael/distro/install/bin/lua: /home/michael/distro/install/share/lua/5.2/trepl/init.lua:389: /home/michael/.luarocks/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
[C]: in function 'error'
/home/michael/distro/install/share/lua/5.2/trepl/init.lua:389: in function 'require'
ComputeActivations.lua:4: in main chunk
[C]: in function 'dofile'
...ael/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?

I think that when I thought I was rid of it, I was simply running luajit... I don't know how to get past this issue. Any ideas?
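
One detail visible in the traceback above: `trepl` is loaded from `share/lua/5.2` in the Torch install tree, while `hdf5` comes from `share/lua/5.1` under `~/.luarocks`. That may mean a rock installed earlier for Lua 5.1 in the user tree is still being picked up, although this is only a guess, and it would not by itself explain the `align(#)` error, which also appeared earlier with an all-5.2 path. The sketch below extracts the Lua versions named in a traceback so this kind of mix is easy to spot; the helper name `lua_versions_in_trace` is hypothetical.

```shell
#!/bin/sh
# Print the distinct Lua versions that appear in "share/lua/X.Y" paths
# read from standard input, one version per line.
lua_versions_in_trace() {
    grep -o 'share/lua/[0-9][0-9]*\.[0-9][0-9]*' | sed 's|.*/||' | sort -u
}

# First line of the traceback quoted above.
trace='/home/michael/distro/install/bin/lua: /home/michael/distro/install/share/lua/5.2/trepl/init.lua:389: /home/michael/.luarocks/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579'

versions=$(printf '%s\n' "$trace" | lua_versions_in_trace)
count=$(printf '%s\n' "$versions" | wc -l | tr -d ' ')
if [ "$count" -gt 1 ]; then
    echo "modules from more than one Lua version were loaded:"
    printf '%s\n' "$versions"
fi
```

For the line above this reports both 5.1 and 5.2. For the first traceback in this thread, where every path is under `share/lua/5.2`, it stays silent.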

@imerad
Author

imerad commented Apr 5, 2017

I think I actually solved the problem this time by going back to gcc 4.8 (I was on gcc 5.4) and running the installation all over again with TORCH_LUA_VERSION=LUA52.
I got the "no luarocks package found for bit" error again, but "luarocks install luabitop" solved it right away.
When I tried running synthesis again I still ran into the "out of memory" error:

THCudaCheck FAIL file=/home/michael/distro/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/michael/distro/install/bin/lua: /home/michael/distro/install/share/lua/5.2/optim/lbfgs.lua:85: cuda runtime error (2) : out of memory at /home/michael/distro/extra/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'new'
/home/michael/distro/install/share/lua/5.2/optim/lbfgs.lua:85: in function 'lbfgs'
ImageSynthesis.lua:202: in function 'main'
ImageSynthesis.lua:218: in main chunk
[C]: in function 'dofile'
...ael/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?

This time it seems the lua binary is correctly used instead of luajit.
Maybe it would work if I tried installing with LUA51? I don't think so, and I won't try it unless you say I should.
Could you tell me how much GPU memory you have, and give more details about the approach you took?
Thanks
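
The reinstall described in this comment can be sketched as a dry-run script. Several things here are assumptions rather than facts from the thread: the checkout location (`$HOME/distro`, guessed from the paths in the logs), that gcc 4.8 is installed as `gcc-4.8`/`g++-4.8`, and that the build picks up `CC`/`CXX` from the environment. The `run` helper and the `APPLY` switch are made up for this example; nothing is executed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry-run sketch of the reinstall steps. Commands are only printed
# unless APPLY=1 is set in the environment.
TORCH_DIR=${TORCH_DIR:-"$HOME/distro"}   # assumption: checkout location

run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

reinstall_torch() {
    # Remove the old tree completely, then fetch a fresh copy.
    run rm -rf "$TORCH_DIR"
    run git clone --recursive https://github.com/torch/distro.git "$TORCH_DIR"
    run cd "$TORCH_DIR"
    # Build against Lua 5.2. CC/CXX are an assumption about how gcc 4.8
    # would be selected; the thread does not say how it was done.
    run env TORCH_LUA_VERSION=LUA52 CC=gcc-4.8 CXX=g++-4.8 ./install.sh
    # The "bit" module is not bundled with Lua 5.2, hence the extra rock.
    run "$TORCH_DIR/install/bin/luarocks" install luabitop
}

reinstall_torch
```

Printing the commands first makes it easy to check the target directory before anything is deleted.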

@hphong591992

I didn't do image synthesis. I just merged a CNN (VGG-16) and an LSTM into one model (2 hidden layers, hidden size 500), and it takes around 3GB.

@imerad
Author

imerad commented Apr 5, 2017

OK, so I suppose you have less than 3GB of GPU memory, so your computer had to use system RAM. But were you getting the same error (the same as the last one I showed)?
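
For comparing setups, a small sketch that reports how much memory each GPU has and how much is in use. It assumes the NVIDIA driver's `nvidia-smi` tool is installed; the wrapper name `gpu_memory_report` is made up for this example.

```shell
#!/bin/sh
# Print name, total memory and used memory for each GPU,
# or a short note if nvidia-smi is missing or fails.
gpu_memory_report() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv \
            || echo "nvidia-smi failed to query the GPU"
    else
        echo "nvidia-smi not found on PATH"
    fi
}

gpu_memory_report
```

Running it once before and once during synthesis gives a rough idea of how close a job gets to the card's limit.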

@hphong591992

No, I use a GTX-1080. Most of my recent errors had something to do with the BIOS, because once I reset everything to default it worked normally.

@imerad
Author

imerad commented Apr 5, 2017

I just looked up the GTX-1080. It has at least 8GB, so you probably weren't running out of memory (or at least not because the card had too little of it).
CUDA can actually use system RAM through certain mechanisms, but these do not seem to be implemented in cuTorch.
It seems there is no simple way to solve this issue, so I might as well close it.

@imerad imerad closed this as completed May 20, 2017