
Running out of GPU memory #9

Open

tobysharp opened this issue May 1, 2020 · 4 comments

@tobysharp

python train_nerf.py --config config/lego.yml

On a Windows machine with an NVIDIA GeForce 2080 Ti:

[TRAIN] Iter: 0 Loss: 0.23798935115337372 PSNR: 6.234424750392607
[VAL] =======> Iter: 0
  0%|                                                                                       | 0/200000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 336, in main
    encode_direction_fn=encode_direction_fn,
  File "D:\dev\nerf\nerf\train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "D:\dev\nerf\nerf\train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "D:\dev\nerf\nerf\train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "D:\dev\nerf\nerf\train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "D:\dev\nerf\nerf\nerf_helpers.py", line 166, in <lambda>
    x, num_encoding_functions, include_input, log_sampling
  File "D:\dev\nerf\nerf\nerf_helpers.py", line 157, in positional_encoding
    return torch.cat(encoding, dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 0; 11.00 GiB total capacity; 4.49 GiB already allocated; 2.81 GiB free; 5.88 GiB reserved in total by PyTorch)
@krrish94
Owner

krrish94 commented May 1, 2020

On an 11 GB GPU, I'd recommend lowering the chunksize parameters in the lego.yml config file to about 8192 (here and here). I'd also reduce the number of layers in the neural net to about 4 for a start.
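For context, the chunksize setting just controls how many flattened sample points are pushed through the network per forward call, so the large intermediates (like the positional-encoding concatenation in the traceback above) scale with the chunk size rather than the full ray batch. A minimal sketch of the idea, assuming a generic callable fn rather than this repo's actual run_network helper (the names below are placeholders, not the real config keys):

import torch

def run_in_chunks(fn, inputs, chunksize=8192):
    # Apply fn to `inputs` one slice of `chunksize` rows at a time, then
    # concatenate the per-chunk outputs. The large temporaries created
    # inside fn are now bounded by chunksize instead of inputs.shape[0].
    outputs = [fn(inputs[i:i + chunksize])
               for i in range(0, inputs.shape[0], chunksize)]
    return torch.cat(outputs, dim=0)

This doesn't change how much autograd keeps around for backward, but it does shrink the large transient allocations such as the 3.94 GiB torch.cat shown in the traceback.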

@holzers

holzers commented May 1, 2020

It seems quite a bit of your GPU memory is already allocated. Have you tried nvidia-smi to see where it is allocated? Maybe check whether you're running another Python instance that is training or otherwise holding GPU memory.

I am using a 1080 with only 8GB and haven't had any problems with default settings in the original nerf repo.
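If it's unclear whether the memory is held by another process or by the training run itself, PyTorch can also report its own usage from inside the script; these are standard torch.cuda calls, nothing specific to this repo:

import torch

# Memory actually occupied by live tensors in this process.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
# Memory the caching allocator has reserved from the driver
# (roughly what nvidia-smi attributes to the process).
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")

Comparing those two numbers against the nvidia-smi process list helps separate memory used by this training run from memory held by other processes.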

@eshafeeqe

Hello,
I came across the same problem; attaching the error text below.

Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 336, in main
    encode_direction_fn=encode_direction_fn,
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/nerf_helpers.py", line 166, in <lambda>
    x, num_encoding_functions, include_input, log_sampling
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/nerf_helpers.py", line 157, in positional_encoding
    return torch.cat(encoding, dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 0; 7.94 GiB total capacity; 4.49 GiB already allocated; 1.20 GiB free; 5.88 GiB reserved in total by PyTorch)

My nvidia-smi output

Wed Jun  3 12:24:55 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980M    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P8     8W /  N/A |    421MiB /  8126MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1092      G   /usr/lib/xorg/Xorg                           198MiB |
|    0      2125      G   compiz                                       108MiB |
|    0      2809      G   ...quest-channel-token=4477776435151191749   108MiB |
+-----------------------------------------------------------------------------+

@eshafeeqe

I reduced the chunk size as recommended, and it started working now. I am using an 8 GB graphics card (GTX 980).
