
Running out of GPU memory #9

Open

tobysharp opened this issue May 1, 2020 · 4 comments

@tobysharp

python train_nerf.py --config config/lego.yml

On a Windows machine with an NVIDIA GeForce 2080 Ti:

[TRAIN] Iter: 0 Loss: 0.23798935115337372 PSNR: 6.234424750392607
[VAL] =======> Iter: 0
  0%|                                                                                       | 0/200000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 336, in main
    encode_direction_fn=encode_direction_fn,
  File "D:\dev\nerf\nerf\train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "D:\dev\nerf\nerf\train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "D:\dev\nerf\nerf\train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "D:\dev\nerf\nerf\train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "D:\dev\nerf\nerf\nerf_helpers.py", line 166, in <lambda>
    x, num_encoding_functions, include_input, log_sampling
  File "D:\dev\nerf\nerf\nerf_helpers.py", line 157, in positional_encoding
    return torch.cat(encoding, dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 0; 11.00 GiB total capacity; 4.49 GiB already allocated; 2.81 GiB free; 5.88 GiB reserved in total by PyTorch)
@krrish94
Owner

krrish94 commented May 1, 2020

On an 11 GB GPU, I'd recommend lowering the chunksize parameters in the lego.yml config file to about 8192 (here and here). I'd also reduce the number of layers in the neural net to about 4 for a start.
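For context, the chunksize setting just controls how many flattened sample points are pushed through the network per forward call, so the large intermediates (like the positional-encoding concatenation in the traceback above) scale with the chunk size rather than the full ray batch. A minimal sketch of the idea, assuming a generic callable fn rather than this repo's actual run_network helper (the names below are placeholders, not the real config keys):

import torch

def run_in_chunks(fn, inputs, chunksize=8192):
    # Apply fn to `inputs` one slice of `chunksize` rows at a time, then
    # concatenate the per-chunk outputs. The large temporaries created
    # inside fn are now bounded by chunksize instead of inputs.shape[0].
    outputs = [fn(inputs[i:i + chunksize])
               for i in range(0, inputs.shape[0], chunksize)]
    return torch.cat(outputs, dim=0)

This doesn't change how much autograd keeps around for backward, but it does shrink the large transient allocations such as the 3.94 GiB torch.cat shown in the traceback.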

@holzers

holzers commented May 1, 2020

It seems quite a bit of your GPU memory is already allocated. Have you tried nvidia-smi to see where it is allocated? Maybe check whether you're running another Python instance that is training or otherwise holding GPU memory.

I am using a 1080 with only 8GB and haven't had any problems with default settings in the original nerf repo.
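If it's unclear whether the memory is held by another process or by the training run itself, PyTorch can also report its own usage from inside the script; these are standard torch.cuda calls, nothing specific to this repo:

import torch

# Memory actually occupied by live tensors in this process.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
# Memory the caching allocator has reserved from the driver
# (roughly what nvidia-smi attributes to the process).
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")

Comparing those two numbers against the nvidia-smi process list helps separate memory used by this training run from memory held by other processes.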

@eshafeeqe

Hello,
I came across the same problem; attaching the error text below.

Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 336, in main
    encode_direction_fn=encode_direction_fn,
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/nerf_helpers.py", line 166, in <lambda>
    x, num_encoding_functions, include_input, log_sampling
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/nerf_helpers.py", line 157, in positional_encoding
    return torch.cat(encoding, dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 0; 7.94 GiB total capacity; 4.49 GiB already allocated; 1.20 GiB free; 5.88 GiB reserved in total by PyTorch)

My nvidia-smi output

Wed Jun  3 12:24:55 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980M    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P8     8W /  N/A |    421MiB /  8126MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1092      G   /usr/lib/xorg/Xorg                           198MiB |
|    0      2125      G   compiz                                       108MiB |
|    0      2809      G   ...quest-channel-token=4477776435151191749   108MiB |
+-----------------------------------------------------------------------------+

@eshafeeqe

I reduced the chunk size as recommended, and it started working now. I am using an 8 GB graphics card (GTX 980).
