
RuntimeError: CUDA error: out of memory with 16GB-memory GPU #5

Open
JinChengneng opened this issue Sep 16, 2020 · 3 comments
@JinChengneng

Hi there! I am really interested in your repository, and thank you for your work on latent-gan.

However, I am facing a problem while running the entire training process by executing python run.py -sf data/EGFR_training.smi.

The error messages are shown below.

Traceback (most recent call last):
  File "run.py", line 54, in <module>
    runner.run()
  File "run.py", line 32, in run
    decode_mols_save_path=self.decoded_smiles,n_epochs=self.n_epochs,sample_after_training=self.sample_size)
  File "/home/jinchengneng/latent-gan/runners/TrainModelRunner.py", line 69, in __init__
    self.G.cuda()
  File "/opt/anaconda3/envs/latent-gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/anaconda3/envs/latent-gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/latent-gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/latent-gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
    param.data = fn(param.data)
  File "/opt/anaconda3/envs/latent-gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 260, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory

I monitored the GPU via watch nvidia-smi, and GPU memory usage grows very large while the models are being loaded; see the screenshot below. Once the model started training, the program exited with the error RuntimeError: CUDA error: out of memory.
[Screenshot: nvidia-smi output showing GPU memory usage while loading the models]
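
For reference, nvidia-smi reports memory held by every process on the card; a minimal sketch using the standard torch.cuda calls for checking what this PyTorch process itself has allocated, which helps tell a leak in the code apart from another process's usage:

import torch

# Bytes occupied by live tensors created by this process
print(torch.cuda.memory_allocated() / 1024 ** 2, "MiB allocated")
# Bytes held by PyTorch's caching allocator (named memory_cached()
# on older PyTorch versions)
print(torch.cuda.memory_reserved() / 1024 ** 2, "MiB reserved")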

In my experience, 16 GB of GPU memory is enough for most programs. I would appreciate it if you could take a look and check whether there is a memory leak or anything else wrong.

@JinChengneng (Author)

UPDATE:

I solved this problem by adding an extra GPU memory limit. If you are facing the same out-of-memory problem, you can try adding the following code at the top of run.py. I hope it helps.

import tensorflow

# Cap TensorFlow at 80% of GPU memory so the PyTorch models can still
# be allocated alongside it (TensorFlow 1.x API).
tf_config = tensorflow.ConfigProto()
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.8
session = tensorflow.Session(config=tf_config)
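
Note that ConfigProto/Session is the TensorFlow 1.x API. On TensorFlow 2.x, the closest equivalent (a sketch, assuming the goal is simply to stop TensorFlow from grabbing the whole GPU up front) is to enable memory growth:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it
# at startup, leaving room for the PyTorch models.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)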

@SeemonJ (Collaborator)

SeemonJ commented Mar 10, 2021

[Screenshot: memory usage while executing python run.py]

Hi,
Sorry for taking so long to get back to you. I am attaching a screenshot of my memory usage while executing python run.py, which uses the EGFR training set as the default input. I have committed a few software updates in the past few days, but none of them affect the actual system.

I can't seem to reproduce the issue you are having.
By this time it might no longer be relevant for you, but I'd love to hear about any further observations you may have made.

Best,
Simon

@muammar

muammar commented Feb 8, 2022

This worked for me. I also needed to select a specific GPU, which I could do from bash by prepending CUDA_VISIBLE_DEVICES=0 to the python invocation:

CUDA_VISIBLE_DEVICES=0 python3 run.py --flags
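
The same device selection also works from inside Python, provided the environment variable is set before torch (or tensorflow) is imported; a minimal sketch:

import os

# Must run before importing torch/tensorflow, otherwise CUDA has
# already enumerated all devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # now reports only GPU 0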
