CUDA error, related to multiprocessing #4

cbaakman · 2024-07-07T21:19:33Z

I tried to run LightMHC with CUDA 11.8.0.

Command:

python LightMHC/lightmhc/inference.py data.input_csv_path=xray.lightmhc.csv data.output_dir=lighmhc-xray model.n_cpus=32 model.use_gpu=true model.batch_size=64

I got an error message:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/baakmanc/LightMHC/lightmhc/inference.py", line 77, in workflow
    model = model.to(device)
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 673, in to
    return self._apply(convert)
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 409, in _apply
    param_applied = fn(param)
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 671, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/baakmanc/miniconda3/envs/lightmhc/lib/python3.8/site-packages/torch/cuda/__init__.py", line 160, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
"""

I found that the start method can be set to 'spawn' by following the instructions here:
https://pytorch.org/docs/stable/notes/multiprocessing.html

That fixed the issue for me. Just letting you know.
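For anyone hitting the same error, here is a minimal sketch of the kind of change involved. It is not the actual LightMHC code: `workflow` and `run_batches` are hypothetical stand-ins, and the sketch uses only the standard library so it runs without a GPU. The assumption is that the pool in `inference.py` is created with the default start method (fork on Linux), so taking the pool from a "spawn" context should avoid the re-initialization error.

```python
import multiprocessing as mp


def workflow(batch_id, device_name):
    """Hypothetical stand-in for the per-worker function in inference.py.

    In the real code this is where `model.to(device)` would run. Here it
    only returns a string, so the sketch has no torch dependency.
    """
    return f"batch {batch_id} on {device_name}"


def run_batches(func, tasks, n_workers):
    """Run func over tasks in a pool whose workers are spawned, not forked."""
    # get_context("spawn") scopes the start method to this pool and
    # leaves the interpreter-wide default untouched.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n_workers) as pool:
        return pool.starmap(func, tasks)


# Example call. With "spawn" it must sit under the main guard of a script,
# because each worker re-imports the main module:
#
# if __name__ == "__main__":
#     tasks = [(i, "cuda:0") for i in range(4)]
#     print(run_batches(workflow, tasks, n_workers=2))
```

An alternative is to call `set_start_method("spawn")` once at program start, which changes the default for every pool created afterwards. The context-based version above keeps the change local to one pool.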

antoine-delaunay · 2024-07-08T08:55:44Z

@cbaakman Thanks for spotting this. We will look into it and update our code accordingly.
