RuntimeError: CUDA error: device-side assert triggered #54

Open
betty-zeng opened this issue Jun 22, 2023 · 3 comments

Comments

@betty-zeng

Hello,

I'm training the code in Docker, using pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel as the base image with Python 3.8. Setup worked fine, but when I started training, the following errors appeared:

/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [33,0,0], thread: [57,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [16,0,0], thread: [52,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [18,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [16,0,0], thread: [93,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [75,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [87,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [99,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [110,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [126,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [18,0,0], thread: [92,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
epochs: 0%| | 0/6 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "Models/SFD/tools/train.py", line 212, in <module>
    main()
  File "Models/SFD/tools/train.py", line 167, in main
    train_model(
  File "/workspace/Models/SFD/tools/train_utils/train_utils.py", line 86, in train_model
    accumulated_iter = train_one_epoch(
  File "/workspace/Models/SFD/tools/train_utils/train_utils.py", line 38, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/workspace/OpenPCDet/pcdet/models/__init__.py", line 44, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Models/SFD/pcdet_extensions/models/detectors/sfd.py", line 11, in forward
    batch_dict = cur_module(batch_dict)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Models/SFD/pcdet_extensions/models/roi_heads/sfd_head.py", line 595, in forward
    self.roicrop3d_gpu(batch_dict, self.model_cfg.ROI_POINT_CROP.POOL_EXTRA_WIDTH)
  File "/workspace/Models/SFD/pcdet_extensions/models/roi_heads/sfd_head.py", line 554, in roicrop3d_gpu
    image[total_pts_features[:,7].long(), total_pts_features[:,6].long()] = global_index.to(device=total_pts_features.device)
RuntimeError: CUDA error: device-side assert triggered

It seems like there is an index error in the function roicrop3d_gpu. Could you please verify this? I have been stuck on this for a few days already. Thank you!
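
One way to confirm the suspicion is to check the two index columns against the image bounds before the assignment at line 554. The sketch below is only an illustration, not the repository's code: `in_bounds_mask` and `count_out_of_bounds` are hypothetical helper names, the sample tensors stand in for `total_pts_features[:, 7]` and `total_pts_features[:, 6]`, and the image is assumed to be 2-D with shape (height, width). The check is deliberately stricter than the CUDA assertion, because it also rejects negative indices, which PyTorch would otherwise wrap around silently. Since CUDA errors are reported asynchronously, setting the environment variable `CUDA_LAUNCH_BLOCKING=1` or running the same indexing on CPU may also give a more precise error location.

```python
import torch


def in_bounds_mask(rows, cols, height, width):
    """Return a boolean mask that is True where (row, col) lies inside
    a 2-D image of shape (height, width). Hypothetical helper."""
    rows = rows.long()
    cols = cols.long()
    return (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)


def count_out_of_bounds(rows, cols, height, width):
    """Count the (row, col) pairs that would index outside the image."""
    mask = in_bounds_mask(rows, cols, height, width)
    return int((~mask).sum())


# Made-up stand-ins for total_pts_features[:, 7] (rows) and
# total_pts_features[:, 6] (cols); the image size is also an assumption.
rows = torch.tensor([0.0, 10.0, 375.0, 12.0])
cols = torch.tensor([5.0, 1242.0, 3.0, -1.0])
height, width = 375, 1242

mask = in_bounds_mask(rows, cols, height, width)
n_bad = count_out_of_bounds(rows, cols, height, width)
print(f"{n_bad} of {rows.numel()} points fall outside the image")
```

If the count is non-zero on real data, the out-of-range points could be filtered with the mask before the indexed assignment, or the projection that produces those columns could be inspected. Which of these is appropriate depends on how the columns are computed, which the traceback does not show.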

@HuangLLL123

Hello, have you solved the problem? If so, how?

@vacant-ztz

Hello, have you solved the problem? If so, how?

@HuangLLL123

> Hello, have you solved the problem? If so, how?

I solved the problem by following the method described in #23. May I ask how you solved it, @vacant-ztz?
