
CUDA assertion error binary_cross_entropy loss #9

Open

blancaag opened this issue Jan 9, 2018 · 9 comments
blancaag commented Jan 9, 2018

A CUDA assertion error pops up when setting --no_lsgan. It seems to happen because negative values are passed into nn.BCELoss(). It gets fixed by applying nn.BCEWithLogitsLoss() instead.

(...)
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [16,0,0], thread: [31,0,0] Assertion `input >= 0. && input <= 1.` failed.
CUDA error after cudaEventDestroy in future dtor: device-side assert triggeredTraceback (most recent call last):
  File "train.py", line 56, in <module>
    Variable(data['image']), Variable(data['feat']), infer=save_fake)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/blanca/project/wip/pix2pixHD-master/models/pix2pixHD_model.py", line 158, in forward
    loss_D_fake = self.criterionGAN(pred_fake_pool, False)
  File "/blanca/project/wip/pix2pixHD-master/models/networks.py", line 110, in __call__
    loss += self.loss(pred, target_tensor)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 372, in forward
    size_average=self.size_average)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1179, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, size_average)
RuntimeError: cudaEventSynchronize in future::wait: device-side assert triggered
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.c line=184 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.c:184
Aborted (core dumped)
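
A minimal sketch of the failure mode described above (the tensor values are illustrative; the nn.BCELoss call is commented out because it trips the same input-range check):

import torch
import torch.nn as nn

# Discriminator outputs are unbounded logits and can be negative.
logits = torch.tensor([-1.2, 0.3, 2.5])
target = torch.ones_like(logits)

# nn.BCELoss expects probabilities in [0, 1]; feeding raw logits violates
# the `input >= 0. && input <= 1.` assertion shown in the trace above.
# loss = nn.BCELoss()(logits, target)   # would fail

# nn.BCEWithLogitsLoss applies the sigmoid internally and is more
# numerically stable, so unbounded logits are fine.
loss = nn.BCEWithLogitsLoss()(logits, target)
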
@Tord-Zhang

@blancaag I have met the same problem. How did you fix it?


blancaag commented Mar 6, 2018

@mangdian I mention it above. It gets fixed by applying nn.BCEWithLogitsLoss() instead of nn.BCELoss() in networks.py line 82 -- it applies a sigmoid to the predictions, restricting them to (0, 1), before computing the loss.
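
For reference, a minimal sketch of what that change looks like inside the GANLoss class in models/networks.py (structure paraphrased from the fix described above, not copied verbatim from the repository):

import torch.nn as nn

class GANLoss(nn.Module):
    def __init__(self, use_lsgan=True):
        super(GANLoss, self).__init__()
        if use_lsgan:
            self.loss = nn.MSELoss()
        else:
            # was: self.loss = nn.BCELoss()
            # BCEWithLogitsLoss accepts unbounded discriminator outputs,
            # so the `input >= 0. && input <= 1.` assert cannot fire.
            self.loss = nn.BCEWithLogitsLoss()

If the discriminator is also configured to end in a sigmoid (the use_sigmoid option), that final sigmoid would need to be dropped so it is not applied twice.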


aviel08 commented Aug 15, 2018

I think I'm having the same issue, but only when I use my own dataset. I've tried nn.BCEWithLogitsLoss() but with no luck. It must be related to my data, but I can't figure out what I'm missing.

RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
/opt/conda/conda-bld/pytorch_1525812548180/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [88,0,0], thread: [346,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
(the same assertion repeats for threads [347,0,0] through [351,0,0])
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525812548180/work/aten/src/THC/generic/THCStorage.c line=184 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1525812548180/work/aten/src/THC/generic/THCStorage.c:184
Aborted (core dumped)
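
This assertion comes from a scatter kernel rather than from the BCE loss: an index tensor contains values outside the size of the destination along the scatter dimension. With a custom dataset this typically means label-map values that are negative or not smaller than the configured number of label classes (--label_nc). A minimal, hypothetical check (names are illustrative, not from the repository) that fails early on the CPU instead of with a device-side assert:

import torch

def check_label_range(label_map, num_classes):
    # Raise a clear error before the one-hot scatter_ that the kernel
    # assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` guards.
    labels = label_map.long()
    lo, hi = labels.min().item(), labels.max().item()
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            'label values must lie in [0, %d), got [%d, %d]' % (num_classes, lo, hi))

# Example: one-hot encoding a 35-class label map (hypothetical sizes).
label_map = torch.randint(0, 35, (1, 1, 4, 4))
check_label_range(label_map, num_classes=35)
one_hot = torch.zeros(1, 35, 4, 4).scatter_(1, label_map.long(), 1.0)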


blancaag commented Aug 17, 2018 via email

@ZhangXiaoying0116

@aviel08 I met the same problem. How did you solve it?


hfarazi commented Nov 13, 2018

You can apply .clamp(0, 1) to the output of your sigmoid layer.
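
A minimal sketch of that suggestion (values are illustrative); note that clamp alone does not remove NaNs, which is why the next comment also filters them out:

import torch
import torch.nn as nn

logits = torch.tensor([-3.0, 0.5, 4.0])
prob = torch.sigmoid(logits).clamp(0, 1)   # keep inputs to BCELoss inside [0, 1]
loss = nn.BCELoss()(prob, torch.ones_like(prob))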


relh commented Jun 12, 2019

I had to also add:

x = torch.where(torch.isnan(x), torch.zeros_like(x), x)  # replace NaN values with 0
x = torch.where(torch.isinf(x), torch.zeros_like(x), x)  # replace +/-Inf values with 0

@tongpinmo

I have applied nn.BCEWithLogitsLoss() instead of nn.BCELoss(), and that solved it.


izuna385 commented Nov 29, 2019

I find that @relh's solution is effective.

>>> torch.nn.functional.sigmoid(torch.tensor(float('nan')))
tensor(nan)

x = torch.where(torch.isnan(x), torch.zeros_like(x), x) prevents this error. Thanks a lot!
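
Putting the suggestions from this thread together, a minimal sketch of a guard in front of nn.BCELoss (illustrative only; the function name is not from the pix2pixHD code):

import torch
import torch.nn as nn

def safe_bce_input(logits):
    # Replace NaN/Inf in the raw outputs, then squash to (0, 1) so the
    # `input >= 0. && input <= 1.` device-side assert cannot fire.
    x = torch.where(torch.isnan(logits), torch.zeros_like(logits), logits)
    x = torch.where(torch.isinf(x), torch.zeros_like(x), x)
    return torch.sigmoid(x).clamp(0, 1)

pred = safe_bce_input(torch.tensor([float('nan'), -2.0, 3.0]))
loss = nn.BCELoss()(pred, torch.ones_like(pred))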
