Branch
main branch (mmpretrain version)
Describe the bug
I am training a binary classification model to predict whether an image belongs to class 1. For this, I use the following loss function configuration: type='CrossEntropyLoss', use_sigmoid=True.
Below is the full model configuration:
model = dict(
    backbone=dict(
        depth=50,
        num_stages=4,
        out_indices=(3,),
        style='pytorch',
        type='ResNet'),
    head=dict(
        in_channels=2048,
        loss=dict(
            loss_weight=1.0,
            type='CrossEntropyLoss',
            use_sigmoid=True),
        num_classes=1,
        topk=(1,),
        type='LinearClsHead'),
    neck=dict(type='GlobalAveragePooling'),
    type='ImageClassifier')

The dataset structure is as follows:
dataset/
    0/
        <images>
    1/
        <images>
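For completeness, the folder layout above is consumed by a standard folder-as-class dataset. The snippet below is only a rough sketch of such a dataloader config; the pipeline, batch size, and sampler shown here are illustrative assumptions, not copied from my actual config:

train_dataloader = dict(
    batch_size=32,
    num_workers=4,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='CustomDataset',      # infers labels 0/1 from the sub-folder names
        data_root='dataset/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='RandomResizedCrop', scale=224),
            dict(type='PackInputs'),
        ]))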
When starting the training process, I encounter an error related to mismatched dimensions between predictions and labels:
File ".../models/losses/cross_entropy_loss.py", line 207, in forward
loss_cls = self.loss_weight * self.cls_criterion(
File ".../models/losses/cross_entropy_loss.py", line 116, in binary_cross_entropy
assert pred.dim() == label.dim()
Here are the dimensions of the predictions and labels printed during debugging:
torch.Size([32, 1])
tensor([[0.3662],
[0.2847],
[0.3814],
[0.4056],
[0.3276],
[0.3755],
[0.3956],
[0.3937],
[0.2132],
[0.5007],
[0.3490],
[0.1789],
[0.2947],
[0.1428],
[0.4092],
[0.4081],
[0.1799],
[0.3192],
[0.1937],
[0.4014],
[0.3862],
[0.1030],
[0.5187],
[0.4729],
[0.5233],
[0.3962],
[0.5855],
[0.3272],
[0.2554],
[0.3628],
[0.2230],
[0.3537]], device='cuda:0', grad_fn=<AddmmBackward0>)
torch.Size([32])
tensor([1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0,
1, 0, 1, 1, 1, 1, 0, 1], device='cuda:0')
It appears that pred has the shape (32, 1), while label has the shape (32,). This causes the assertion assert pred.dim() == label.dim() in the binary_cross_entropy function to fail.
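The mismatch can be reproduced outside the training loop. Below is a minimal standalone sketch in plain PyTorch (independent of the mmpretrain internals) that mimics the shapes above; the labels only become compatible with a (32, 1) prediction once they are expanded and cast to float:

import torch
import torch.nn.functional as F

pred = torch.randn(32, 1)           # logits from LinearClsHead with num_classes=1
label = torch.randint(0, 2, (32,))  # integer labels as produced by the dataset

print(pred.dim(), label.dim())      # 2 vs. 1 -> this is what the assertion rejects

# Expanding the labels to shape (32, 1) and casting to float matches what
# F.binary_cross_entropy_with_logits expects for element-wise BCE:
loss = F.binary_cross_entropy_with_logits(pred, label.unsqueeze(1).float())
print(loss)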
When I disable the sigmoid (use_sigmoid=False) and set num_classes=2, training runs without errors; the error above only occurs with use_sigmoid=True and num_classes=1. The working head configuration is shown below for reference.
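Head/loss block of that working configuration (everything else unchanged):

head=dict(
    in_channels=2048,
    loss=dict(
        loss_weight=1.0,
        type='CrossEntropyLoss',
        use_sigmoid=False),
    num_classes=2,
    topk=(1,),
    type='LinearClsHead'),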
Is this a bug in the implementation of CrossEntropyLoss with use_sigmoid=True, or am I misconfiguring something? Any clarification or suggestions would be greatly appreciated.
Environment
{'sys.platform': 'linux',
'Python': '3.8.20 (default, Oct 3 2024, 15:24:27) [GCC 11.2.0]',
'CUDA available': True,
'MUSA available': False,
'numpy_random_seed': 2147483648,
'GPU 0': 'NVIDIA GeForce RTX 3060 Laptop GPU',
'CUDA_HOME': None,
'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0',
'PyTorch': '1.10.1',
'TorchVision': '0.11.2',
'OpenCV': '4.5.5',
'MMEngine': '0.10.6',
'MMCV': '2.2.0',
'MMPreTrain': '1.2.0+ee7f2e8'}

Other information
No response