
[bug] how to train fine-tuning classification model (size mismatch for head.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([4]).) #297

Open
jeonga0303 opened this issue Jun 5, 2024 · 3 comments

Comments

@jeonga0303

I customized config.py.
How can I train a fine-tuned classification model?

@jeonga0303 jeonga0303 added [solved] to the issue title Jun 9, 2024
@jeonga0303 jeonga0303 reopened this Jun 10, 2024
@jeonga0303 jeonga0303 removed [solved] from the issue title Jun 10, 2024
@jeonga0303

jeonga0303 commented Jun 10, 2024

How do I convert Nc1 (the pretrained head's 1000 classes) to Nc2 (my 4 classes)?


@jeonga0303

jeonga0303 commented Jun 10, 2024


I changed the file configuration in the following order.

I am still training; the dataset is large, so I will report the results later.

  1. Download the pretrained checkpoint (internimage_b_1k_224.pth).

  2. config.py: set the image size, pretrained checkpoint, and number of classes.

_C.DATA.IMG_SIZE = 224
_C.MODEL.PRETRAINED = 'internimage_b_1k_224.pth'
_C.MODEL.NUM_CLASSES = 4

  3. util.py (Nc1: 1000 → Nc2: 4)
     Modify the load_pretrained function so the classification head is re-initialized when the class counts differ (a standalone sketch of the same idea appears right after step 5):
if 'head.bias' in state_dict:
    head_bias_pretrained = state_dict['head.bias']
    Nc1 = head_bias_pretrained.shape[0]  # number of classes in the checkpoint (1000)
    Nc2 = model.head.bias.shape[0]       # number of classes in the current model (4)
    logger.info(f'{Nc1}, {Nc2}')
    if Nc1 != Nc2:
        # Re-initialize the head of the current model and drop the pretrained head
        # weights so load_state_dict does not hit the size mismatch.
        model.head.weight = torch.nn.Parameter(torch.zeros_like(model.head.weight))
        model.head.bias = torch.nn.Parameter(torch.zeros_like(model.head.bias))
        state_dict.pop('head.weight', None)
        state_dict.pop('head.bias', None)
  4. dataset/samplers.py
     Modify the __iter__ method of the sampler:
def __iter__(self):
    # generator seeded by the epoch, used for the final per-rank shuffle below
    g = torch.Generator()
    g.manual_seed(self.epoch)

    # generator with a fixed seed, so every rank sees the same base permutation
    t = torch.Generator()
    t.manual_seed(0)

    indices = torch.randperm(len(self.dataset), generator=t).tolist()
    indices = [i for i in indices if i % self.num_parts == self.rank]

    # add extra samples to make it evenly divisible
    while len(indices) < self.total_size_parts:
        indices += indices[:(self.total_size_parts - len(indices))]
    indices = indices[:self.total_size_parts]
    assert len(indices) == self.total_size_parts, \
        f'Length of indices ({len(indices)}) does not match total_size_parts ({self.total_size_parts})'

    # subsample for this rank
    indices = indices[self.rank // self.num_parts:self.total_size_parts:self.num_replicas // self.num_parts]

    # epoch-dependent shuffle of this rank's subset
    index = torch.randperm(len(indices), generator=g).tolist()
    indices = list(np.array(indices)[index])

    assert len(indices) == self.num_samples, \
        f'Length of indices ({len(indices)}) does not match num_samples ({self.num_samples})'

    return iter(indices)
  5. Launch command
     python -m torch.distributed.launch --nproc_per_node 2 --master_port 12345 main.py --cfg configs/without_lr_decay/internimage_b_1k_224_custom.yaml --data-path [data-path] --pretrained internimage_b_1k_224.pth --batch-size 120
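
For reference, here is a minimal standalone sketch of the head-mismatch handling from step 3. It is an illustration under assumptions, not the repo's actual load_pretrained: load_backbone_only and checkpoint_path are placeholder names, the 'model' checkpoint key is assumed, and the model is only assumed to expose a head linear layer. The idea is to drop the mismatched head keys from the checkpoint and load the remaining weights with strict=False, so the new 4-class head is trained from scratch.

import torch

def load_backbone_only(model, checkpoint_path, logger=None):
    # Load the checkpoint on CPU; many classification checkpoints keep the weights
    # under a 'model' key, so fall back to the raw dict if that key is absent.
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    state_dict = checkpoint.get('model', checkpoint)

    # If the pretrained head was trained for a different number of classes,
    # drop it so the randomly initialized head of the current model is kept.
    if 'head.bias' in state_dict and \
            state_dict['head.bias'].shape[0] != model.head.bias.shape[0]:
        state_dict.pop('head.weight', None)
        state_dict.pop('head.bias', None)

    # strict=False tolerates the now-missing head keys.
    msg = model.load_state_dict(state_dict, strict=False)
    if logger is not None:
        logger.info(f'load_state_dict: {msg}')
    return model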

My GPUs are 2 × A100.

  • If you use a very large dataset, use the following command with gradient accumulation instead:
    python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --cfg configs/without_lr_decay/internimage_b_1k_224_custom.yaml --batch-size 256 --accumulation-steps 4 --pretrained internimage_b_1k_224.pth --data-path [data-path] --local-rank 1 --output work_dirs
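
For reference on batch sizes (assuming the usual convention that the effective global batch size is per-GPU batch size × number of GPUs × accumulation steps): the first command gives 120 × 2 × 1 = 240, while the large-dataset command gives 256 × 1 × 4 = 1024, so the learning rate may need to be adjusted accordingly.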

2024-06-11: training succeeded (image classification fine-tuning).


@jeonga0303 jeonga0303 added [solved] to the issue title Jun 11, 2024
@jeonga0303

[bug]

Training does not seem to make progress: the loss stays the same at every step.
May I know the reason?
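
One thing worth checking (a hypothetical sketch, not a confirmed cause): the load_pretrained change in step 3 replaces model.head.weight and model.head.bias with brand-new nn.Parameter objects initialized to zero. If the optimizer is built before load_pretrained runs, it may still be tracking the old tensors, so the head never updates, and an all-zero head also produces identical logits (loss ≈ ln 4 ≈ 1.386 for every sample) at the start. Re-initializing the existing tensors in place avoids both issues:

import torch

# Hypothetical in-place re-initialization of the new 4-class head. Modifying the
# existing tensors (instead of creating new nn.Parameter objects) keeps them visible
# to an optimizer that was built earlier, and a small random init avoids starting
# from all-zero logits.
with torch.no_grad():
    torch.nn.init.trunc_normal_(model.head.weight, std=0.02)
    model.head.bias.zero_()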


@jeonga0303 jeonga0303 reopened this Jun 11, 2024
@jeonga0303 jeonga0303 changed the issue title from [solved] back to [bug] Jun 11, 2024