Errors in finetuning #6

Open
pqviet opened this issue Jul 14, 2022 · 7 comments


pqviet commented Jul 14, 2022

After completing pre-training, I fine-tuned on refcoco-unc and got the following error:

File "SeqTR/seqtr/utils/checkpoint.py", line 57, in load_pretrained_checkpoint
state, ema_state = ckpt['state_dict'], ckpt['ema_state_dict']
KeyError: 'ema_state_dict'

Even after fixing this, I hit further KeyErrors in load_pretrained_checkpoint() (e.g. for lan_enc.embedding.weight and model.head).
Can you please check it?
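
For reference, the failing line can be made tolerant of checkpoints saved without EMA weights; a minimal sketch (assuming only the dict layout shown in the error above, not the repository's actual fix):

# seqtr/utils/checkpoint.py, around line 57: use .get() so checkpoints
# saved without an EMA copy load instead of raising KeyError.
state = ckpt["state_dict"]
ema_state = ckpt.get("ema_state_dict")  # None when no EMA weights were saved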

seanzhuh (Owner) commented

Hi, please upload the full traceback. Does it show that lan_enc.embedding.weight does not match in size? Pre-training uses a larger word vocabulary, while fine-tuning only needs a subset of it; since we freeze the embedding weights for both pre-training and fine-tuning, it's OK, don't worry.
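
To confirm that the mismatch is only in vocabulary size, one can compare shapes before loading; a sketch (state and model named as in checkpoint.py; this check is not in the repository):

# Compare the pre-trained embedding with the fine-tuning model's copy.
pretrained = state["lan_enc.embedding.weight"]
current = model.state_dict()["lan_enc.embedding.weight"]
print(pretrained.shape, current.shape)  # pre-training vocabulary is larger
# The embedding is frozen in both stages, so dropping it before loading is safe:
state.pop("lan_enc.embedding.weight", None)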

pqviet (Author) commented Jul 19, 2022

After fixing the 'ema_state_dict' KeyError, I got the same kind of error for lan_enc.embedding.weight:
KeyError: 'lan_enc.embedding.weight'
I think some keys expected by the fine-tuning code are not present in the pre-trained checkpoint.

seanzhuh (Owner) commented

Did you use DDP during fine-tuning? If so, the keys in the pre-trained state_dict need to be prefixed with "module.", since we do that in lines 58-59. By default we fine-tune on a single GPU card.
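
For the DDP case, re-keying the pre-trained state_dict would look roughly like this (a sketch of the prefixing described above, not the exact code at lines 58-59):

# Under DDP the model is wrapped, so its parameter names gain a "module."
# prefix; the plain pre-trained keys must be rewritten to match.
state = {"module." + k: v for k, v in state.items()}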

pqviet (Author) commented Jul 20, 2022

No, I didn't use DDP in fine-tuning:
python tools/train.py configs/seqtr/detection/seqtr_det_refcoco-unc.py --finetune-from work_dir/seqtr_det_mixed/det_best.pth --cfg-options scheduler_config.max_epoch=5 scheduler_config.decay_steps=[4] scheduler_config.warmup_epochs=0

CCYChongyanChen commented Nov 3, 2022

Dear Author:
I met the same error. The traceback is attached:

Traceback (most recent call last):
File "tools/train.py", line 183, in
main()
File "tools/train.py", line 179, in main
main_worker(cfg)
File "tools/train.py", line 105, in main_worker
load_pretrained_checkpoint(model, model_ema, cfg.finetune_from, amp=cfg.use_fp16)
File "/home/chch3470/SeqTR/seqtr/utils/checkpoint.py", line 57, in load_pretrained_checkpoint
state, ema_state = ckpt['state_dict'], ckpt['ema_state_dict']
KeyError: 'ema_state_dict'

I am fine-tuning the segmentation model from the "pre-trained + fine-tuned SeqTR segmentation" checkpoint on a customized dataset.

(1) I can run inference/test on this pretrained model.
(2) I can also fine-tune the detection model.

I'm not sure if something is missing from the segmentation fine-tuning... Could you kindly guide me? Thank you so much!

The script I run is:
python tools/train.py configs/seqtr/segmentation/seqtr_segm_vizwiz.py --finetune-from "/home/chch3470/SeqTR/work_dir/segm_best.pth" --cfg-options scheduler_config.max_epoch=10 scheduler_config.decay_steps=[4] scheduler_config.warmup_epochs=0
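
One quick way to see why the load fails is to inspect what segm_best.pth actually contains; a sketch using plain PyTorch (the path is the one from the command above):

import torch

# Load on CPU and list the top-level keys; a checkpoint saved without EMA
# weights will show e.g. dict_keys(['state_dict']) but no 'ema_state_dict'.
ckpt = torch.load("/home/chch3470/SeqTR/work_dir/segm_best.pth", map_location="cpu")
print(ckpt.keys())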

seanzhuh (Owner) commented Nov 3, 2022 via email


CCYChongyanChen commented Nov 3, 2022

Thank you for the quick reply!
I commented out the lines about EMA, and it now shows an error about lan_enc.embedding.weight:

Traceback (most recent call last):
File "tools/train.py", line 183, in
main()
File "tools/train.py", line 179, in main
main_worker(cfg)
File "tools/train.py", line 105, in main_worker
load_pretrained_checkpoint(model, model_ema, cfg.finetune_from, amp=cfg.use_fp16)
File "/home/chch3470/SeqTR/seqtr/utils/checkpoint.py", line 61, in load_pretrained_checkpoint
state.pop("lan_enc.embedding.weight")
KeyError: 'lan_enc.embedding.weight'

The seq_embedding_dim key is also missing.
I commented out many lines and it seems to be working, though I'm not sure whether I did it correctly.
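
Rather than commenting lines out, a less invasive workaround is to filter the state_dict and load non-strictly; a sketch (model and cfg named as in tools/train.py; this is not the repository's own fix):

import torch

ckpt = torch.load(cfg.finetune_from, map_location="cpu")
state = ckpt.get("state_dict", ckpt)
# Keep only keys the fine-tuning model knows, with matching shapes
# (drops the over-sized pre-training vocabulary embedding, for example).
model_state = model.state_dict()
state = {k: v for k, v in state.items()
         if k in model_state and v.shape == model_state[k].shape}
missing, unexpected = model.load_state_dict(state, strict=False)
print("missing:", missing, "unexpected:", unexpected)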
