
The reproduction result is not good on the Overall indicator. #10

Open
TracyYannn opened this issue Jun 7, 2024 · 3 comments
@TracyYannn

I cannot reproduce the reported results on the Overall metric. I ran the code on a V100; here are my parameter settings and experimental results. What could be the reason, and how should I reproduce the results correctly? Thank you!
```shell
python main.py --token_level word-level \
    --model_type roberta \
    --model_dir dir_base \
    --task mixatis \
    --data_dir data \
    --attention_mode label \
    --do_train \
    --do_eval \
    --num_train_epochs 100 \
    --intent_loss_coef 0.5 \
    --learning_rate 1e-5 \
    --train_batch_size 32 \
    --num_intent_detection \
    --use_crf
```

```shell
python main.py --token_level word-level \
    --model_type roberta \
    --model_dir misca \
    --task mixatis \
    --data_dir data \
    --attention_mode label \
    --do_train \
    --do_eval \
    --num_train_epochs 100 \
    --intent_loss_coef 0.5 \
    --learning_rate 1e-5 \
    --num_intent_detection \
    --use_crf \
    --base_model dir_base \
    --intent_slot_attn_type coattention
```

[screenshot: not_good_overall]

@BillKiller

I cannot reproduce the performance either. I hope the authors can provide more detailed information. Same issue here.

@thinhphp
Collaborator

We have checked and updated the instructions with more detail. In general, for the model with a PLM, after obtaining the “base” model, we load it and freeze the PLM encoder (simply add .detach() after the encoder output). The final stage is fine-tuning the full model; remember to perform a grid search to make sure it achieves the best performance. In our experiments, we use this checkpoint for MixATIS and this checkpoint for MixSNIPS as the base model. In the case of MixATIS, you could try a learning rate of 3e-5 (while frozen) and 3e-6 (after unfreezing).
Hope this helps. Should you have any further questions, do not hesitate to contact me at [email protected], where I check the inbox more often.
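The "freeze via .detach()" trick described above can be sketched in PyTorch as follows. This is a minimal, hypothetical illustration, not the repo's actual code: the class and parameter names (`Model`, `freeze_plm`, `head`) are invented for the example, and a plain `nn.Linear` stands in for the PLM encoder.

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    """Toy model: a (stand-in) PLM encoder followed by a task head."""

    def __init__(self, encoder: nn.Module, hidden: int, n_labels: int, freeze_plm: bool):
        super().__init__()
        self.encoder = encoder      # in the real setup: the pretrained LM
        self.freeze_plm = freeze_plm
        self.head = nn.Linear(hidden, n_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        if self.freeze_plm:
            # Detaching cuts the autograd graph here: no gradients flow back
            # into the encoder, so only the head (and other non-PLM modules)
            # receive updates during this stage.
            h = h.detach()
        return self.head(h)
```

With `freeze_plm=True`, a backward pass leaves `encoder` parameters without gradients while the head still trains; flipping the flag to `False` for the final stage fine-tunes the full model, matching the two-stage recipe described in the comment.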

@TracyYannn
Author

> We have checked and updated the instructions with more detail. In general, for the model with a PLM, after obtaining the “base” model, we load it and freeze the PLM encoder (simply add .detach() after the encoder output). The final stage is fine-tuning the full model; remember to perform a grid search to make sure it achieves the best performance. In our experiments, we use this checkpoint for MixATIS and this checkpoint for MixSNIPS as the base model. In the case of MixATIS, you could try a learning rate of 3e-5 (while frozen) and 3e-6 (after unfreezing). Hope this helps. Should you have any further questions, do not hesitate to contact me at [email protected], where I check the inbox more often.

Thank you for the update. May I ask which graphics card the experiment was conducted on? Thanks, and have a great day!
