Better performance than the reported. #100

Open
shizhediao opened this issue Aug 2, 2020 · 1 comment
Comments

shizhediao commented Aug 2, 2020

Hi,
I am reproducing the fine-tuning results following your instructions.
I am using your default code with the settings below, running the pico task on the ebmnlp dataset without fine-tuning.

DATASET='ebmnlp', TASK='pico', with_finetuning='' #'_finetune' # or '' for not fine tuning, dataset_size=38124,

However, I found that my results are much better than the numbers reported in your paper. In the SciBERT paper, the micro F1 is 68.30 (frozen) and 72.28 (fine-tuned), but in my experiment the F1 is 75.48 (frozen).

"test_accuracy": 0.8668599033816425,
"test_accuracy3": 0.9632463768115942,
"test_F1_O": 0.9153885841369629,
"test_F1_I-OUT": 0.673623263835907,
"test_F1_I-PAR": 0.7656552195549011,
"test_F1_I-INT": 0.6646913886070251,
"test_avg_f1": 0.754839614033699,
"test_loss": 4.438140077646389

I am really confused.
Thanks!

@shizhediao (Author)
Thanks to @kyleclo, this question has been solved.
I was using the wrong metric. The correct one is (test_F1_I-OUT + test_F1_I-PAR + test_F1_I-INT)/3 = 0.7013232907.
I am still curious why it is higher than the reported number, 68.30.
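The corrected metric is just the unweighted (macro) mean of the three entity-class F1 scores, excluding the "O" tag. A minimal sketch, using the key names and values from the JSON output posted above:

```python
# Macro-average F1 over the three PICO entity classes
# (I-OUT, I-PAR, I-INT), excluding the "O" (outside) tag.
# Values are taken from the test metrics reported in this issue.
metrics = {
    "test_F1_I-OUT": 0.673623263835907,
    "test_F1_I-PAR": 0.7656552195549011,
    "test_F1_I-INT": 0.6646913886070251,
}

macro_f1 = sum(metrics.values()) / len(metrics)
print(f"{macro_f1:.10f}")  # ≈ 0.7013232907
```

Note that `test_avg_f1` (0.7548) in the output also averages in `test_F1_O`, which is why it overstates the entity-level score.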
