
Avoid model overfitting #31

Open

BangLiu opened this issue Aug 8, 2018 · 9 comments

Comments
@BangLiu

BangLiu commented Aug 8, 2018

I adapted this model to a text classification problem, where my text is concatenated as:
[start] text1 [delimiter] text2 [delimiter] text3 [classify]
It is just a binary classification problem, so I use F.softmax for the model output and BCE loss.
I have 120,000 training examples and 10,000 evaluation examples. n_ctx is set to 500. One epoch takes about 7 hours (1 GPU).
With lm_coef = 0.5, I found that the accuracy on my training dataset is 0.9, but dev accuracy is just 0.66. More epochs don't improve the accuracy on the evaluation dataset.
This is clearly overfitting. I am looking for suggestions about what I can tune to prevent it, in either the model or the training settings.
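
For concreteness, here is a minimal sketch of the head and loss described above, assuming a PyTorch setup; BinaryClfHead, clf_loss_fn, and the n_embd default are illustrative names, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: the transformer's hidden state at the [classify]
# token feeds a linear layer, softmax gives class probabilities, and BCE
# is taken on the positive-class probability.
class BinaryClfHead(nn.Module):
    def __init__(self, n_embd=768, clf_pdrop=0.1):
        super().__init__()
        self.dropout = nn.Dropout(clf_pdrop)
        self.linear = nn.Linear(n_embd, 2)

    def forward(self, clf_h):
        # clf_h: (batch, n_embd) hidden state at the [classify] position
        return self.linear(self.dropout(clf_h))

def clf_loss_fn(clf_logits, labels):
    p_pos = F.softmax(clf_logits, dim=-1)[:, 1]   # P(class = 1)
    return F.binary_cross_entropy(p_pos, labels.float())
```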

@BangLiu BangLiu changed the title Prevent model overfit Prevent model overfitting Aug 8, 2018
@BangLiu BangLiu changed the title Prevent model overfitting Avoid model overfitting Aug 8, 2018
@BangLiu
Author

BangLiu commented Aug 10, 2018

After fixing some data processing issues, my current performance is:
Epoch 1: train accuracy: 0.8593 dev accuracy: 0.7004, train_loss: 13.87, dev_loss: 20.79
Epoch 2: train accuracy: 0.9206 dev accuracy: 0.7157, train_loss: 10.63, dev_loss: 22.20
With more epochs, the dev loss keeps growing while dev accuracy stays about the same.

I found that the model has 116,531,713 trainable parameters, so I thought maybe the network is too big and simply memorizes even 120,000 training examples. However, since ROCStories has fewer than 2,000 examples and doesn't overfit, I don't know why my own data leads to overfitting.
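
(For reference, a parameter count like the one above can be obtained with a quick tally over any PyTorch model:)

```python
def count_trainable_params(model):
    # Total number of elements across all parameters that require grad.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```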

@teucer

teucer commented Aug 23, 2018

I had the same issue with IMDB sentiment analysis. Would appreciate some pointers here...

@rodgzilla
Contributor

@teucer @BangLiu Have you tried an even higher lm_coef?

If you want to reduce overfitting, you may also want to give the network an additional task to complete (multi-task learning). This gives its parameters something else to do. See the sketch below.
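
In this codebase the auxiliary language-modeling loss already plays that role, so one concrete knob is its weight. A hedged sketch (clf_loss and lm_loss are assumed to come from the training loop; the value is only a starting point):

```python
def total_loss(clf_loss, lm_loss, lm_coef=1.0):
    # Raising lm_coef (e.g. above the 0.5 used earlier) makes the shared
    # transformer spend more capacity on modeling text, which can
    # regularize the classifier.
    return clf_loss + lm_coef * lm_loss
```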

@teucer

teucer commented Aug 23, 2018

@rodgzilla Ok, will do that. What about increasing the dropout probability in the classification head? Would that help?
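
For example, a sketch of a head with the dropout probability raised from the usual 0.1 (layer sizes are illustrative):

```python
import torch.nn as nn

# Raising dropout in front of the linear head, e.g. 0.1 -> 0.3,
# is a cheap regularizer to try.
clf_head = nn.Sequential(
    nn.Dropout(p=0.3),   # raised from the default 0.1
    nn.Linear(768, 2),   # n_embd -> num_classes
)
```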

@dchatterjee172

@teucer @BangLiu
Did you guys solve the problem?

@MrRobot2211

Have you found any good way to regularize the network?

@Chinmay-Vadgama

I have a similar issue. I am training a DistilBERT model on the ISOT fake news dataset after cleaning it, and I get 99% validation accuracy after 1 epoch, yet it predicts wrong labels on unseen data. I guess the model is just memorizing the input sequences, and it's clearly overfitting. How can I regularize it?

@shreeyashyende

Add label smoothing and dropout.
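
A minimal sketch of label smoothing with PyTorch's built-in support (the label_smoothing argument needs PyTorch 1.10+; on older versions you would smooth the targets by hand):

```python
import torch
import torch.nn as nn

# Soften the one-hot targets so the model can't drive its training
# confidence to 1.0; combine with dropout elsewhere in the network.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(4, 2)            # (batch, num_classes)
labels = torch.tensor([0, 1, 1, 0])   # hard class labels
loss = criterion(logits, labels)
```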

@pratikchhapolika

Any answers on this? How do you avoid overfitting with a small number of data points? Is dropout the only option?
