
Avoid model overfitting #31

Open

BangLiu opened this issue Aug 8, 2018 · 9 comments

Comments
@BangLiu

BangLiu commented Aug 8, 2018

I adapted this model to a text classification problem, where my text is concatenated as:
[start] text1 [delimiter] text2 [delimiter] text3 [classify]
It is just a binary classification problem, so I use F.softmax for the model output and BCE loss.
I have 120,000 training examples and 10,000 evaluation examples. n_ctx is set to 500. One epoch takes about 7 hours (1 GPU).
With lm_coef = 0.5, I found that the accuracy on my training dataset is 0.9, but dev accuracy is just 0.66. More epochs don't improve the accuracy on the evaluation dataset.
This is clearly overfitting. I am looking for suggestions about what I can tune to prevent it, in either the model or the training settings.
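
For concreteness, here is a minimal sketch of the head and loss described above, assuming a PyTorch setup; BinaryClfHead, clf_loss_fn, and the n_embd default are illustrative names, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: the transformer's hidden state at the [classify]
# token feeds a linear layer, softmax gives class probabilities, and BCE
# is taken on the positive-class probability.
class BinaryClfHead(nn.Module):
    def __init__(self, n_embd=768, clf_pdrop=0.1):
        super().__init__()
        self.dropout = nn.Dropout(clf_pdrop)
        self.linear = nn.Linear(n_embd, 2)

    def forward(self, clf_h):
        # clf_h: (batch, n_embd) hidden state at the [classify] position
        return self.linear(self.dropout(clf_h))

def clf_loss_fn(clf_logits, labels):
    p_pos = F.softmax(clf_logits, dim=-1)[:, 1]   # P(class = 1)
    return F.binary_cross_entropy(p_pos, labels.float())
```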

@BangLiu BangLiu changed the title Prevent model overfit Prevent model overfitting Aug 8, 2018
@BangLiu BangLiu changed the title Prevent model overfitting Avoid model overfitting Aug 8, 2018
@BangLiu
Author

BangLiu commented Aug 10, 2018

After fixing some data processing issues, my current performance is:
Epoch 1: train accuracy: 0.8593 dev accuracy: 0.7004, train_loss: 13.87, dev_loss: 20.79
Epoch 2: train accuracy: 0.9206 dev accuracy: 0.7157, train_loss: 10.63, dev_loss: 22.20
With more epochs, the dev loss keeps growing while dev accuracy stays about the same.

I found that the model has 116,531,713 trainable parameters, so I thought maybe the network is too big and simply memorizes even 120,000 training examples. However, since ROCStories has fewer than 2,000 examples and doesn't overfit, I don't know why my own data leads to overfitting.
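
(For reference, a parameter count like the one above can be obtained with a quick tally over any PyTorch model:)

```python
def count_trainable_params(model):
    # Total number of elements across all parameters that require grad.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```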

@teucer

teucer commented Aug 23, 2018

I had the same issue with IMDB sentiment analysis. Would appreciate some pointers here...

@rodgzilla
Contributor

@teucer @BangLiu Have you tried an even higher lm_coef?

If you want to reduce overfitting, you may also want to give the network an additional task to complete (multi-task learning). This gives its parameters something else to do. See the sketch below.
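
In this codebase the auxiliary language-modeling loss already plays that role, so one concrete knob is its weight. A hedged sketch (clf_loss and lm_loss are assumed to come from the training loop; the value is only a starting point):

```python
def total_loss(clf_loss, lm_loss, lm_coef=1.0):
    # Raising lm_coef (e.g. above the 0.5 used earlier) makes the shared
    # transformer spend more capacity on modeling text, which can
    # regularize the classifier.
    return clf_loss + lm_coef * lm_loss
```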

@teucer

teucer commented Aug 23, 2018

@rodgzilla Ok, will do that. What about increasing the dropout probability in the classification head? Would that help?
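
For example, a sketch of a head with the dropout probability raised from the usual 0.1 (layer sizes are illustrative):

```python
import torch.nn as nn

# Raising dropout in front of the linear head, e.g. 0.1 -> 0.3,
# is a cheap regularizer to try.
clf_head = nn.Sequential(
    nn.Dropout(p=0.3),   # raised from the default 0.1
    nn.Linear(768, 2),   # n_embd -> num_classes
)
```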

@dchatterjee172

@teucer @BangLiu
Did you guys solve the problem?

@MrRobot2211

Have you found any good way to regularize the network?

@Chinmay-Vadgama

I have a similar issue. I am training a DistilBERT model on the ISOT fake news dataset after cleaning it, and I get 99% validation accuracy after 1 epoch, yet it predicts wrong labels on unseen data. I guess the model is just memorizing the input sequences, and it's clearly overfitting. How can I regularize it?

@shreeyashyende

Add label smoothing and dropout.
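
A minimal sketch of label smoothing with PyTorch's built-in support (the label_smoothing argument needs PyTorch 1.10+; on older versions you would smooth the targets by hand):

```python
import torch
import torch.nn as nn

# Soften the one-hot targets so the model can't drive its training
# confidence to 1.0; combine with dropout elsewhere in the network.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(4, 2)            # (batch, num_classes)
labels = torch.tensor([0, 1, 1, 0])   # hard class labels
loss = criterion(logits, labels)
```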

@pratikchhapolika

Any answers on this? How do you avoid overfitting with a small number of data points? Is dropout the only option?
