min_epochs and EarlyStopping in conflict #19966

Open

timlod opened this issue Jun 10, 2024 · 1 comment
Labels: bug, needs triage

Comments

timlod commented Jun 10, 2024

Bug description

I have a problem where I use min_epochs because it can take a while before training starts to converge.
EarlyStopping is triggered quite early, so I set min_epochs to 'get over' that initial period.
However, even though training is converging by the time we reach min_epochs, early stopping stops training immediately once min_epochs is reached, just because it was triggered very early on in training.

I think that EarlyStopping should pick itself back up if the monitored metric improves before min_epochs is reached.

Example Trainer config:

import lightning as L
from lightning.pytorch.callbacks import EarlyStopping

trainer = L.Trainer(
    max_epochs=10000,
    callbacks=[
        EarlyStopping(monitor="val_loss", mode="min", patience=100),
    ],
    min_epochs=1000,
)

Now imagine EarlyStopping triggering at epoch 100, but val_loss improving from epoch 101 all the way until epoch 1000 - right now, training will still stop as soon as min_epochs is reached.
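
A minimal sketch of the proposed behaviour (the subclass name ResettableEarlyStopping is hypothetical; this relies on the wait_count/patience attributes and the trainer.should_stop flag of Lightning 2.x):

from lightning.pytorch.callbacks import EarlyStopping

class ResettableEarlyStopping(EarlyStopping):
    # Sketch of a workaround, not part of Lightning: clear a stop signal
    # that was raised before min_epochs if the metric has since improved.
    def on_validation_end(self, trainer, pl_module):
        super().on_validation_end(trainer, pl_module)
        # EarlyStopping resets wait_count to 0 whenever the monitored
        # metric improves, so wait_count < patience means an earlier
        # stop signal no longer reflects the current trend.
        # (Caveat: this also clears should_stop set by other callbacks.)
        if trainer.should_stop and self.wait_count < self.patience:
            trainer.should_stop = False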

What version are you seeing the problem on?

v2.2

How to reproduce the bug

No response

Error messages and logs

No response

Environment

No response

More info

No response

timlod added the bug and needs triage labels on Jun 10, 2024
@shirondru

I also see this, and I think the implementation would be better if EarlyStopping took precedence after min_epochs is reached. As it stands, it is as if EarlyStopping does not exist: training exits once min_epochs is reached no matter what.
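
For context, the behaviour described above is consistent with a fit-loop exit check of roughly the following shape (a paraphrase written for illustration, not the actual Lightning source; the function name fit_loop_done is hypothetical):

def fit_loop_done(should_stop: bool, current_epoch: int,
                  min_epochs: int, max_epochs: int) -> bool:
    # The stop flag is sticky: once EarlyStopping sets it, training ends
    # as soon as min_epochs is met, regardless of whether the monitored
    # metric has improved in the meantime.
    if current_epoch >= max_epochs:
        return True
    return should_stop and current_epoch >= min_epochs

# With the numbers from the report: the stop was triggered at epoch 100,
# so training exits the moment epoch 1000 (min_epochs) is reached.
assert fit_loop_done(should_stop=True, current_epoch=1000,
                     min_epochs=1000, max_epochs=10000)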
