
[Feature Request] Resume study and avoid OOM when optimizing with Optuna plugin #1679

Open
dianemarquette opened this issue Jun 17, 2021 · 9 comments · May be fixed by #2647
Labels
enhancement Enhancement request help wanted Community help is wanted plugin Plugins related issues wishlist Low priority feature requests

Comments

@dianemarquette

🚀 Feature Request

I would like to be able:

  • to persist my study in order to resume my hyperparameter search from where I left off
  • to set the gc_after_trial parameter of Optuna's study.optimize()

Motivation

Is your feature request related to a problem? Please describe.
I'm always frustrated when my code crashes after 60 trials (out of 100). I suspect an OOM error. Being able to prevent the script from crashing in the first place with gc.collect() would be great. However, at least being able to resume my search from where it stopped would be a game changer.

Pitch

Describe the solution you'd like
I would like to set gc_after_trial to True and a path to store my study parameters after each trial in my Optuna sweeper Hydra config.
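
A minimal sketch of what such overrides could look like (the gc_after_trial key is hypothetical, since the sweeper does not expose it today, and study_name/storage may only exist in newer plugin versions):

hydra.sweeper.study_name=my_study
hydra.sweeper.storage=sqlite:///my_study.db
hydra.sweeper.gc_after_trial=true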

Describe alternatives you've considered
I read Optuna's documentation, but I'm not sure how to make their examples work with Hydra:

Are you willing to open a pull request? (See CONTRIBUTING)
I'm not comfortable enough with the Optuna and Hydra libraries to prepare a pull request.

@dianemarquette dianemarquette added the enhancement Enhancement request label Jun 17, 2021
@omry omry added help wanted Community help is wanted wishlist Low priority feature requests labels Jun 17, 2021
@omry
Collaborator

omry commented Jun 17, 2021

Hi @dianemarquette,
I am open to supporting it, although resuming a study might be harder than it seems (in general, resume is not something supported by any Hydra sweeper right now).

In any case, we do not have the cycles for it, which means this will only happen if someone from the community wants to work on it.

Supporting gc_after_trial seems like it should be straightforward, though.

@dianemarquette
Author

@omry Thanks for your quick reply. Any idea when gc_after_trial could be supported?

@omry
Collaborator

omry commented Jun 18, 2021

It can be supported after someone sends a pull request to add support for it.
As I said, this is not a high priority. You can either wait and hope that someone eventually does it, or you can try to do it yourself.

@dianemarquette
Author

Ok, thanks for the clarification :)

@jieru-hu jieru-hu added the plugin Plugins related issues label Sep 29, 2021
@cgerum
Contributor

cgerum commented Mar 25, 2022

@dianemarquette as of now, resuming trials is somewhat supported by the Optuna sweeper, by setting a storage backend:

hydra.sweeper.study_name=my_trial
hydra.sweeper.storage=sqlite:///my_trial.sqlite

But this will always start the job numbering from scratch and will therefore overwrite the output directories of individual jobs.
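
For example, a multirun invocation using those overrides might look like this (train.py is just a placeholder for your own entry point):

python train.py --multirun hydra.sweeper.study_name=my_trial hydra.sweeper.storage=sqlite:///my_trial.sqlite

Re-running the same command points the sweeper at the existing SQLite-backed study, so previously completed trials are kept rather than lost; the job-numbering caveat above still applies.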

@zhaoedf

zhaoedf commented Aug 2, 2022


What's more, the storage-backend approach above does make it possible to resume the study, but it inevitably launches duplicated runs for some parameter combinations: the sweeper still runs the full trial count (say a grid search with 80 experiments in total will still run 80 times) instead of accounting for the experiments that have already been executed successfully.

@zhaoedf

zhaoedf commented Aug 2, 2022

And yes, gc collect is a feature I want too, because right now, no matter how I set the n_jobs or pre_dispatch params, finished jobs still exist and won't exit until the next group of parallel trials finishes.

@omry
Collaborator

omry commented Sep 14, 2022

Hydra has callbacks which can probably be used for it.
See this.
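
For reference, a minimal sketch of such a callback, assuming Hydra >= 1.1's experimental callback API (the module path my_project.callbacks and the class name are placeholders, and it is untested whether on_job_end fires in the right process for every launcher):

# Sketch: a Hydra callback that forces garbage collection after every job.
import gc
from typing import Any

from hydra.core.utils import JobReturn
from hydra.experimental.callback import Callback
from omegaconf import DictConfig


class GCAfterJobCallback(Callback):
    def on_job_end(self, config: DictConfig, job_return: JobReturn, **kwargs: Any) -> None:
        # Reclaim memory left over from the finished trial before the next one starts.
        gc.collect()

It would then be registered under hydra.callbacks in the primary config, with a _target_ pointing at the class.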

@bablf

bablf commented Oct 21, 2022

As far as I can tell, we only need to add gc_after_trial to the correct config. I did that, but when I run pytest I get two errors because the hydra/sweeper config is not as expected.

You can find my code here. I do not have much experience with pytest, which is why it is hard for me to debug the error.

It would be great if someone could help me 😅

Also, I am not sure how to write a test that actually tests what I coded, since gc.collect() does not return anything. I managed to modify a test, added gc_after_trial, and the config got built correctly. But we would need a test that actually loads a model with CUDA, right?
If we do not want to test it, then the question becomes whether we actually call the Optuna implementation def _optimize() (see) and whether it is enough to add the key to the config.
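
For what it's worth, here is how the flag behaves in plain Optuna, as a reference sketch rather than plugin code (whether the sweeper actually routes through study.optimize is exactly the open question above):

# Plain-Optuna reference: gc_after_trial=True makes Optuna run gc.collect()
# after every trial, which is what helps when trials leak memory (e.g. CUDA models).
import optuna


def objective(trial: optuna.Trial) -> float:
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


study = optuna.create_study()
study.optimize(objective, n_trials=20, gc_after_trial=True)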

@michelkok michelkok linked a pull request Apr 21, 2023 that will close this issue