OptunaSearchCV does not allow multiple fit calls if using a predefined study #118
Comments
Hmm, so you think this class should behave the same as sklearn's GridSearchCV. As you described, we have a simple workaround to do so, and I'm not sure this class should have exactly the same behaviour as sklearn's, because it depends on |
I understand that optuna's learning process allows for incremental data input. However, this changes completely the semantics of scikit-learn's As an example, think of a call to scikit-learn's The solution is quite simple. On every call to |
Thank you for clarification. Alternatively, passing a new study to |
My concern with your suggestion is the storage. I suppose the approach works only with the default in-memory storage, because a study instance carries its storage info. So another rule or argument is necessary to create a new study when calling the fit method.
This has exactly the issue I described before. Passing a Everything can be solved easily, including the storage issue you mentioned before. Basically, instead of using the optuna-integration/optuna_integration/sklearn/sklearn.py Lines 886 to 893 in 15e6b0e
To this:
    else:
        prefix_name = self.study.study_name
        i_fit = 0
        for t_study in self.study._storage.get_all_studies():
            if re.fullmatch(f"{prefix_name}_fit[0-9]+", t_study.study_name) is not None:
                i_fit += 1
        self.study_ = study_module.create_study(
            direction="maximize",
            sampler=self.study.sampler,
            pruner=self.study.pruner,
            study_name=f"{prefix_name}_fit{i_fit}",
            storage=self.study._storage,
            load_if_exists=False,
        )
This creates one entry in the storage each time the |
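For readers who want to try the naming scheme without an Optuna storage at hand, here is a minimal sketch of the counting logic from the snippet above, using only the standard library. The helper name next_fit_study_name and the plain list of study names are hypothetical stand-ins for the loop over the storage's studies; they are not part of optuna-integration. The sketch also wraps the prefix in re.escape, which is an assumption added here so that a base study name containing regex metacharacters is matched literally.

```python
import re


def next_fit_study_name(prefix_name, existing_names):
    """Return the next '<prefix>_fit<N>' name by counting existing matches.

    Hypothetical helper: `existing_names` stands in for the study names
    that the proposed patch reads from the study's storage.
    """
    # re.escape is an addition of this sketch, not part of the proposed patch.
    # It keeps characters such as '.' or '+' in the base name literal.
    pattern = re.compile(re.escape(prefix_name) + r"_fit[0-9]+")
    i_fit = sum(1 for name in existing_names if pattern.fullmatch(name))
    return f"{prefix_name}_fit{i_fit}"


names = ["my_study", "my_study_fit0", "my_study_fit1", "other_fit0"]
print(next_fit_study_name("my_study", names))  # my_study_fit2
```

One design point to keep in mind: because the index comes from a count of matching names rather than from the highest existing index, deleting an earlier per-fit study could make the next generated name collide with one that still exists. With load_if_exists=False, the create_study call would then be expected to raise instead of silently reusing the old study.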
Expected behavior
When CV is used to evaluate a model's performance, it requires fitting the same model several times with different training datasets. Like GridSearchCV, OptunaSearchCV should find the best set of hyperparameters on each fit call, independently from previous fit calls. In a nutshell, in scikit-learn, calling fit should overwrite what has been learned in the previous fit.
If we define a study and use it in the OptunaSearchCV object, each call to fit will still consider previously tested hyperparameters.
Running this code:
I can get this output:
We can see that after the first 10 trials, when the fit method is called again, we still consider trial 0 as the best.
However, this is not the case when the study parameter in the OptunaSearchCV is left None:
Environment
Error messages, stack traces, or logs
Pasted in the description
Steps to reproduce
Additional context (optional)
No response