
Internal sorting of parameter keys leads to unexpected behaviour when registering an array #530

Open
till-m opened this issue Oct 22, 2024 · 13 comments

till-m (Member) commented Oct 22, 2024

Describe the bug
Internally, the target space sorts the keys of the pbounds dict before creating the target space bounds. This was presumably done because Python dicts were unordered before Python 3.7, so sorting guaranteed a canonical parameter order. Since Python 3.7, however, dicts preserve insertion order, making this sorting unnecessary. Moreover, the fact that the internal representation changes the order can lead to unexpected behaviour when registering a numpy array (see example below). This seems especially likely to happen when using a DataFrame representation of the data.
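To illustrate the underlying mismatch (a minimal sketch, independent of bayes_opt): Python 3.7+ dicts iterate in insertion order, while sorting the keys, as the target space does internally, produces a different order.

```python
# Python 3.7+ dicts preserve insertion order.
pbounds = {"sepal_length": (4.3, 7.9), "petal_width": (0.1, 2.5)}

insertion_order = list(pbounds)  # order as written by the user
sorted_order = sorted(pbounds)   # order used internally by the target space

print(insertion_order)  # ['sepal_length', 'petal_width']
print(sorted_order)     # ['petal_width', 'sepal_length']
```

Any array registered against these bounds is interpreted in the second order, not the first.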

To Reproduce

Ex:

from sklearn import datasets
import pandas as pd
from bayes_opt import BayesianOptimization

df = pd.DataFrame(datasets.load_iris().data, columns=datasets.load_iris().feature_names)
df.columns = [col.replace(' (cm)', '') for col in df.columns]
df.columns = [col.replace(' ', '_') for col in df.columns]

pbounds = {
    'sepal_length': (df['sepal_length'].min(), df['sepal_length'].max()),
    'sepal_width': (df['sepal_width'].min(), df['sepal_width'].max()),
    'petal_length': (df['petal_length'].min(), df['petal_length'].max()),
    'petal_width': (df['petal_width'].min(), df['petal_width'].max())
}
optimizer = BayesianOptimization(
    f=None,
    pbounds=pbounds,
    random_state=1,
)
optimizer.register(df.iloc[0].values, 0)
print(dict(optimizer.space.array_to_params(optimizer.space.params[0])))
print(dict(df.iloc[0]))

Output:

{'petal_length': np.float64(5.1), 'petal_width': np.float64(3.5), 'sepal_length': np.float64(1.4), 'sepal_width': np.float64(0.2)}
{'sepal_length': np.float64(5.1), 'sepal_width': np.float64(3.5), 'petal_length': np.float64(1.4), 'petal_width': np.float64(0.2)}

Expected behavior
The optimizer should not change the order of the keys provided by the user. Registering a numpy array instead of a dict should assume the same order as given by the pbounds.

Nkehoe-QUB commented Oct 25, 2024

Hello, I believe this bug is affecting my usage of the package. I have the following instance of the BayesianOptimization class, and after the first iteration it seems to swap the second and third bounds: the parameter 'thick' is meant to have bounds of 0.1e-6 to 15.0e-6 and 'g' is meant to have 1.0 to 2.0, but they get swapped.

# Simulation Parameters min and max values
param_Min = [1.0, 0.1 * micron, 1.0]
param_Max = [20.0, 15.0 * micron, 2.0]

# Bayesian Optimization Class, random_state is the seed for the random number generator
acq = acquisition.UpperConfidenceBound(kappa=5.0)

optimizer = BayesianOptimization(
    f=None,
    acquisition_function=acq,
    pbounds={'den' : (param_Min[0], param_Max[0]),
             'thick' : (param_Min[1], param_Max[1]),
             'g' : (param_Min[2], param_Max[2])
             },
    verbose=2,
    random_state=1,
)

till-m (Member, Author) commented Oct 25, 2024

Hey @Nkehoe-QUB, if you give me a full example that reproduces the error I can have a look and see if this is really what happens.

Nkehoe-QUB commented:
Hi, I'm sorry, it turns out it's an issue with how I'm handling the output of optimizer.suggest() and passing it to my simulation. I was doing:

next_points = optimizer.suggest()
ID = str(Run_PIC(i, j, [value for value in next_points.values()]))

But I didn't realise that the keys of the suggest() output would be in alphabetical order, so it was passing the values as den, g and then thick.
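A robust way to avoid this kind of ordering surprise (a sketch; Run_PIC is the commenter's own simulation function, and the values here are made up) is to index the suggestion dict by explicit parameter names rather than relying on the iteration order of its values:

```python
# Index the suggested point by explicit names instead of relying on
# the (alphabetical) iteration order of suggest()'s output.
param_order = ["den", "thick", "g"]  # order expected by the simulation

next_points = {"den": 5.0, "g": 1.5, "thick": 2e-6}  # e.g. from optimizer.suggest()
ordered_values = [next_points[name] for name in param_order]
# ordered_values == [5.0, 2e-6, 1.5], regardless of the dict's key order
```

This stays correct even if a future release changes the key ordering convention.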

perezed00 (Contributor) commented:

I'm glad this has been noted. I agree that this is a wrinkle worth smoothing out. If nothing else, better compatibility with pandas seems worthwhile. I'll take a look and report back.

bwheelz36 (Collaborator) commented Nov 16, 2024

I do remember running into this and being tripped up by it (example below 😝).

I agree it would be better if this didn't happen, but note that this would be a breaking change for many users of this library who are assuming that it sorts the parameters. So, in my opinion, fixing this should be deferred until the next major release.

[screenshot of the example omitted]

till-m (Member, Author) commented Nov 17, 2024

This should only affect people who register arrays, which is hopefully a small part of the userbase. One thing we could do is publish a minor version that raises a warning when someone registers an array, telling them that this operation will work differently in a future release.
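The proposed deprecation warning could be sketched like this (hypothetical code, not the library's actual implementation; the function name and message are illustrative):

```python
import warnings

def register(params, target):
    # Hypothetical sketch: warn when a non-dict (e.g. a numpy array) is
    # registered, since array entries are currently interpreted in
    # alphabetical key order, and that convention is slated to change.
    if not isinstance(params, dict):
        warnings.warn(
            "Registering an array assumes alphabetically sorted parameter "
            "order; this will change to insertion order in a future release.",
            FutureWarning,
        )
    ...  # existing registration logic would follow here
```

FutureWarning is used because, unlike DeprecationWarning, it is shown to end users by default.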

anates commented Dec 18, 2024

This change added confusing warnings even when running the example code given on the web page, e.g. https://bayesian-optimization.github.io/BayesianOptimization/2.0.1/visualization.html. Would it be possible to add a short hint about this there as well? It might be confusing for future users who want to follow the tutorial but receive the warning.

till-m (Member, Author) commented Dec 18, 2024

Hey @anates,

Thanks for reporting. The warning should definitely not show up in this context; this is due to an error on my part. Let me think about the best way to handle the issue.

For the record: this happens because space.probe actually converts the point to an array before registering it; I thought it would pass the point on as received. As a result, running the maximize loop triggers the warning, which really shouldn't happen and is understandably extremely confusing for any user.

Apologies for the confusion & hi from Basel :)

anates commented Dec 18, 2024

@till-m For my understanding: is there currently any way to avoid the reordering of the keys in pbounds? Or what is the best way to work around the problem until a proper fix lands?
Thank you, and regards back from Thun!

till-m (Member, Author) commented Dec 18, 2024

Hey @anates,

how are you using the library currently?

  • If you just run .maximize — no need to worry; the warning is triggered unnecessarily.
  • If you use suggest-evaluate-register but register points as dictionaries (.suggest returns a dictionary, so unless you manually transform the point, this will be the case), you don't need to worry either; the warning is triggered unnecessarily.
  • If you register a point in array format at some point (i.e. using either .register or .probe manually with an array-valued argument), then this warning is for you.

The plan is to fix the erroneously displayed warnings. The fact that the order is manipulated by the optimizer is something we will change in an upcoming release.

What is the problem?

Consider the case where you're trying to optimize a function

def foobar(baz, qux, quux):
    return something(baz, qux, quux)

and you have evaluated the function at foobar(baz=0., qux=1., quux=2.)=42. If you register this information with the optimizer, you can do it in two ways:

optimizer.register({'baz': 0., 'qux': 1., 'quux': 2.}, target=42) # register a dict
optimizer.register(np.array([0., 1., 2.]), target=42) # register an array

The optimizer assumes that if you register an array, its entries correspond to the alphabetically ordered parameters, i.e. it would treat the second option as equivalent to optimizer.register({'baz': 0., 'quux': 1., 'qux': 2.}, target=42). So if you want to register an array, you need to ensure that its entries are in alphabetical order of the parameter names. Or, much simpler, just convert it to a dictionary and don't worry about it.
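To make the array case concrete (a sketch assuming the current alphabetical convention; the bounds are made up), this is the dict the optimizer would effectively infer from an array:

```python
import numpy as np

pbounds = {"baz": (0.0, 1.0), "qux": (0.0, 3.0), "quux": (0.0, 3.0)}
point = np.array([0.0, 1.0, 2.0])

# Reconstruct the dict the optimizer would infer from the array:
# note that alphabetically 'quux' sorts before 'qux'.
as_dict = dict(zip(sorted(pbounds), point.tolist()))
print(as_dict)  # {'baz': 0.0, 'quux': 1.0, 'qux': 2.0}
```

Building the dict yourself with the intended names, rather than relying on this implicit ordering, sidesteps the problem entirely.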

I hope this helps. Let me know if there are any other questions.

anates commented Dec 18, 2024

Hei @till-m, currently I'm just using .maximize(), thus I was confused how to fix that warning.
Thank you for the explanation!

till-m (Member, Author) commented Dec 18, 2024

@anates if you install from PyPI, an updated version of the package should be available that doesn't produce the warnings. On conda it might take some time for the release to show up.

anates commented Dec 18, 2024

@till-m Perfect, thank you very much for the quick reply and fix!
