I have X as a sparse matrix and y as a pandas Series.
I then proceed with the following code:
from hpsklearn import HyperoptEstimator, any_sparse_classifier
from hyperopt import tpe
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
estim = HyperoptEstimator(classifier=any_sparse_classifier('clf'),
                          preprocessing=[],
                          algo=tpe.suggest,
                          max_evals=100,
                          trial_timeout=120)
estim.fit(X_train, y_train)
I got the following error:
Scikit-learn - ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
After that, I updated all my conda packages and re-installed hyperopt-sklearn. Now I get the following error instead:
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
Note that the same thing happens when using passive_aggressive as well. Also note that when I run sklearn's PassiveAggressiveClassifier directly on the same training data, it works fine.
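For reference, a minimal sketch of the direct sklearn call that works (assuming the same X_train and y_train from the split above):

from sklearn.linear_model import PassiveAggressiveClassifier

# Fitting sklearn directly on the split data is fine, since sklearn converts
# pandas inputs to numpy arrays during input validation before indexing them.
clf = PassiveAggressiveClassifier()
clf.fit(X_train, y_train)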
I have tested both my sparse matrix and my target values (y) for NaN, infinity, and too-large values. No such entries exist.
Interestingly, running the following code on the full dataset works without any problems:
estim.fit(X, y)
So I also checked X_train and y_train for NaN, infinity, and too-large values (in case something was going wrong in sklearn's train_test_split), but again, everything seems fine.
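Roughly, the checks I ran look like this (assuming X is a scipy CSR/CSC matrix, whose stored values live in .data):

import numpy as np

# All stored values in the sparse matrices are finite (no NaN, no infinity)...
assert np.isfinite(X.data).all()
assert np.isfinite(X_train.data).all()
# ...and the targets contain no missing values either.
assert not y.isna().any()
assert not y_train.isna().any()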
Unfortunately this project wasn't originally built with pandas in mind and doesn't explicitly support it. Now that sklearn has better support for pandas, it would definitely be useful to add it here as well. In the meantime I could add some type checks and do the conversion inside of fit; that should hopefully cover most cases.
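A minimal sketch of what such a conversion could look like inside fit (the helper name and placement are illustrative assumptions, not the actual hyperopt-sklearn code):

import pandas as pd

def _to_numpy_if_pandas(data):
    # Illustrative helper: convert pandas containers to plain numpy arrays so
    # that positional indexing inside the estimator behaves as expected.
    if isinstance(data, (pd.DataFrame, pd.Series)):
        return data.to_numpy()
    return data

# At the top of HyperoptEstimator.fit, before any shuffling or splitting:
# X = _to_numpy_if_pandas(X)
# y = _to_numpy_if_pandas(y)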
Side note: I believe it only fails after calling train_test_split because the original index labels get split across the two objects, so the normal numpy way of accessing elements by position no longer works on a pandas Series. Older pandas falls back to reindexing, which fills the missing labels with NaN and would explain the first error; newer pandas refuses that fallback and raises the KeyError instead.
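A small sketch of the failure mode (the toy Series and its index values are made up for illustration; it mirrors the kind of positional indexing done internally):

import numpy as np
import pandas as pd

# After train_test_split, a pandas Series keeps its original, now
# non-contiguous, index labels, for example:
y_train = pd.Series([1, 0, 1], index=[2, 5, 9])

# Indexing with a positional numpy array is interpreted as a label lookup:
idx = np.arange(len(y_train))   # array([0, 1, 2]), meant as positions
y_train[idx]                    # labels 0 and 1 are missing: older pandas
                                # reindexes and fills them with NaN (the first error),
                                # while pandas >= 1.0 raises the KeyError quoted above
# y_train.iloc[idx] or y_train.to_numpy()[idx] returns the intended rows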