I have X as a sparse matrix and y as a pandas Series.
I then proceed with the following code:
from hpsklearn import HyperoptEstimator, any_sparse_classifier
from hyperopt import tpe
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
estim = HyperoptEstimator(classifier=any_sparse_classifier('clf'),
                          preprocessing=[],
                          algo=tpe.suggest,
                          max_evals=100,
                          trial_timeout=120)
estim.fit(X_train, y_train)
I got the following error:
Scikit-learn - ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
After that, I updated all my conda packages and re-installed hyperopt-sklearn. Now I get the following error instead:
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
Note that the same thing happens when using passive_aggressive as well. Also note that when I run sklearn's PassiveAggressiveClassifier directly on the same training data, it works fine.
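For reference, a minimal sketch of the direct sklearn call that works (assuming the same X_train and y_train from the split above):

from sklearn.linear_model import PassiveAggressiveClassifier

# Fitting sklearn directly on the split data is fine, since sklearn converts
# pandas inputs to numpy arrays during input validation before indexing them.
clf = PassiveAggressiveClassifier()
clf.fit(X_train, y_train)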
I have tested both my sparse matrix and my target values (y) for NaN, infinity, and too-large values. No such entries exist.
Interestingly, running the following code on the full dataset works without any problems:
estim.fit(X, y)
So I also checked X_train and y_train for NaN, infinity, and too-large values (in case something was going wrong in sklearn's train_test_split), but again, everything seems fine.
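Roughly, the checks I ran look like this (assuming X is a scipy CSR/CSC matrix, whose stored values live in .data):

import numpy as np

# All stored values in the sparse matrices are finite (no NaN, no infinity)...
assert np.isfinite(X.data).all()
assert np.isfinite(X_train.data).all()
# ...and the targets contain no missing values either.
assert not y.isna().any()
assert not y_train.isna().any()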
Unfortunately this project wasn't originally built with pandas in mind and doesn't explicitly support it. Now that sklearn has better support for pandas, it would definitely be useful to add it here as well. In the meantime I could add some type checks and do the conversion inside of fit; that should hopefully cover most cases.
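A minimal sketch of what such a conversion could look like inside fit (the helper name and placement are illustrative assumptions, not the actual hyperopt-sklearn code):

import pandas as pd

def _to_numpy_if_pandas(data):
    # Illustrative helper: convert pandas containers to plain numpy arrays so
    # that positional indexing inside the estimator behaves as expected.
    if isinstance(data, (pd.DataFrame, pd.Series)):
        return data.to_numpy()
    return data

# At the top of HyperoptEstimator.fit, before any shuffling or splitting:
# X = _to_numpy_if_pandas(X)
# y = _to_numpy_if_pandas(y)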
Side note: I believe it only fails after calling train_test_split because the original index labels get split across the two objects, so the normal numpy way of accessing elements by position no longer works on a pandas Series. Older pandas falls back to reindexing, which fills the missing labels with NaN and would explain the first error; newer pandas refuses that fallback and raises the KeyError instead.
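A small sketch of the failure mode (the toy Series and its index values are made up for illustration; it mirrors the kind of positional indexing done internally):

import numpy as np
import pandas as pd

# After train_test_split, a pandas Series keeps its original, now
# non-contiguous, index labels, for example:
y_train = pd.Series([1, 0, 1], index=[2, 5, 9])

# Indexing with a positional numpy array is interpreted as a label lookup:
idx = np.arange(len(y_train))   # array([0, 1, 2]), meant as positions
y_train[idx]                    # labels 0 and 1 are missing: older pandas
                                # reindexes and fills them with NaN (the first error),
                                # while pandas >= 1.0 raises the KeyError quoted above
# y_train.iloc[idx] or y_train.to_numpy()[idx] returns the intended rows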