What happened: SimpleImputer.fit with the median and most_frequent strategies computes different results on DataFrames than scikit-learn does.
What you expected to happen: The results should be consistent with sklearn.impute.SimpleImputer.
Minimal Complete Verifiable Example:
    import numpy as np
    import pandas as pd
    import dask_ml.impute
    import sklearn.impute

    df = pd.DataFrame({"A": [1, 1, np.nan, np.nan, 2, 2]})

    # This should return the smallest value
    b = dask_ml.impute.SimpleImputer(strategy="most_frequent", fill_value=None)
    b.fit(df)
    b.statistics_
    >>> A    2.0
    >>> dtype: float64

    c = sklearn.impute.SimpleImputer(strategy="most_frequent", fill_value=None)
    c.fit(df)
    c.statistics_
    >>> array([1.])
With median:
    import dask.dataframe as dd

    df = pd.DataFrame({"A": [1, 1, np.nan, np.nan, 2, 2]})
    df = dd.from_pandas(df, 2)  # split into 2 partitions

    b = dask_ml.impute.SimpleImputer(strategy="median", fill_value=None)
    b.fit(df)
    b.statistics_
    >>> A    1.0
    >>> dtype: float64

    c = sklearn.impute.SimpleImputer(strategy="median", fill_value=None)
    c.fit(df)
    c.statistics_
    >>> array([1.5])
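For reference, the two values scikit-learn reports above can be reproduced by hand: the median is taken over the non-missing values, and for most_frequent scikit-learn documents that ties are broken by returning the smallest value. The following is only a rough pure-Python sketch of that expected behaviour. The helper names (observed, expected_median, expected_most_frequent) are made up for illustration and are not part of dask-ml or scikit-learn.

```python
import math
import statistics
from collections import Counter

# Same column as in the examples above, with NaN marking missing entries.
values = [1, 1, math.nan, math.nan, 2, 2]


def observed(xs):
    """Drop missing (NaN) entries; hypothetical helper for this sketch."""
    return [x for x in xs if not math.isnan(x)]


def expected_median(xs):
    """Exact median of the non-missing values."""
    return statistics.median(observed(xs))


def expected_most_frequent(xs):
    """Most frequent non-missing value, ties broken by the smallest value."""
    counts = Counter(observed(xs))
    top = max(counts.values())
    return min(v for v, n in counts.items() if n == top)


print(expected_median(values))         # 1.5
print(expected_most_frequent(values))  # 1
```

These match the scikit-learn outputs (1.5 and 1), not the dask-ml outputs (1.0 and 2.0) shown in the examples.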
Environment: