Fix simple impute #788

abduhbm · 2021-02-04T17:04:29Z

Fix #787.
Also related to #779

abduhbm · 2021-02-04T23:12:01Z

CI is failing because of linting issues.

TomAugspurger

Thanks. A few questions / comments.

TomAugspurger · 2021-02-28T13:27:21Z

dask_ml/impute.py

            avg = X.mean(axis=0).values
        elif self.strategy == "median":
-            avg = X.quantile().values
+            avg = [np.median(X[col].dropna()) for col in X.columns]


I believe this will eagerly compute the values, thanks to np.median. Since that's done in a list comprehension, we'd end up executing the graph for X once per column. We want to delay computation till the end.

I also think this will end up pulling all the data for a column into a single ndarray, to do the median, which we also want to avoid.

How about using delayed here?

avg = [dask.delayed(np.median)(X[col].dropna()) for col in X.columns]
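A point worth spelling out when reaching for `delayed`: wrapping the result of a call, as in `delayed(f(x))`, still runs `f` immediately, while `delayed(f)(x)` records the call and runs it later. The sketch below illustrates the difference with a dependency-free toy. `Lazy`, `delayed`, and `traced_median` are hypothetical stand-ins written for this example; they are not dask's implementation and only mimic the calling convention.

```python
import statistics


class Lazy:
    """Toy stand-in for a delayed task: stores a function and its arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        return self.func(*self.args)


def delayed(obj):
    """Hypothetical helper mimicking the two ways dask.delayed is called."""
    if callable(obj):
        # delayed(f)(x): record the call, do not run it yet.
        return lambda *args: Lazy(obj, *args)
    # delayed(value): the value already exists, nothing is left to defer.
    return Lazy(lambda: obj)


calls = []


def traced_median(values):
    # Record every invocation so we can see when the work actually happens.
    calls.append(len(values))
    return statistics.median(values)


columns = {"a": [1.0, 3.0, 5.0], "b": [2.0, 4.0, 9.0]}

# Eager: traced_median runs while the list is being built.
eager = [delayed(traced_median(v)) for v in columns.values()]
calls_after_eager = len(calls)

calls.clear()

# Lazy: building the list runs nothing; work happens on compute().
lazy = [delayed(traced_median)(v) for v in columns.values()]
calls_after_lazy_build = len(calls)
medians = [task.compute() for task in lazy]
```

With real dask, the lazy form would typically be followed by a single `dask.compute(*avg)` so the graph for `X` is executed once. Note that each median would still be taken over a whole column held in memory, which is the second concern raised above.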

TomAugspurger · 2021-02-28T13:27:47Z

dask_ml/impute.py

+            for col in X.columns:
+                val_counts = X[col].value_counts().reset_index()
+                if isinstance(X, dd.DataFrame):
+                    x = val_counts.to_dask_array(lengths=True)


Do we need lengths here? This also triggers a computation.

abduhbm · 2021-03-25

This is needed to compute chunk sizes ... any suggestion on how to avoid it? Thanks.

abduhbm added 4 commits February 4, 2021 19:28

Fix median and most_frequent strategies in SimpleImpute._fit_frame

a8c228c

Lint

3c2831c

compat

a92bfd5

Fix compat for finding smallest most_frequent

ffaeb80

Merge branch 'main' of github.com:dask/dask-ml into fix-simple-impute

b15ef37


abduhbm Mar 25, 2021
