Documentation Issue with train_test_split and blockwise #999

Open
christhorn2 opened this issue Aug 15, 2024 · 1 comment

christhorn2 commented Aug 15, 2024

Describe the issue:

The API documentation for dask-ml's train_test_split states that blockwise=False is supported for Dask Arrays:
"For Dask Arrays, set blockwise=False to shuffle data between blocks as well."
https://ml.dask.org/modules/generated/dask_ml.model_selection.train_test_split.html#dask_ml.model_selection.train_test_split
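For readers unfamiliar with the option, here is a minimal sketch of the difference the documentation describes. It uses plain Python lists as stand-ins for array blocks and does not call dask-ml; the helper names `split_blockwise` and `split_global` and the rounding of the test size are assumptions made for illustration only, not the library's implementation.

```python
import random


def split_blockwise(blocks, test_size, seed=0):
    """Shuffle and split each block on its own (the blockwise=True idea)."""
    rng = random.Random(seed)
    train, test = [], []
    for block in blocks:
        shuffled = rng.sample(block, len(block))  # shuffled copy of this block only
        n_test = int(len(block) * test_size)
        test.append(shuffled[:n_test])
        train.append(shuffled[n_test:])
    return train, test


def split_global(blocks, test_size, seed=0):
    """Shuffle across all blocks, then split (the blockwise=False idea)."""
    rng = random.Random(seed)
    flat = [value for block in blocks for value in block]
    shuffled = rng.sample(flat, len(flat))  # values may move between blocks
    n_test = int(len(flat) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Two blocks of four values, mirroring da.arange(8, chunks=4).
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
bw_train, bw_test = split_blockwise(blocks, test_size=0.25)
gl_train, gl_test = split_global(blocks, test_size=0.25)
```

In the blockwise variant every test sample stays in the block it came from, so no data has to move between blocks. In the global variant a test set could be drawn entirely from one block, which is the behavior the documentation promises for blockwise=False.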

I think this is also the intent of the code, which delegates the work to ShuffleSplit:

elif all(isinstance(arr, da.Array) for arr in arrays):

However, ShuffleSplit does not support blockwise=False:

def _split(self, X):

Minimal Complete Verifiable Example:

from dask_ml.model_selection import train_test_split
import dask.array as da
x = da.arange(8, chunks=4)
train_test_split(x, blockwise=False)
....
NotImplementedError: ShuffleSplit with blockwise=False has not been implemented yet.

Environment:

  • Dask version: 2024.4.4
  • Python version: 3.9.18
  • Operating System:
  • Install method (conda, pip, source): pip
christhorn2 changed the title from "Documentation Issue" to "Documentation Issue with train_test_split and blockwise" on Aug 15, 2024
@narnia24

Hey @christhorn2, can I work on this issue?
