Add CPU Dask DataFrame support for DistributedDataClassifier #194

Open
sarahyurick opened this issue Aug 8, 2024 · 3 comments

Comments

@sarahyurick
Collaborator

sarahyurick commented Aug 8, 2024

Currently, when trying out this notebook with a CPU Dask DataFrame, it fails with a TypeError: batch_text_or_text_pairs has to be a list or a tuple (got <class 'pandas.core.series.Series'>).

To reproduce, use the linked notebook, add

import pandas as pd
import dask.dataframe as dd

and replace

df = cudf.DataFrame({"text": text})
input_dataset = DocumentDataset(dask_cudf.from_cudf(df, npartitions=1))

with

input_dataset = DocumentDataset(dd.from_pandas(pd.DataFrame({"text": text}), npartitions=1))
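The error message suggests that the tokenizer is being handed a pandas Series directly, where it expects a plain list or tuple of strings. Below is a minimal sketch of that failure mode and one possible workaround, not the actual classifier code: `strict_tokenize` is a hypothetical stand-in for the real tokenizer call, and `tokenize_partition` and its `text_field` parameter are names made up for illustration.

```python
import pandas as pd


def strict_tokenize(batch):
    """Hypothetical stand-in for a tokenizer that only accepts a list or tuple."""
    if not isinstance(batch, (list, tuple)):
        raise TypeError(
            f"batch_text_or_text_pairs has to be a list or a tuple (got {type(batch)})"
        )
    # Whitespace splitting is only a placeholder for real tokenization.
    return [text.split() for text in batch]


def tokenize_partition(partition, text_field="text"):
    """Tokenize one pandas partition, converting the column to a list first."""
    texts = partition[text_field].tolist()  # pandas Series -> plain Python list
    return strict_tokenize(texts)


df = pd.DataFrame({"text": ["hello world", "dask on cpu"]})

try:
    # Passing the Series itself triggers the same kind of TypeError.
    strict_tokenize(df["text"])
except TypeError as err:
    print(err)

# Converting to a list before the call avoids the type check failure.
tokens = tokenize_partition(df)
print(tokens)
```

Whether a conversion like this is enough for the real pipeline is an open question. It only addresses the type check and says nothing about CPU performance.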

I will start scoping this bug, as it is also related to #79.

cc @ayushdg @ryantwolf @VibhuJawa

@sarahyurick sarahyurick added the enhancement New feature or request label Aug 8, 2024
@sarahyurick sarahyurick self-assigned this Aug 8, 2024
@sarahyurick sarahyurick added bug Something isn't working and removed enhancement New feature or request labels Aug 8, 2024
@sarahyurick
Collaborator Author

rapidsai/crossfit#76 adds support for CPU Dask DataFrames, as long as you're working on a machine with GPUs available...

For a machine without GPUs available, we can't use CrossFit. I think we can still do a non-CrossFit implementation similar to what we used to have, though. I will continue working on this and see how it goes.

@VibhuJawa
Collaborator

> For a machine without GPUs available, we can't use CrossFit. I think we can still do a non-CrossFit implementation similar to what we used to have, though. I will continue working on this and see how it goes.

I am not sure this is a great use of our time right now, because I don't think we should spend time exploring deep learning models on CPU.

@sarahyurick
Collaborator Author

> I am not sure this is a great use of our time right now, because I don't think we should spend time exploring deep learning models on CPU.

OK, I can definitely put this on the back burner for now.
