Better way to read a lot of CSV files #3

Open
riedel opened this issue Oct 26, 2020 · 0 comments

riedel commented Oct 26, 2020

Currently I am using a bag that reduces to a local data frame. See my SO question and answer: https://stackoverflow.com/questions/64512040/how-to-aggregate-large-number-of-small-csv-files-50k-efficiently-code-size/64517641

With a partitioning strategy, it should be possible to build a distributed data frame instead, which is needed when the data is not reduced that heavily.
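
A rough sketch of what such a partitioning strategy could look like, assuming the bag in question is a Dask bag and that all CSV files share the same columns. The names `chunk_paths`, `load_batch`, `build_distributed_frame` and the default of 500 files per partition are made up for illustration and are not taken from this repository or the linked answer. The idea is to group the many small files into batches, read each batch lazily into one pandas frame, and let each batch become one partition of a Dask data frame.

```python
from typing import Iterable, List


def chunk_paths(paths: Iterable[str], files_per_partition: int) -> List[List[str]]:
    """Group file paths into batches; each batch is meant to become one partition."""
    if files_per_partition < 1:
        raise ValueError("files_per_partition must be at least 1")
    ordered = sorted(paths)  # deterministic partition contents
    return [
        ordered[i:i + files_per_partition]
        for i in range(0, len(ordered), files_per_partition)
    ]


def load_batch(batch: List[str]):
    """Read one batch of small CSV files into a single pandas DataFrame."""
    import pandas as pd  # imported lazily so chunk_paths works without pandas

    return pd.concat((pd.read_csv(p) for p in batch), ignore_index=True)


def build_distributed_frame(paths: Iterable[str], files_per_partition: int = 500):
    """Build a lazy Dask DataFrame with one partition per batch of files.

    Assumes at least one path and identical columns across all files.
    """
    import dask
    import dask.dataframe as dd

    parts = [
        dask.delayed(load_batch)(batch)
        for batch in chunk_paths(paths, files_per_partition)
    ]
    # Without an explicit meta, Dask computes the first partition to infer the schema.
    return dd.from_delayed(parts)
```

Usage would be along the lines of `ddf = build_distributed_frame(glob.glob("data/*.csv"))`, where the glob pattern is a placeholder. Batching keeps the task graph small (one task per batch rather than one per file), which is likely the main cost with around 50k files; a good batch size depends on file size and worker memory.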
