Naive implementation of chunking reader #288
Conversation
Possible steps to solve this problem:
Per the hackathon discussion, step 1 is already handled; step 2 would probably be a good idea too. Steps 3 and 4 to follow. Additionally, for step 4, we should probably provide a utility function (or just a slightly different kwarg to …)
Import BlockIO/ChunkIter from Dagger
Wire blocking into `loadtable`
I almost forgot: I still need to actually implement incremental saving of read blocks to the output file when one is specified; otherwise we'll still read the whole CSV's data into memory before serializing it back out.
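The incremental-saving idea above can be sketched as follows. This is a hedged illustration only: `parse_block` and `append_block!` are hypothetical stand-ins for the real parsing and serialization calls, not actual functions from this PR or its dependencies.

```julia
# Sketch of incremental saving: parse each block and serialize it to
# the output as soon as it is read, instead of accumulating every
# block in memory first. `parse_block` and `append_block!` are
# hypothetical placeholders for the real parse/serialize steps.
function save_incrementally(blocks, output)
    open(output, "w") do io
        for blk in blocks
            tbl = parse_block(blk)   # parse one chunk of the CSV
            append_block!(io, tbl)   # write it out immediately
            # `tbl` can be garbage-collected before the next block,
            # so peak memory is one block rather than the whole file
        end
    end
end
```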
Quick update for onlookers: the latest commit attempts to split individual files into blocks before calling …
Bump: anyone up for reviewing this?
```julia
# Break file into blocks of size `blocksize` or less
fsize = filesize(file)
nblocks = max(div(fsize, blocksize), 1)
bios = blocks(file, '\n', nblocks)
```
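For context, a `blocks(file, delim, nblocks)`-style splitter (the name and signature are inferred from the snippet above; this is a sketch under those assumptions, not the actual Dagger/BlockIO implementation) could align chunk boundaries to the next newline so that no CSV record straddles two blocks:

```julia
# Sketch: split a file into roughly equal-sized byte ranges whose
# ends are extended forward to the next delimiter. Names and
# behavior are assumptions for illustration only.
function block_ranges(path::AbstractString, delim::Char, nblocks::Integer)
    fsize = filesize(path)
    approx = cld(fsize, nblocks)    # approximate bytes per block
    ranges = UnitRange{Int}[]
    open(path) do io
        start = 1
        while start <= fsize
            stop = min(start + approx - 1, fsize)
            # extend `stop` forward to the next delimiter so no
            # record straddles two blocks
            seek(io, stop)          # 0-based offset: next read is byte stop+1
            while stop < fsize
                c = read(io, Char)
                stop += 1
                c == delim && break
            end
            push!(ranges, start:stop)
            start = stop + 1
        end
    end
    ranges
end
```

Each range can then be handed to a separate parsing task, which is the essence of the chunking approach in this PR.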
nice, I love that `'\n'` feature.
Looks like some change in TextParse 1.0 is breaking the ability to pass … EDIT: …
Replaces #129

TODO:
- `loadtable`