feature request: csvtk split by number of lines per chunk #122

Open
avilella opened this issue Jan 19, 2021 · 1 comment

Comments

@avilella
This is a feature request for the csvtk split command to have an additional --nlines option, so that it behaves like GNU coreutils split --lines (https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html) but handles the header properly by repeating it in every chunk.

E.g. we have a file with 5 entries:
a,b,c,d
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8

We run csvtk split --nlines 2, which produces chunks of 2 entries per file, each starting with the header:
##file1
a,b,c,d
1,2,3,4
2,3,4,5
##file2
a,b,c,d
3,4,5,6
4,5,6,7
##file3
a,b,c,d
5,6,7,8
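
Until such an option exists, the requested behaviour can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not how csvtk works or would implement it; the names split_rows and render are made up for this example.

```python
import csv
import io
from itertools import islice


def split_rows(reader, n):
    """Yield (header, rows) pairs with at most n data rows per chunk."""
    header = next(reader, None)
    if header is None:  # empty input: nothing to split
        return
    while True:
        rows = list(islice(reader, n))
        if not rows:
            break
        yield header, rows


def render(header, rows):
    """Serialise one chunk back to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# The 5-entry file from the example above.
SAMPLE = "a,b,c,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n5,6,7,8\n"

chunks = [render(h, r) for h, r in split_rows(csv.reader(io.StringIO(SAMPLE)), 2)]
for i, chunk in enumerate(chunks, 1):
    print(f"##file{i}")
    print(chunk, end="")
```

Only one chunk is held in memory at a time, so the same loop should also work on large inputs when each chunk is written to its own file instead of being collected in a list.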

Thanks in advance

@san-r
san-r commented Jul 25, 2021

I need this feature when working with very large CSV files, which I usually keep compressed with gzip or zstd (the latter decompresses significantly faster). For the moment I use xsv (https://github.com/BurntSushi/xsv), which does exactly what is asked above. However, it only outputs uncompressed CSV chunks, and I haven't figured out a way to have it write chunks compressed with gzip or zstd. This feature would be a very useful addition to csvtk.
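
As a stopgap for the compressed-output case, here is a rough Python sketch that writes each chunk straight to a gzip file using only the standard library. It is not a csvtk or xsv feature; the function name split_to_gzip and the chunk-N.csv.gz naming scheme are assumptions made for this example. zstd output is not covered, since that would need a zstd module in place of gzip.

```python
import csv
import gzip
import io
import os
import tempfile
from itertools import islice


def split_to_gzip(src, out_dir, n, prefix="chunk"):
    """Write gzip-compressed chunks of at most n data rows; return their paths."""
    reader = csv.reader(src)
    header = next(reader, None)
    paths = []
    if header is None:  # empty input: no chunks
        return paths
    while True:
        rows = list(islice(reader, n))
        if not rows:
            break
        path = os.path.join(out_dir, f"{prefix}-{len(paths) + 1}.csv.gz")
        # Text mode ("wt") lets csv.writer write directly into the gzip stream.
        with gzip.open(path, "wt", newline="") as out:
            writer = csv.writer(out, lineterminator="\n")
            writer.writerow(header)  # repeat the header in every chunk
            writer.writerows(rows)
        paths.append(path)
    return paths


# Demo on the 5-entry example from the issue, written to a temporary directory.
out_dir = tempfile.mkdtemp()
sample = io.StringIO("a,b,c,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n5,6,7,8\n")
paths = split_to_gzip(sample, out_dir, 2)

with gzip.open(paths[-1], "rt", newline="") as f:
    last = f.read()
print(len(paths), "chunks written; last chunk:")
print(last, end="")
```

A gzip-compressed input can be handled the same way by passing gzip.open(path, "rt", newline="") as src, so the data never has to be decompressed to disk.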

3 participants