feature request: csvtk split by number of lines per chunk #122

avilella · 2021-01-19T16:40:29Z

This is a feature request for the csvtk split command to have an additional --nlines option so that it behaves similarly to GNU coreutils split --lines (https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html) but handles the header row in a nice way.

E.g. we have a file with 5 entries:
a,b,c,d
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8

We run csvtk split --nlines 2, which produces chunks of 2 entries each, with the header repeated at the top of every chunk:
##file1
a,b,c,d
1,2,3,4
2,3,4,5
##file2
a,b,c,d
3,4,5,6
4,5,6,7
##file3
a,b,c,d
5,6,7,8
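
To make the requested behavior concrete, here is a minimal Python sketch of it. This is not csvtk code, and --nlines is only the option name proposed in this request; the helper name split_rows is made up for illustration. It assumes the first row is the header and that every chunk should start with a copy of it.

```python
import csv
import io
from itertools import islice


def split_rows(reader, nlines):
    """Yield chunks: the header row followed by up to `nlines` data rows.

    `split_rows` is a hypothetical helper, not part of csvtk.
    """
    if nlines < 1:
        raise ValueError("nlines must be at least 1")
    header = next(reader, None)
    if header is None:
        return  # empty input produces no chunks
    while True:
        # Pull the next `nlines` data rows without loading the whole file.
        rows = list(islice(reader, nlines))
        if not rows:
            break
        yield [header] + rows


# The 5-entry example file from this request.
data = "a,b,c,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n5,6,7,8\n"
chunks = list(split_rows(csv.reader(io.StringIO(data)), 2))

for i, chunk in enumerate(chunks, start=1):
    print(f"##file{i}")
    for row in chunk:
        print(",".join(row))
```

Run on the example input with 2 lines per chunk, this prints the same three chunks listed above, the last one holding the single leftover entry.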

Thanks in advance

san-r · 2021-07-25T09:36:33Z

I need this feature when working with very large CSV files, which I usually keep compressed with gzip or zstd (the latter decompresses significantly faster). For the moment, I use xsv from https://github.com/BurntSushi/xsv, which does exactly what has been asked above. However, it outputs uncompressed CSV chunks only, and I haven't figured out a way to make it write chunks compressed with gzip or zstd. This feature would be a very useful addition to csvtk.
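
As a stopgap, the compressed-in, compressed-out case can be sketched with Python's standard library. This is only an illustration under assumptions, not csvtk or xsv behavior: the function name split_to_gzip and the chunk-N.csv.gz naming are made up, and only gzip is covered, since zstd would need a third-party package on most Python versions.

```python
import csv
import gzip
import os
import tempfile
from itertools import islice


def split_to_gzip(src_path, out_dir, nlines, prefix="chunk"):
    """Split a gzip-compressed CSV into gzip-compressed chunks.

    Hypothetical helper: each chunk gets the header plus up to `nlines`
    data rows and is written as <prefix>-<n>.csv.gz inside `out_dir`.
    """
    if nlines < 1:
        raise ValueError("nlines must be at least 1")
    paths = []
    # Text mode lets the csv module read straight from the gzip stream.
    with gzip.open(src_path, "rt", newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return paths  # empty input produces no chunks
        index = 1
        while True:
            rows = list(islice(reader, nlines))
            if not rows:
                break
            path = os.path.join(out_dir, f"{prefix}-{index}.csv.gz")
            with gzip.open(path, "wt", newline="") as out:
                writer = csv.writer(out, lineterminator="\n")
                writer.writerow(header)
                writer.writerows(rows)
            paths.append(path)
            index += 1
    return paths


# Small demonstration on a temporary gzip file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv.gz")
    with gzip.open(src, "wt", newline="") as f:
        f.write("a,b,c,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n5,6,7,8\n")
    for path in split_to_gzip(src, tmp, 2):
        with gzip.open(path, "rt", newline="") as f:
            print(os.path.basename(path), f.read().splitlines())
```

Only one chunk's worth of rows is held in memory at a time, which is the point when the input is very large.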

shenwei356 added the new feature label Jan 19, 2021
