The read_table function lacks some options that the other read_delim-based functions have, most importantly for me the id option. I have many whitespace-delimited files (the kind read_table is meant to handle) and I want to load them all with a map call followed by bind_rows, but the lack of an id option makes it hard to keep track of which data comes from which input file.
Moreover, it makes the read_table function inconsistent with other read functions, for no obvious reason.
Recently, I came up with a decent workaround for getting the result I want, so I'm documenting it here, and contrasting it with other read_* functions.
csv, tsv, fwf, and other "nice" formats (expected behaviour)
What I want can be done directly with these formats:
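As a minimal sketch of that expected behaviour, assuming readr 2.0 or later (where the read_delim family accepts a vector of paths together with the id argument). The temporary directory, file names, and columns are hypothetical sample data, not taken from my real files:

```r
library(readr)

# Hypothetical sample data: two small csv files in a temporary directory.
data_dir <- file.path(tempdir(), "my_csv_data")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(x = 1:2, y = c("a", "b")), file.path(data_dir, "first.csv"))
write_csv(data.frame(x = 3:4, y = c("c", "d")), file.path(data_dir, "second.csv"))

my_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# read_csv accepts a vector of paths; id = "filename" adds a column
# holding the path each row was read from.
my_df <- read_csv(my_files, id = "filename", show_col_types = FALSE)
```

One call reads and concatenates everything, and the filename column arrives already filled in.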
format pretty-printed with multiple spaces between columns (actual hoops necessary to get there)
On the other hand, with read_table one needs to do something like this:
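library(tidyverse)
# pattern is a regular expression, not a glob; full.names keeps the directory in each path
my_files <- list.files("my_data", pattern = "\\.txt$", full.names = TRUE)
my_df <- my_files %>%
  map(read_table, comment = "#") %>%
  bind_rows(.id = "filename") %>% # fills the column with each file's list position, stored as character
  mutate(filename = my_files[as.numeric(filename)]) # map the position back to the file path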
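A possible variation on the same workaround, sketched under the assumption that naming the list before binding is acceptable: bind_rows(.id = ...) uses the list names when they exist, so the numeric lookup is not needed. The temporary directory and file contents are hypothetical sample data:

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical sample data: two whitespace-delimited files.
data_dir <- file.path(tempdir(), "my_txt_data")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("x   y", "1   a", "2   b"), file.path(data_dir, "first.txt"))
writeLines(c("x   y", "3   c", "4   d"), file.path(data_dir, "second.txt"))

my_files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

my_df <- my_files %>%
  set_names() %>%                      # name each path after itself
  map(read_table, comment = "#") %>%   # one data frame per file, in a named list
  bind_rows(.id = "filename")          # list names become the filename column
```

This is still a hoop that an id argument would make unnecessary, but it avoids the character-to-number round trip.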
another option - miller
The same can also be achieved from the command line with miller. Miller is like a tidyverse for the Unix terminal, giving a 21st-century facelift to those old rusty tools like cut, sort, or awk.
One of miller's strengths is format-awareness and seamless conversions. It can directly read the format discussed here with --ipprint and convert to e.g. csv with --ocsv, or, as I do below, with a keystroke-saver --p2c:
The created csv can obviously be loaded into R directly and I can continue with my work.
Miller works fine, unless I need to rename many columns in the individual files before concatenating them into one table. Miller has facilities comparable to dplyr::rename, but anything more complex is better served with magrittr::set_colnames, where I can do things like: