Turn `autoread()` into a wrapper for `rio::import()` #1

bokov · 2019-09-18T14:41:31Z

`rio::import()` does what I've been trying to do, but better. I can switch to it and add something it doesn't do: empirical format detection. File extensions are not a reliable indicator of format: some are ambiguous or missing altogether, and some users and apps create files with incorrect extensions. Even magic-number snooping isn't foolproof; for example, ODS and XLSX have the same magic number, since both are ZIP containers. The only sure way to determine whether a file has a particular format is to try it and see what happens, or as a Python programmer might say, "ask for forgiveness instead of permission".
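To illustrate why magic numbers alone can't separate ODS from XLSX, here is a minimal sketch in base R. The helper name `has_zip_signature()` is made up for this example. It only checks whether a file starts with the 4-byte ZIP local-file-header signature, which is all that either format's first bytes reveal.

```r
# Hypothetical helper (not part of rio): TRUE if the file begins with the
# ZIP local-file-header signature "PK\x03\x04".
has_zip_signature <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 4L)
  identical(first_bytes, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}
```

A `TRUE` here narrows the candidates to ZIP-based formats but can't choose between them, which is where the try-it-and-see step comes in.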

The wrapper I want to write (or that the rio team might write, if they like the idea and beat me to it) will do some input validation and adjustment, then call `rio::import()` and let it infer the format from the file name, as it does by default. If that fails, the wrapper will iterate over a vector of supported formats (ideally sorted by how commonly they are used), calling `rio::import()` with each format in turn until one succeeds or all have been tried. Yes, this is a brute-force approach, but mis-specified read attempts tend to fail before they spend much time reading the file, so converging on the right format by trial and error doesn't take long. When you're mass-processing a collection of files, or building an analysis pipeline where you don't know the file format in advance, this tradeoff is worth it.

I might have the function either use message() or an attribute to record what format turned out to be correct, so the user will know for next time.
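A rough sketch of the fallback loop described above, to make the idea concrete. Everything here is an assumption rather than a finished design: the function name `import_by_trial()`, the attribute name `detected_format`, and the candidate list and its ordering are all placeholders. The `reader` argument defaults to `rio::import` but is kept swappable so the loop can be exercised without real files.

```r
# Hypothetical wrapper; not part of rio. `reader` is any function that
# accepts (file, format = ..., ...) and errors when the format is wrong.
import_by_trial <- function(file,
                            formats = c("csv", "xlsx", "tsv", "rds", "rdata",
                                        "sav", "dta", "ods", "json"),
                            reader = rio::import, ...) {
  # First attempt: let the reader infer the format on its own.
  result <- tryCatch(reader(file, ...), error = function(e) NULL)
  if (!is.null(result)) return(result)

  # Fallback: try each candidate format until one reads without error.
  for (fmt in formats) {
    result <- tryCatch(reader(file, format = fmt, ...),
                       error = function(e) NULL)
    if (!is.null(result)) {
      # Record the format both ways so the user knows for next time.
      message("Read '", file, "' using format '", fmt, "'")
      attr(result, "detected_format") <- fmt
      return(result)
    }
  }
  stop("None of the candidate formats could read '", file, "'")
}
```

Usage would look like `dat <- import_by_trial("mystery_file")`, followed by `attr(dat, "detected_format")` to see which format worked. One caveat with this sketch: a permissive text reader may "succeed" on the wrong kind of file and return garbage, so the ordering of `formats` matters and some sanity check on the result may be needed.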

First, though, we need more standardization in the arguments. @leeper already does this for the `which` parameter, and someday I'd like to contribute similar mappings for other general concepts in reading tabular data, including `skip`, `nrows`, and `na`. Until then, we at least need to keep the underlying read functions from being passed arguments that make them error. I believe I have a pretty clean solution to this in my pull request. If things go well with that, I'll submit similar PRs for other common readers where this happens, and then I'll be ready to replace my `autoread()` with `rio::import()`.
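For illustration only, here is one generic way to keep a read function from receiving arguments it doesn't accept. This is not necessarily what the pull request does; the helper name `call_with_supported_args()` is made up, and the approach (matching argument names against `formals()`) is just a common base-R idiom.

```r
# Hypothetical helper: call `fun`, silently dropping named arguments that
# `fun` does not declare. If `fun` accepts `...`, everything is passed on.
call_with_supported_args <- function(fun, ...) {
  args <- list(...)
  accepted <- names(formals(fun))
  if (!"..." %in% accepted) {
    arg_names <- names(args)
    if (is.null(arg_names)) arg_names <- rep("", length(args))
    # Keep positional (unnamed) arguments and named ones that `fun` declares.
    args <- args[arg_names == "" | arg_names %in% accepted]
  }
  do.call(fun, args)
}
```

For example, with a toy reader `f <- function(file, skip = 0) ...`, the call `call_with_supported_args(f, "a.csv", skip = 2, nrows = 5)` passes `skip` through and drops `nrows` instead of raising an "unused argument" error.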

Note to self: one student has an .rda file as input, so once I resolve this ticket, I should notify them.

bokov added the enhancement (New feature or request) and blocked (Waiting for something outside the project) labels on Sep 18, 2019