Best approach for custom serializers? #61

Open
jonkeane opened this issue Feb 20, 2020 · 0 comments

Comments

@jonkeane
Collaborator

jonkeane commented Feb 20, 2020

dput output was chosen as the first serialization format for a few reasons:

  1. they are plain text so easily reviewable and understandable in git diffs
  2. they serialize any sort of object
  3. they can be used to reliably return a data.frame with specific column types
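As a rough illustration of point 3, here is a minimal sketch of a dput/dget round trip using only base R. The data.frame and the temporary file path are made up for the example; they are not taken from the package's fixtures.

```r
# A small data.frame with mixed column types, standing in for a query result
# (hypothetical data, for illustration only).
df <- data.frame(
  id = 1:3,
  when = as.Date(c("2020-02-18", "2020-02-19", "2020-02-20")),
  grp = factor(c("a", "b", "a")),
  stringsAsFactors = FALSE
)

# Write the object as plain-text R code, then evaluate that file to restore it.
fixture_path <- tempfile(fileext = ".R")
dput(df, file = fixture_path)
restored <- dget(fixture_path)

# The file is ordinary text, so it shows up readably in a git diff.
cat(readLines(fixture_path), sep = "\n")

# Column classes (integer, Date, factor) come back without any extra metadata.
sapply(restored, class)
all.equal(df, restored)
```

The same two calls work for lists, vectors, and other plain R objects, which is what point 2 relies on.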

A few alternatives were not chosen:

CSV
While these are plain text, and arguably easier to read than dput output, they would need some sort of sidecar file to make sure they are parsed correctly into data.frames, and they couldn't be used to serialize non-data.frame objects (so they fail 2 and 3 above).
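To make the sidecar problem concrete, the sketch below (base R only, with made-up data) writes a data.frame to CSV and reads it back. The named colClasses vector stands in for the type information a sidecar file would have to carry; it is an illustration, not something the package does.

```r
# Hypothetical query result with a Date and a factor column.
df <- data.frame(
  id = 1:3,
  when = as.Date(c("2020-02-18", "2020-02-19", "2020-02-20")),
  grp = factor(c("a", "b", "a")),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Read back with no extra information: the Date and factor columns
# are returned as plain character vectors.
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
sapply(from_csv, class)

# Read back with the column types supplied separately. This is the
# information that would have to live in a sidecar file.
col_types <- c(id = "integer", when = "Date", grp = "factor")
typed <- read.csv(csv_path, colClasses = col_types)
sapply(typed, class)
```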

RDS
These can serialize anything (and reliably return data.frames), but they don't satisfy (1) above since they are binary and not plain text.

For most objects the dput output is probably just fine, though for the results of large queries we might want something that is easier to read and reason about (and that ideally would behave better than writing and reading dput output). One possible alternative serialization is CSVY (e.g. https://cran.r-project.org/web/packages/csvy/index.html), but that depends on data.table, which is a rather hefty dependency for serialization alone.

It should also be pointed out that the limitations of dput output have the side effect of encouraging best practices when writing and using fixtures: a fixture ought to be as minimal as possible while still testing what you need. dput output works well (enough) for small objects and only starts to fall down when there are large numbers of rows or columns.

There are a few options:

  • Suggest CSVY and optionally use it
  • Build functionality for people to provide their own custom serializers for queries that return data.frames (similar to how httptest allows for custom redactors)
  • Leave everything as is
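
For the second option, one possible shape is a serializer that is simply a pair of write/read functions plus a file extension. The sketch below is purely hypothetical: `dput_serializer`, `rds_serializer`, `write_fixture()`, and `read_fixture()` are names invented for this illustration and are not part of the package or of httptest.

```r
# Hypothetical serializer interface: a list with an extension, a writer,
# and a reader. None of these names exist in the package today.
dput_serializer <- list(
  ext = ".R",
  write = function(x, path) dput(x, file = path),
  read = function(path) dget(path)
)

rds_serializer <- list(
  ext = ".rds",
  write = function(x, path) saveRDS(x, path),
  read = function(path) readRDS(path)
)

# Hypothetical helpers that delegate to whichever serializer is supplied,
# defaulting to the current dput behaviour.
write_fixture <- function(x, name, dir, serializer = dput_serializer) {
  path <- file.path(dir, paste0(name, serializer$ext))
  serializer$write(x, path)
  invisible(path)
}

read_fixture <- function(path, serializer = dput_serializer) {
  serializer$read(path)
}

# Usage: the same fixture written with the default and with a custom serializer.
fixture_dir <- tempfile("fixtures")
dir.create(fixture_dir)
df <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

text_path <- write_fixture(df, "select-example", fixture_dir)
rds_path <- write_fixture(df, "select-example", fixture_dir, rds_serializer)

from_text <- read_fixture(text_path)
from_rds <- read_fixture(rds_path, rds_serializer)
```

A user who wanted CSVY (or anything else) could then supply their own list of this shape, which would keep the heavier dependency out of the package itself.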