Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug to bug compatibility with readr::type_convert(), but is it correct? #22

Open
chainsawriot opened this issue Mar 19, 2024 · 5 comments

Comments

@chainsawriot
Copy link
Collaborator

Ref #20 tidyverse/readr#1509

text_only <- structure(list(weight = c("4.17"),
                                group = c("ctrl")),
                           class = "data.frame", row.names = c(NA, -1L))

unnamed_cols <- readr::cols(readr::col_skip(), readr::col_guess())
named_cols <- readr::cols(weight = readr::col_skip(), group = readr::col_guess())
partially_named_cols <- readr::cols(weight = readr::col_skip())

readr::type_convert(text_only, unnamed_cols) ## won't skip?
#>   weight group
#> 1   4.17  ctrl
readr::type_convert(text_only, named_cols) ## skip
#>   group
#> 1  ctrl
readr::type_convert(text_only, partially_named_cols) ## skip
#>   group
#> 1  ctrl

minty::type_convert(text_only, unnamed_cols) ## bug-to-bug compatibility?
#>   weight group
#> 1   4.17  ctrl
minty::type_convert(text_only, named_cols) ## skip
#>   group
#> 1  ctrl
minty::type_convert(text_only, partially_named_cols) ## skip
#>   group
#> 1  ctrl

## but it is probably not the expected behavior for the so-called 2e from vroom

readr::read_csv(I("weight,group\n4.17,ctrl\n"), col_names = TRUE, col_types = unnamed_cols)
#> # A tibble: 1 × 1
#>   group
#>   <chr>
#> 1 ctrl
readr::read_csv(I("weight,group\n4.17,ctrl\n"), col_names = TRUE, col_types = named_cols)
#> # A tibble: 1 × 1
#>   group
#>   <chr>
#> 1 ctrl
readr::read_csv(I("weight,group\n4.17,ctrl\n"), col_names = TRUE, col_types = partially_named_cols)
#> # A tibble: 1 × 1
#>   group
#>   <chr>
#> 1 ctrl

Created on 2024-03-19 with reprex v2.1.0

@khusmann
Copy link

khusmann commented Mar 25, 2024

Hello! I might be interested in using this package as a backend for this project I'm working on: https://kylehusmann.com/interlacer/

For me, I would prefer feature parity with vroom instead of readr::type_convert(). This way I could ensure that the data parsed through my read_interlaced_*() wrappers get identical values to the stock read_*() family of functions.

Or maybe you could offer the option to switch between feature parity with readr::type_convert() vs vroom::vroom(), similar to readr::with_edition()?

@chainsawriot
Copy link
Collaborator Author

@khusmann Thank you very much for your interest. I must mention that I am still coordinating with the tidyverse team on releasing this to CRAN. So there is still quite a (long) way for this to be easily available for your package.

As far as 1e/2e, I believe the most important difference (and for the code relevant to type_convert()) is on how col_types (especially the partial version) is handled, like in this case. And for that, I would probably follow vroom as the default. There is another issue related to that (see #11). For this one, it might take longer time. But the ultimate goal is to have it like vroom, i.e. follow the IEEE 754 standard.

@khusmann
Copy link

And for that, I would probably follow vroom as the default. There is another issue related to that (see #11)

Ah, sorry I missed that! But yeah, that behavior would be perfect for my purposes.

So there is still quite a (long) way for this to be easily available for your package.

Yup, I totally understand. My main reason for reaching out was to affirm what you're doing here -- I think you're meeting a key need with this! If there's any way I can help advocate to the tidyverse team, please let me know...

@chainsawriot
Copy link
Collaborator Author

Another "?b2b compatibility" issue tidyverse/readr#1536

## This gives columns of "1" and 2, rather than 1 and 2
str(readr::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = TRUE))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   a = col_character(),
#>   b = col_double()
#> )
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1"
#>  $ b: num 2
# This gives columns "1 " and 2, rather than "1 " and " 2"
str(readr::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = FALSE))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   a = col_character(),
#>   b = col_double()
#> )
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1 "
#>  $ b: num 2

str(minty::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = TRUE))
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1"
#>  $ b: num 2

str(minty::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = FALSE))
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1 "
#>  $ b: num 2

## vroom

readr::read_csv(I("a,b\n1 , 2\n"), trim_ws = TRUE)
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (2): a, b
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2

readr::read_csv(I("a,b\n1 , 2\n"), trim_ws = FALSE)
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): a, b
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#>   a     b    
#>   <chr> <chr>
#> 1 "1 "  " 2"

Created on 2024-05-04 with reprex v2.1.0

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Jun 7, 2024

Collection of all of these bugs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants