Bug to bug compatibility with `readr::type_convert()`, but is it correct? #22

chainsawriot · 2024-03-19T17:41:34Z

text_only <- structure(list(weight = c("4.17"),
                                group = c("ctrl")),
                           class = "data.frame", row.names = c(NA, -1L))

unnamed_cols <- readr::cols(readr::col_skip(), readr::col_guess())
named_cols <- readr::cols(weight = readr::col_skip(), group = readr::col_guess())
partially_named_cols <- readr::cols(weight = readr::col_skip())

readr::type_convert(text_only, unnamed_cols) ## won't skip?
#>   weight group
#> 1   4.17  ctrl
readr::type_convert(text_only, named_cols) ## skip
#>   group
#> 1  ctrl
readr::type_convert(text_only, partially_named_cols) ## skip
#>   group
#> 1  ctrl

minty::type_convert(text_only, unnamed_cols) ## bug-to-bug compatibility?
#>   weight group
#> 1   4.17  ctrl
minty::type_convert(text_only, named_cols) ## skip
#>   group
#> 1  ctrl
minty::type_convert(text_only, partially_named_cols) ## skip
#>   group
#> 1  ctrl

## but it is probably not the expected behavior for the so-called 2e from vroom

readr::read_csv(I("weight,group\n4.17,ctrl\n"), col_names = TRUE, col_types = unnamed_cols)
#> # A tibble: 1 × 1
#>   group
#>   <chr>
#> 1 ctrl
readr::read_csv(I("weight,group\n4.17,ctrl\n"), col_names = TRUE, col_types = named_cols)
#> # A tibble: 1 × 1
#>   group
#>   <chr>
#> 1 ctrl
readr::read_csv(I("weight,group\n4.17,ctrl\n"), col_names = TRUE, col_types = partially_named_cols)
#> # A tibble: 1 × 1
#>   group
#>   <chr>
#> 1 ctrl

^{Created on 2024-03-19 with reprex v2.1.0}

khusmann · 2024-03-25T20:13:45Z

Hello! I might be interested in using this package as a backend for this project I'm working on: https://kylehusmann.com/interlacer/

For me, I would prefer feature parity with vroom instead of readr::type_convert(). This way I could ensure that the data parsed through my read_interlaced_*() wrappers get identical values to the stock read_*() family of functions.

Or maybe you could offer the option to switch between feature parity with readr::type_convert() vs vroom::vroom(), similar to readr::with_edition()?

chainsawriot · 2024-03-25T20:59:38Z

@khusmann Thank you very much for your interest. I must mention that I am still coordinating with the tidyverse team on releasing this to CRAN. So there is still quite a (long) way for this to be easily available for your package.

As far as 1e/2e, I believe the most important difference (and for the code relevant to type_convert()) is on how col_types (especially the partial version) is handled, like in this case. And for that, I would probably follow vroom as the default. There is another issue related to that (see #11). For this one, it might take longer time. But the ultimate goal is to have it like vroom, i.e. follow the IEEE 754 standard.

khusmann · 2024-03-25T21:37:07Z

And for that, I would probably follow vroom as the default. There is another issue related to that (see #11)

Ah, sorry I missed that! But yeah, that behavior would be perfect for my purposes.

So there is still quite a (long) way for this to be easily available for your package.

Yup, I totally understand. My main reason for reaching out was to affirm what you're doing here -- I think you're meeting a key need with this! If there's any way I can help advocate to the tidyverse team, please let me know...

chainsawriot · 2024-05-04T08:01:09Z

Another "?b2b compatibility" issue tidyverse/readr#1536

## This gives columns of "1" and 2, rather than 1 and 2
str(readr::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = TRUE))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   a = col_character(),
#>   b = col_double()
#> )
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1"
#>  $ b: num 2
# This gives columns "1 " and 2, rather than "1 " and " 2"
str(readr::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = FALSE))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   a = col_character(),
#>   b = col_double()
#> )
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1 "
#>  $ b: num 2

str(minty::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = TRUE))
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1"
#>  $ b: num 2

str(minty::type_convert(data.frame(a="1 ", b=" 2"), trim_ws = FALSE))
#> 'data.frame':    1 obs. of  2 variables:
#>  $ a: chr "1 "
#>  $ b: num 2

## vroom

readr::read_csv(I("a,b\n1 , 2\n"), trim_ws = TRUE)
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (2): a, b
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2

readr::read_csv(I("a,b\n1 , 2\n"), trim_ws = FALSE)
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): a, b
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#>   a     b    
#>   <chr> <chr>
#> 1 "1 "  " 2"

^{Created on 2024-05-04 with reprex v2.1.0}

chainsawriot · 2024-06-07T15:24:15Z

Collection of all of these bugs

Trailing spaces are not trimmed before guessing type #32

chainsawriot pinned this issue May 4, 2024

chainsawriot mentioned this issue Jun 7, 2024

Trailing spaces are not trimmed before guessing type #32

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug to bug compatibility with `readr::type_convert()`, but is it correct? #22

Bug to bug compatibility with `readr::type_convert()`, but is it correct? #22

chainsawriot commented Mar 19, 2024

khusmann commented Mar 25, 2024 •

edited

Loading

chainsawriot commented Mar 25, 2024

khusmann commented Mar 25, 2024

chainsawriot commented May 4, 2024

chainsawriot commented Jun 7, 2024 •

edited

Loading

Bug to bug compatibility with readr::type_convert(), but is it correct? #22

Bug to bug compatibility with readr::type_convert(), but is it correct? #22

Comments

chainsawriot commented Mar 19, 2024

khusmann commented Mar 25, 2024 • edited Loading

chainsawriot commented Mar 25, 2024

khusmann commented Mar 25, 2024

chainsawriot commented May 4, 2024

chainsawriot commented Jun 7, 2024 • edited Loading

Bug to bug compatibility with `readr::type_convert()`, but is it correct? #22

Bug to bug compatibility with `readr::type_convert()`, but is it correct? #22

khusmann commented Mar 25, 2024 •

edited

Loading

chainsawriot commented Jun 7, 2024 •

edited

Loading