read_tsv() doesn't un-escape doubled quotes #1402

Open
lazappi opened this issue May 6, 2022 · 1 comment
Labels
bug an unexpected problem or unintended behavior read 📖

lazappi commented May 6, 2022

When read_tsv() reads a file containing quotes escaped by doubling (the default in write_tsv()), the quotes are not un-escaped, so the resulting strings contain two quotes where the original had one. If a file goes through multiple read/write cycles, the runs of quotes grow with every cycle (one quote becomes two, then four, in the example below).

df1 <- tibble::tribble(
    ~ A, ~ B,
    "String", 1,
    "String with 'single quotes'", 2,
    'String with "double quotes"', 3
)

readr::write_tsv(df1, "file1.tsv")
df2 <- readr::read_tsv("file1.tsv", col_types = "cd")

waldo::compare(df1, df2)
#> old vs new
#>                                        A
#>   old[1, ] String                       
#>   old[2, ] String with 'single quotes'  
#> - old[3, ] String with "double quotes"  
#> + new[3, ] String with ""double quotes""
#> 
#>     old$A                           | new$A                                  
#> [1] "String"                        | "String"                            [1]
#> [2] "String with 'single quotes'"   | "String with 'single quotes'"       [2]
#> [3] "String with \"double quotes\"" - "String with \"\"double quotes\"\"" [3]

readr::write_tsv(df2, "file2.tsv")
df3 <- readr::read_tsv("file2.tsv", col_types = "cd")

waldo::compare(df1, df3)
#> old vs new
#>                                            A
#>   old[1, ] String                           
#>   old[2, ] String with 'single quotes'      
#> - old[3, ] String with "double quotes"      
#> + new[3, ] String with """"double quotes""""
#> 
#> old$A vs new$A
#>   "String"
#>   "String with 'single quotes'"
#> - "String with \"double quotes\""
#> + "String with \"\"\"\"double quotes\"\"\"\""

Created on 2022-05-06 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2022-05-06
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.2.0)
#>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.2.0)
#>  cli           3.2.0   2022-02-14 [1] CRAN (R 4.2.0)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
#>  diffobj       0.3.5   2021-10-05 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
#>  knitr         1.38    2022-03-25 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.2.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.2.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.2.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  readr         2.1.2   2022-01-30 [1] CRAN (R 4.2.0)
#>  rematch2      2.1.2   2020-05-01 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.13    2022-03-10 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble        3.1.6   2021-11-07 [1] CRAN (R 4.2.0)
#>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.0   2022-03-30 [1] CRAN (R 4.2.0)
#>  vroom         1.5.7   2021-11-30 [1] CRAN (R 4.2.0)
#>  waldo         0.4.0   2022-03-16 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.30    2022-03-02 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/luke.zappia/Library/R/x86_64/4.2/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

You can avoid this by setting the escape argument in write_tsv(), but would it make sense for read_tsv() to have a similar argument that reverses the escaping?
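For reference, a minimal sketch of the write-side workaround mentioned above. It assumes that no field contains a tab or a newline, and it also passes quote = "" to read_tsv() so that a field that begins with a quote is not treated as a quoted field; that second setting is an addition for safety, not something the original report used.

```r
# Sketch only: assumes no field contains a tab or a newline.
path <- tempfile(fileext = ".tsv")

df1 <- data.frame(
  A = c("String", "String with 'single quotes'", 'String with "double quotes"'),
  B = c(1, 2, 3)
)

# Write quote characters verbatim instead of doubling them.
readr::write_tsv(df1, path, escape = "none")

# Turn off quote handling on read so '"' is treated as an ordinary character.
df2 <- readr::read_tsv(path, col_types = "cd", quote = "")

# Compare the character column before and after the round trip.
identical(df1$A, df2$A)
```

This only sidesteps the problem for files you write yourself; files already written with the default escape = "double" still contain doubled quotes.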

sbearrows added the feature (a feature request or enhancement) and read 📖 labels on Aug 25, 2022

hadley commented Jul 31, 2023

Something weird is going on here:

library(readr)

path <- tempfile()
write_tsv(data.frame(x = c('-"a"-', '"b"')), path)
writeLines(read_lines(path))
#> x
#> -""a""-
#> ""b""

cat(read_tsv(path, col_types = list())$x)
#> -""a""- b"
cat(read_delim(path, col_types = list(), quote = "", delim = ",", escape_double = TRUE)$x)
#> -""a""- ""b""

Created on 2023-07-31 with reprex v2.0.2

I can't get this to read in correctly, even when I set all the options that I think should make it work. Note also what happens to "b" under the default settings: it ends up with a quote on only one side.
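Until the reader handles this itself, one possible stopgap for files that already contain doubled quotes is to read them with quoting disabled (which, as the read_delim() output above shows, returns the raw doubled text) and then collapse each pair of quotes in base R. The helper name undouble_quotes() is hypothetical and not part of readr, and the sketch assumes every quote in the file was doubled on write.

```r
# Raw values as returned when the file is read with quote = "" (see above).
raw <- c('-""a""-', '""b""')

# Hypothetical helper (not a readr function): collapse each doubled quote
# into a single quote. fixed = TRUE matches the literal two-character string.
undouble_quotes <- function(x) gsub('""', '"', x, fixed = TRUE)

fixed <- undouble_quotes(raw)
cat(fixed, sep = "\n")
#> -"a"-
#> "b"
```

Because gsub() replaces non-overlapping matches from left to right, a run of four quotes (an original pair that was doubled on write) collapses back to two.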

hadley added the bug (an unexpected problem or unintended behavior) label and removed the feature (a feature request or enhancement) label on Jul 31, 2023