You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I was recently reading a table into R using the readr package but discovered that several rows were missing as some cells included a single double quotation mark.
I managed to resolve this by setting quote = "", however I wonder if the read_tsv()/related functions could be updated to at least raise a warning if improper quoting is discovered in the data? The data.table package both raises a warning and deals with this automatically, as per the following reprex.
Many thanks for considering!
library(tibble)
library(readr)
library(data.table)
# create example filedf<- tribble(
~code, ~description,
"1", '"Single quotes',
"2", "No quotes",
"3", 'Single quotes"',
"4", '"Pair of quotes"',
"5", "No quotes"
)
file_path<- tempfile(pattern="df_with_quotes", fileext=".tsv")
write_tsv(df,
file_path,
quote="none",
escape="none"
)
# `readr::read_tsv()` reads inappropriately without raising an error with# default settings
read_tsv(file_path,
quote="\""
)
#> Rows: 3 Columns: 2#> ── Column specification ────────────────────────────────────────────────────────#> Delimiter: "\t"#> chr (1): description#> dbl (1): code#> #> ℹ Use `spec()` to retrieve the full column specification for this data.#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.#> # A tibble: 3 × 2#> code description #> <dbl> <chr> #> 1 1 "Single quotes\n2\tNo quotes\n3\tSingle quotes"#> 2 4 "Pair of quotes" #> 3 5 "No quotes"# To read correctly use the `quote` argument
read_tsv(file_path,
quote=""
)
#> Rows: 5 Columns: 2#> ── Column specification ────────────────────────────────────────────────────────#> Delimiter: "\t"#> chr (1): description#> dbl (1): code#> #> ℹ Use `spec()` to retrieve the full column specification for this data.#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.#> # A tibble: 5 × 2#> code description #> <dbl> <chr> #> 1 1 "\"Single quotes" #> 2 2 "No quotes" #> 3 3 "Single quotes\"" #> 4 4 "\"Pair of quotes\""#> 5 5 "No quotes"# `data.table::fread()` raises a warning but reads the table correctly with# default settings
fread(file_path,
quote="\""
)
#> Warning in fread(file_path, quote = "\""): Found and resolved improper quoting#> in first 100 rows. If the fields are not quoted (e.g. field separator does not#> appear within any field), try quote="" to avoid this warning.#> code description#> 1: 1 "Single quotes#> 2: 2 No quotes#> 3: 3 Single quotes"#> 4: 4 Pair of quotes#> 5: 5 No quotes# setting `quote=""` produces (almost) the same result, now without a warning
fread(file_path,
quote=""
)
#> code description#> 1: 1 "Single quotes#> 2: 2 No quotes#> 3: 3 Single quotes"#> 4: 4 "Pair of quotes"#> 5: 5 No quotes
sessionInfo()
#> R version 4.2.0 (2022-04-22)#> Platform: x86_64-apple-darwin17.0 (64-bit)#> Running under: macOS Big Sur/Monterey 10.16#> #> Matrix products: default#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib#> #> locale:#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8#> #> attached base packages:#> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages:#> [1] data.table_1.14.6 readr_2.1.4 tibble_3.1.8 #> #> loaded via a namespace (and not attached):#> [1] pillar_1.8.1 compiler_4.2.0 R.methodsS3_1.8.2 R.utils_2.12.2 #> [5] tools_4.2.0 digest_0.6.31 bit_4.0.5 evaluate_0.20 #> [9] lifecycle_1.0.3 R.cache_0.16.0 pkgconfig_2.0.3 rlang_1.0.6 #> [13] reprex_2.0.2 cli_3.6.0 rstudioapi_0.14 parallel_4.2.0 #> [17] yaml_2.3.7 xfun_0.37 fastmap_1.1.0 withr_2.5.0 #> [21] styler_1.9.1 knitr_1.42 fs_1.6.1 vctrs_0.5.2 #> [25] hms_1.1.2 tidyselect_1.2.0 bit64_4.0.5 glue_1.6.2 #> [29] R6_2.5.1 fansi_1.0.4 vroom_1.6.1 rmarkdown_2.20 #> [33] purrr_1.0.1 tzdb_0.3.0 magrittr_2.0.3 htmltools_0.5.4 #> [37] ellipsis_0.3.2 utf8_1.2.3 crayon_1.5.2 R.oo_1.25.0
The text was updated successfully, but these errors were encountered:
I experienced the same issue with read_csv(). A single double quote preceding one of the values omitted the remaining records. Only the warning from fread() saved me:
Hi, I was recently reading a table into R using the readr package but discovered that several rows were missing as some cells included a single double quotation mark.
I managed to resolve this by setting
quote = ""
, however I wonder if theread_tsv()
/related functions could be updated to at least raise a warning if improper quoting is discovered in the data? The data.table package both raises a warning and deals with this automatically, as per the following reprex.Many thanks for considering!
The text was updated successfully, but these errors were encountered: