documentation issue for read_delim(), retire read_csv() and read_csv2() #1452
Related to #1411 - by
library(readr)
lines <- "C1
2011D2"
read_delim(lines) # Error: Could not guess the delimiter.
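A minimal sketch, assuming readr 2.x, of one way around the error above. With only one column there is no delimiter that `read_delim()` could detect, so the guess fails. Passing `delim` explicitly skips the guessing step. The choice of `","` is arbitrary: any character that never appears in the data would do.

```r
library(readr)

lines <- "C1\n2011D2"

# A single column contains no delimiter to detect, so guessing fails.
# Supplying `delim` bypasses the guess. "," is an arbitrary choice
# because it never appears in the data.
df <- read_delim(I(lines), delim = ",", show_col_types = FALSE)

names(df)  # "C1"
nrow(df)   # 1
```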
While I was 🤯 when I first read this, on further inspection I am not convinced this is the right approach (not that my view matters much). You’re right that
library(readr)
read_delim(I("a,b\n1.1,2.2")) # comma
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1.1 2.2
read_delim(I("a;b\n1.1;2.2")) # semicolon
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> dbl (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1.1 2.2
read_delim(I("a\tb\n1.1\t2.2")) # tab
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> dbl (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1.1 2.2
But it fails to parse decimal numbers correctly when the decimal mark is a comma, because the comma is treated as a grouping mark.
read_delim(I("a;b\n1,1;2,2")) # semicolon
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> num (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 11 22
read_delim(I("a\tb\n1,1\t2,2")) # tab
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> num (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 11 22
This can be fixed, of course, by specifying the decimal mark in the locale:
read_delim(I("a;b\n1,1;2,2"), locale = locale(decimal_mark = ","))
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> dbl (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1.1 2.2
AFAICT the documentation is correct here, i.e. this is exactly how it works.
default_locale()
#> <locale>
#> Numbers: 123,456.78
#> Formats: %AD / %AT
#> Timezone: UTC
#> Encoding: UTF-8
#> <date_names>
#> Days: Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed), Thursday
#> (Thu), Friday (Fri), Saturday (Sat)
#> Months: January (Jan), February (Feb), March (Mar), April (Apr), May (May),
#> June (Jun), July (Jul), August (Aug), September (Sep), October
#> (Oct), November (Nov), December (Dec)
#> AM/PM: AM/PM
read_delim(I("a;b\n1,800;2.345"))
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> dbl (1): b
#> num (1): a
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1800 2.35
(Because of #1445 it is not possible to use
read_delim(I("a;b\n1,800;2.345"), locale = locale(decimal_mark = ",", grouping_mark = "."))
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> dbl (1): a
#> num (1): b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1.8 2345
Or use
read_csv2(I("a;b\n1,800;2.345"))
#> ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> dbl (1): a
#> num (1): b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1.8 2345
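As the info message above suggests, `read_csv2()` behaves like `read_delim()` with `delim = ";"` and a locale that uses `","` as decimal mark and `"."` as grouping mark. The following is a small sketch, assuming readr 2.x, that checks whether the two calls return the same column values for the example data from this thread.

```r
library(readr)

txt <- I("a;b\n1,800;2.345")

# read_csv2() fixes the delimiter to ";" and, with the default locale,
# switches to "," as decimal mark and "." as grouping mark.
via_csv2 <- read_csv2(txt, show_col_types = FALSE)

# The same settings spelled out explicitly with read_delim().
via_delim <- read_delim(
  txt,
  delim = ";",
  locale = locale(decimal_mark = ",", grouping_mark = "."),
  show_col_types = FALSE
)

# Compare column values only; the tibbles also carry spec/problems attributes.
all.equal(lapply(via_csv2, identity), lapply(via_delim, identity))  # TRUE
via_delim$a  # 1.8
via_delim$b  # 2345
```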
This is only the case for
read_csv(I("a,b\n1_800,2_345"), locale = locale(decimal_mark = ",", grouping_mark = "_"))
#> Rows: 1 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> num (2): a, b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 1 × 2
#> a b
#> <dbl> <dbl>
#> 1 1800 2345
This is technically true.
locale("de_CH")
#> Error: Unknown language 'de_CH'
However,
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.2 (2022-10-31 ucrt)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.utf8
#> ctype German_Germany.utf8
#> tz Europe/Berlin
#> date 2023-02-07
#> pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> ! package * version date (UTC) lib source
#> D archive 1.1.5 2022-05-06 [1] CRAN (R 4.2.2)
#> bit 4.0.5 2022-11-15 [1] CRAN (R 4.2.2)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.0)
#> cli 3.6.0 2023-01-09 [1] CRAN (R 4.2.2)
#> crayon 1.5.2 2022-09-29 [1] RSPM
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.2)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> fs 1.6.0 2023-01-23 [1] CRAN (R 4.2.2)
#> glue 1.6.2.9000 2023-01-16 [1] Github (tidyverse/glue@5a16502)
#> hms 1.1.2 2022-08-19 [1] CRAN (R 4.2.1)
#> htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
#> knitr 1.42 2023-01-25 [1] CRAN (R 4.2.2)
#> lifecycle 1.0.3 2022-10-07 [1] RSPM
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.1)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.2)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> readr * 2.1.3 2022-10-01 [1] CRAN (R 4.2.1)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.1)
#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)
#> rmarkdown 2.20 2023-01-19 [1] CRAN (R 4.2.2)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> styler 1.9.0 2023-01-15 [1] CRAN (R 4.2.2)
#> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
#> tidyselect 1.2.0 2022-10-10 [1] RSPM
#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2)
#> vctrs 0.5.2 2023-01-23 [1] CRAN (R 4.2.2)
#> vroom 1.6.1 2023-01-22 [1] CRAN (R 4.2.2)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.37 2023-01-31 [1] CRAN (R 4.2.2)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.2)
#>
#> [1] C:/Users/Daniel/AppData/Local/R/win-library/4.2
#> [2] C:/Program Files/R/R-4.2.2/library
#>
#> D ── DLL MD5 mismatch, broken installation.
#>
#> ──────────────────────────────────────────────────────────────────────────────
@dpprdan thank you for your analysis. I share your experience but not your conclusions. The point is that the docs focus almost entirely on read_csv() and read_csv2(). The docs also state that one needs to use the locale to change details like the decimal separator. This is true for read_delim() but not for read_csv(), read_csv2(), and read_tsv(). Therefore, the docs are simply incorrect, because the functions won't work that way, as you remarked yourself. Moreover, the docs somewhat suggest that read_delim() should be the weapon of last resort, and therefore present no examples of its proper use.

I ran the functions against several open datasets from the Swiss, Austrian and German governments, plus our own CSV files exported from Excel under the Swiss and Austrian/German locales. There was no problem with read_csv() and read_csv2() for the Austrian and German data, where the comma is the decimal mark and the semicolon is the column separator. However, the same files also parsed absolutely fine with read_delim() without any parameters other than the file name. The situation is different under 'de_ch': none of the Swiss datasets are parsed properly with read_csv2() + locale(), because the Swiss use the semicolon as the column separator and the dot as the decimal mark. When I process the same files with read_delim(), passing the file name as the sole parameter, there are no problems.

As the docs fail to present the smartness of read_delim(), my Swiss students fail to resolve this problem on their own: the docs create the false impression that read_delim() is complex and requires many parameters, while read_csv() and read_csv2() are the nice fellows that should be used pretty much all the time. I started teaching that one should start with read_delim(), because in 99.9% of cases it just delivers without the need to think about delimiters and decimal points (which I find too awesome to hide, BTW).

This is pretty much the opposite of what the docs state. This is bad, because I also emphasize that students should read the docs to find out the correct usage of functions. 😖 I find the more specialized functions actually useful only for the rough edge cases that you describe. I might be biased in my selection of datasets, but I suspect that the cases which only read_csv2() can resolve easily, and read_delim() can't, are not as common as you and the docs suggest. For beginners, all of this is impossible to figure out on their own simply by looking at the docs. Even I needed several days of repeated attempts, and finally had to turn to the code, to arrive at these insights. All because the docs point in the wrong direction. Again, all of this could easily be resolved by adding a few statements and examples to the docs, without changing any logic.
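To make the Swiss case above concrete, here is a sketch with made-up Swiss-style data (semicolon as delimiter, dot as decimal mark), assuming readr 2.x. Called without extra arguments, `read_delim()` guesses `;` and keeps the default `.` decimal mark. By contrast, `read_csv2()` treats `.` as a grouping mark and drops it.

```r
library(readr)

# Made-up Swiss-style data: ";" as delimiter, "." as decimal mark
swiss_txt <- I("a;b\n1800.5;2.25")

# read_delim() guesses ";" and keeps the default "." decimal mark
via_delim <- read_delim(swiss_txt, show_col_types = FALSE)
via_delim$a  # 1800.5
via_delim$b  # 2.25

# read_csv2() assumes "," as decimal and "." as grouping mark,
# so the dots are dropped as if they were thousands separators
via_csv2 <- read_csv2(swiss_txt, show_col_types = FALSE)
via_csv2$a  # 18005
via_csv2$b  # 225
```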
Here is a random file from Statistik Austria where
library(readr)
at_url <- "https://data.statistik.gv.at/data/OGD_veste309_Veste309_1.csv"
Delimiter: semicolon, decimal mark: comma
read_lines(at_url) |> head(2)
#> [1] "C-A11-0;C-STAATS-0;C-VEBDL-0;C-BESCHV-0;F-VESTE_AM;F-VESTE_Q25;F-VESTE_Q50;F-VESTE_Q75;F-VESTE_UB"
#> [2] "A11-1;STAATS-9;VEBDL-10;BESCHV-1;17,60;11,65;15,09;20,12;2650938,00"
The comma is parsed as a grouping mark, so numbers come out as x * 10^(number of decimal places):
read_delim(at_url) |> head(2)
#> Rows: 72 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> chr (4): C-A11-0, C-STAATS-0, C-VEBDL-0, C-BESCHV-0
#> num (5): F-VESTE_AM, F-VESTE_Q25, F-VESTE_Q50, F-VESTE_Q75, F-VESTE_UB
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 9
#> `C-A11-0` `C-STAATS-0` C-VEB…¹ C-BES…² F-VES…³ F-VES…⁴ F-VES…⁵ F-VES…⁶ F-VES…⁷
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A11-1 STAATS-9 VEBDL-… BESCHV… 1760 1165 1509 2012 2.65e8
#> 2 A11-1 STAATS-9 VEBDL-… BESCHV… 1895 1278 1622 2174 1.69e8
#> # … with abbreviated variable names ¹`C-VEBDL-0`, ²`C-BESCHV-0`, ³`F-VESTE_AM`,
#> # ⁴`F-VESTE_Q25`, ⁵`F-VESTE_Q50`, ⁶`F-VESTE_Q75`, ⁷`F-VESTE_UB`
Compare with
read_csv2(at_url) |> head(2)
#> ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
#> Rows: 72 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> chr (4): C-A11-0, C-STAATS-0, C-VEBDL-0, C-BESCHV-0
#> dbl (5): F-VESTE_AM, F-VESTE_Q25, F-VESTE_Q50, F-VESTE_Q75, F-VESTE_UB
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 9
#> `C-A11-0` `C-STAATS-0` C-VEB…¹ C-BES…² F-VES…³ F-VES…⁴ F-VES…⁵ F-VES…⁶ F-VES…⁷
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A11-1 STAATS-9 VEBDL-… BESCHV… 17.6 11.6 15.1 20.1 2650938
#> 2 A11-1 STAATS-9 VEBDL-… BESCHV… 19.0 12.8 16.2 21.7 1685788
#> # … with abbreviated variable names ¹`C-VEBDL-0`, ²`C-BESCHV-0`, ³`F-VESTE_AM`,
#> # ⁴`F-VESTE_Q25`, ⁵`F-VESTE_Q50`, ⁶`F-VESTE_Q75`, ⁷`F-VESTE_UB`
at_url2 <- "https://data.statistik.gv.at/data/OGD_vpi20_VPI_2020_1.csv"
read_delim(at_url2) |> head(2)
#> Rows: 351 Columns: 10
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> chr (8): C-VPIZR-0, C-VPI5NEU-0, F-VPIMZVM, F-VPIPZVM, F-VPIPZVJM, F-VPIEFVM...
#> num (2): F-VPIMZBM, F-VPIMZVJM
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 10
#> `C-VPIZR-0` C-VPI5N…¹ F-VPI…² F-VPI…³ F-VPI…⁴ F-VPI…⁵ F-VPI…⁶ F-VPI…⁷ F-VPI…⁸
#> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 VPIZR-202101 VPI-0 1.00e7 101,10… 9950000 -0,800… 0,80000 -0,782… 0,80000
#> 2 VPIZR-202101 VPI-01 9.76e6 100,80… 9870000 -3,200… -1,100… -0,363… -0,115…
#> # … with 1 more variable: `F-VPIGEWBM` <chr>, and abbreviated variable names
#> # ¹`C-VPI5NEU-0`, ²`F-VPIMZBM`, ³`F-VPIMZVM`, ⁴`F-VPIMZVJM`, ⁵`F-VPIPZVM`,
#> # ⁶`F-VPIPZVJM`, ⁷`F-VPIEFVM`, ⁸`F-VPIEFVJM`
read_csv2(at_url2) |> head(2)
#> ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
#> Rows: 351 Columns: 10
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> chr (2): C-VPIZR-0, C-VPI5NEU-0
#> dbl (8): F-VPIMZBM, F-VPIMZVM, F-VPIMZVJM, F-VPIPZVM, F-VPIPZVJM, F-VPIEFVM,...
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 10
#> `C-VPIZR-0` C-VPI5N…¹ F-VPI…² F-VPI…³ F-VPI…⁴ F-VPI…⁵ F-VPI…⁶ F-VPI…⁷ F-VPI…⁸
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 VPIZR-202101 VPI-0 100. 101. 99.5 -0.8 0.8 -0.782 0.8
#> 2 VPIZR-202101 VPI-01 97.6 101. 98.7 -3.2 -1.1 -0.363 -0.115
#> # … with 1 more variable: `F-VPIGEWBM` <dbl>, and abbreviated variable names
#> # ¹`C-VPI5NEU-0`, ²`F-VPIMZBM`, ³`F-VPIMZVM`, ⁴`F-VPIMZVJM`, ⁵`F-VPIPZVM`,
#> # ⁶`F-VPIPZVJM`, ⁷`F-VPIEFVM`, ⁸`F-VPIEFVJM`
A
tf <- tempfile()
write_csv2(head(mtcars, 3), tf)
read_lines(tf)
#> [1] "mpg;cyl;disp;hp;drat;wt;qsec;vs;am;gear;carb"
#> [2] "21,0;6;160;110;3,90;2,620;16,46;0;1;4;4"
#> [3] "21,0;6;160;110;3,90;2,875;17,02;0;1;4;4"
#> [4] "22,8;4;108;93;3,85;2,320;18,61;1;1;4;1"
Again, numbers with a decimal comma are parsed as x * 10^(number of decimal places):
read_delim(tf)
#> Rows: 3 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> dbl (7): cyl, disp, hp, vs, am, gear, carb
#> num (4): mpg, drat, wt, qsec
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 210 6 160 110 390 2620 1646 0 1 4 4
#> 2 210 6 160 110 390 2875 1702 0 1 4 4
#> 3 228 4 108 93 385 2320 1861 1 1 4 1
read_csv2(tf)
#> ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
#> Rows: 3 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ";"
#> dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
I’d hardly call these edge cases. I don’t know which files you tested (because you did not provide any reproducible examples), but I’ve yet to see a CSV file with semicolon as delimiter and comma as decimal mark that
Seeing how
It certainly sucks that the “Swiss file spec” (i.e. semicolon as delimiter, dot as decimal mark) isn’t handled well by readr at the moment (at least if you follow the docs). I suspect that much of the confusion stems from the hard-coded comma as decimal separator in
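Since `read_delim()` guesses the delimiter but not the decimal mark, a file like the Statistik Austria one needs the locale set by hand. Below is a rough sketch of how that choice could be automated. `guess_decimal_locale()` is a hypothetical helper, not part of readr. It assumes a semicolon-delimited file with a header row and counts fields ending in `,<digits>` versus `.<digits>`. It is a heuristic that can be fooled, for example by grouped integers such as `1,800`.

```r
library(readr)

# Hypothetical helper, not part of readr: sniff the first data lines of a
# semicolon-delimited file and guess whether "," or "." is the decimal mark.
guess_decimal_locale <- function(file, n = 20) {
  sampled <- read_lines(file, skip = 1, n_max = n)  # skip the header row
  count <- function(pattern) {
    sum(lengths(regmatches(sampled, gregexpr(pattern, sampled))))
  }
  comma <- count("[0-9],[0-9]+(;|$)")    # e.g. "17,60" at the end of a field
  dot   <- count("[0-9]\\.[0-9]+(;|$)")  # e.g. "17.60" at the end of a field
  if (comma > dot) {
    locale(decimal_mark = ",", grouping_mark = ".")
  } else {
    locale(decimal_mark = ".", grouping_mark = ",")
  }
}

# Usage with a small made-up file in the style of the Statistik Austria data
tf <- tempfile(fileext = ".csv")
writeLines(c("AM;Q25", "17,60;11,65", "18,95;12,78"), tf)

loc <- guess_decimal_locale(tf)
df <- read_delim(tf, delim = ";", locale = loc, show_col_types = FALSE)
df$AM  # 17.60 18.95
```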
Thank you, @dpprdan, for elaborating on the case. But your examples are not really a problem, because everything you describe could easily be resolved by using
As I said, I am not against
Personally, I feel more inclined towards fixing the documentation than the logic. However, right now neither is very user-friendly.
There's a lot of discussion. Could someone please briefly summarise the problem for me?
IIUC, @phish108 proposes to focus the documentation on
The motivation is
My counterarguments are basically
In that case, I think the best fix is to introduce
Hi @hadley, there are two things:
From the docs and the vignettes, users won't get a clue how these functions differ from vanilla R. My point regarding read_delim() is that the docs should include a minimal and a more extensive example of using read_delim(), and a short note that the function uses a heuristic to guess the format. Or, if this makes no sense from your viewpoint, they should at least state when to use and when not to use read_delim(). The locale vignette should also mention read_delim(), as it is the only function that makes full use of the locale, whereas read_csv*() do not. The current examples in the vignette do not work most of the time in my work and teaching. The logic might be obvious to you, but it is not to me. Therefore, I ask for the function to be documented a bit more verbosely than just listing the parameters. This alone would resolve all our data import problems. For these reasons, I am reopening the issue.
I've created #1503 to track some of the documentation issues. I don't understand what you mean by
Thank you for the reference to the documentation issue. As noted above, until very recently changing the decimal separator appeared to have no effect, at least in read_csv2(). This means that the spec given via locale() is not fully used. For read_csv() this is not applicable, because the comma is already taken as the column separator; again, the spec given via locale() is not fully used. In this respect, it is not entirely clear whether the parameters or the locale win in case of conflict. This is what I mean when I write that not all locale() features are available to these functions. From the docs I was unable to deduce the behavior of read_csv2().
This issue is directly related to #1445, but separate from it in that it is a documentation issue.

As noted, the read_csv* family does not allow locales to be overridden properly. In my case I cannot set the Swiss locale ("ch", "de-ch", "fr-ch", or "it-ch"). None of those locales is recognised by locale(). Naturally, Swiss users would use read_csv2(), as many CSV files are semicolon-separated, particularly CSV files that were exported from Excel under these locales. In these cases, users would need to be able to flip the decimal_mark and the grouping_mark. The documentation wrongly suggests the locale parameter for such cases.

However, read_delim() appears smart enough to handle pretty much all cases correctly without extra parameters.

In this respect, the documentation is misleading by being overly concise. It suggests that the functions read_csv() and read_csv2() are not just mere convenience functions for read.csv() and read.csv2(), and it stresses the use of the locale parameter. However, in all of my cases all function pairs of the code below lead to the same results, apart from read_csv2(). Therefore, I have the impression that read_delim() offers the best way to import CSV files correctly.

This smart behaviour of read_delim() should be stated more clearly in the reference documentation, because it makes life much easier by needing only one function instead of three. This would make this part of the tidyverse easier to learn.

As the documentation stands right now, the examples emphasize the other functions instead of highlighting the smartness of read_delim(). However, it appears to me that read_delim() should be used for reading pretty much any delimited text file, and not just to gain more control. The other functions make sense for (rare?) border cases where the data is contradictory regarding the delimiter. But even then, using the parameters of read_delim() appears more useful. Therefore, I suggest considering a lifecycle update for the other three functions.
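Because locale() does not recognise a Swiss locale, one workaround is to assemble the Swiss number format by hand and pass it to read_delim(). This is a sketch that assumes readr 2.x and the apostrophe as the grouping mark commonly used in Switzerland. `swiss_locale` is a hypothetical name, not something readr provides, and the data is made up.

```r
library(readr)

# Hypothetical hand-built locale for Swiss-formatted numbers:
# "." as decimal mark, "'" as grouping mark, German day/month names.
swiss_locale <- locale(date_names = "de", decimal_mark = ".", grouping_mark = "'")

# Made-up semicolon-delimited data with a grouped number in column a
df <- read_delim(
  I("a;b\n1'800.50;2.25"),
  delim = ";",
  locale = swiss_locale,
  col_types = cols(a = col_number(), b = col_double())
)

df$a  # 1800.5
df$b  # 2.25
```

Declaring `col_types` explicitly avoids relying on type guessing for the grouped column.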