
read_csv() reads wrong column when using col_select and name_repair = "minimal" in a file with duplicated column names #1456

Open
lucasnanni opened this issue Dec 8, 2022 · 1 comment
Labels
bug an unexpected problem or unintended behavior read 📖


@lucasnanni

Given a CSV file with duplicated column names, read_csv() reads the wrong column when I set name_repair = "minimal" and use col_select to pick the second occurrence of the repeated column: the first occurrence is returned instead.

In the reprex below, I've created a CSV table with only two columns, both named x. With name_repair = "minimal" and col_select = 2, the first column is read instead of the second. Without name_repair = "minimal", the second column is read correctly.

tab <- I(
"x,x
a,1
b,2
c,3"
)

readr::read_csv(tab, col_select = 2, name_repair = "minimal")
#> Rows: 3 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): x
#> dbl (1): x
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 1
#>   x    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c

readr::read_csv(tab, col_select = 2)
#> New names:
#> Rows: 3 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (1): x...2
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> • `x` -> `x...1`
#> • `x` -> `x...2`
#> # A tibble: 3 × 1
#>   x...2
#>   <dbl>
#> 1     1
#> 2     2
#> 3     3

Created on 2022-12-07 with reprex v2.0.2
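A possible interim workaround, sketched under the assumption that the mis-selection only happens inside col_select (the reprex above does not check reading without it): read every column with name_repair = "minimal" and pick the wanted column by position afterwards, so no lookup by name is involved. The object names `all_cols`, `second_col` and `out` are illustrative only.

```r
# Hypothetical workaround sketch: skip col_select and select by position instead.
tab <- I(
"x,x
a,1
b,2
c,3"
)

# Read all columns, keeping the duplicated names as-is.
all_cols <- readr::read_csv(tab, name_repair = "minimal", show_col_types = FALSE)

# `[[` with an integer index picks the second column by position, not by name.
second_col <- all_cols[[2]]

# Rebuild a one-column tibble named `x`, mirroring what col_select = 2 should return.
out <- tibble::tibble(x = second_col)
out
```

If repaired names are acceptable, the default name repair already selects the right column (the second call in the reprex), at the cost of names such as x...2.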

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 Patched (2022-11-10 r83330)
#>  os       Ubuntu 22.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    pt_BR.UTF-8
#>  tz       America/Sao_Paulo
#>  date     2022-12-07
#>  pandoc   2.9.2.1 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  bit           4.0.4   2020-08-04 [2] CRAN (R 4.2.1)
#>  bit64         4.0.5   2020-08-30 [2] CRAN (R 4.2.1)
#>  cli           3.4.1   2022-09-23 [2] CRAN (R 4.2.1)
#>  crayon        1.5.2   2022-09-29 [2] CRAN (R 4.2.1)
#>  digest        0.6.30  2022-10-18 [2] CRAN (R 4.2.1)
#>  ellipsis      0.3.2   2021-04-29 [2] CRAN (R 4.2.1)
#>  evaluate      0.18    2022-11-07 [2] CRAN (R 4.2.2)
#>  fansi         1.0.3   2022-03-24 [2] CRAN (R 4.2.1)
#>  fastmap       1.1.0   2021-01-25 [2] CRAN (R 4.2.1)
#>  fs            1.5.2   2021-12-08 [2] CRAN (R 4.2.1)
#>  glue          1.6.2   2022-02-24 [2] CRAN (R 4.2.1)
#>  highr         0.9     2021-04-16 [2] CRAN (R 4.2.1)
#>  hms           1.1.2   2022-08-19 [2] CRAN (R 4.2.1)
#>  htmltools     0.5.3   2022-07-18 [2] CRAN (R 4.2.1)
#>  knitr         1.40    2022-08-24 [2] CRAN (R 4.2.1)
#>  lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.2.1)
#>  magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.2.1)
#>  pillar        1.8.1   2022-08-19 [2] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.2.1)
#>  purrr         0.3.5   2022-10-06 [2] CRAN (R 4.2.1)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.1)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.1)
#>  R.utils       2.12.1  2022-10-30 [1] CRAN (R 4.2.1)
#>  R6            2.5.1   2021-08-19 [2] CRAN (R 4.2.1)
#>  readr         2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6   2022-09-24 [2] CRAN (R 4.2.1)
#>  rmarkdown     2.18    2022-11-09 [2] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.2.1)
#>  stringi       1.7.8   2022-07-11 [2] CRAN (R 4.2.1)
#>  stringr       1.4.1   2022-08-20 [2] CRAN (R 4.2.1)
#>  styler        1.8.0   2022-10-22 [1] CRAN (R 4.2.1)
#>  tibble        3.1.8   2022-07-22 [2] CRAN (R 4.2.1)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.1)
#>  tzdb          0.3.0   2022-03-28 [2] CRAN (R 4.2.1)
#>  utf8          1.2.2   2021-07-24 [2] CRAN (R 4.2.1)
#>  vctrs         0.5.1   2022-11-16 [2] CRAN (R 4.2.2)
#>  vroom         1.6.0   2022-09-30 [2] CRAN (R 4.2.1)
#>  withr         2.5.0   2022-03-03 [2] CRAN (R 4.2.1)
#>  xfun          0.34    2022-10-18 [2] CRAN (R 4.2.1)
#>  yaml          2.3.6   2022-10-18 [2] CRAN (R 4.2.1)
#> 
#>  [1] /home/nanni/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@lucasnanni lucasnanni changed the title read_csv() reads wrong column when using col_select and name_repair = minimal in a file with duplicated column names read_csv() reads wrong column when using col_select and name_repair = "minimal" in a file with duplicated column names Dec 8, 2022
@hadley
Member

hadley commented Jul 31, 2023

Somewhat more minimal reprex:

tab <- "
x,x
a,1
b,2
c,3"

readr::read_csv(tab, col_select = 2, name_repair = "minimal", col_types = list())
#> # A tibble: 3 × 1
#>   x    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c

readr::read_csv(tab, col_select = 2, col_types = list())
#> New names:
#> • `x` -> `x...1`
#> • `x` -> `x...2`
#> # A tibble: 3 × 1
#>   x...2
#>   <dbl>
#> 1     1
#> 2     2
#> 3     3

Created on 2023-07-31 with reprex v2.0.2

@hadley hadley added bug an unexpected problem or unintended behavior read 📖 labels Jul 31, 2023