potential bug crashes RStudio when reading CSV file #1480

Open
pachadotdev opened this issue Mar 7, 2023 · 1 comment

Comments

pachadotdev commented Mar 7, 2023

As I explained on Stack Overflow (https://stackoverflow.com/questions/75657380/readr-vs-data-table-different-results-on-fedora?noredirect=1#comment133483471_75657380):

I replaced Ubuntu 20.04 with Fedora 37 on my laptop (clean install, 16 GB RAM) to follow my lab's standard. Curiously, on Fedora readr::read_csv() crashes RStudio when reading a 6.7 GB CSV file. What can explain this? The same call worked on Ubuntu.

library(archive)

url <- "https://www.usitc.gov/data/gravity/itpd_e/itpd_e_r02.zip"
zip <- gsub(".*/", "", url)

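# download the archive only if it is not already present
if (!file.exists(zip)) {
  try(download.file(url, zip, method = "wget", quiet = TRUE))
}

# extract only if the CSV is not already in the working directory
if (length(list.files(getwd(), pattern = "ITPD_E_R02\\.csv")) != 1) {
  archive_extract(zip, dir = getwd())
}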

# this will crash RStudio
# trade <- readr::read_csv("ITPD_E_R02.csv")

# this won't
trade <- data.table::fread("ITPD_E_R02.csv")
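
One way to narrow this down, offered as a sketch rather than a fix: stream the file through readr in chunks so the whole table is never held in memory at once. If this completes while the plain read_csv() call crashes, memory use during the full read becomes a more likely suspect. The helper name `count_rows_chunked` is hypothetical, the chunk size is arbitrary, and the usage lines assume the extracted CSV sits in the working directory. I have not verified that this avoids the crash on Fedora.

```r
library(readr)

# Count the rows of a CSV by streaming it in chunks, so the whole file
# is never held in memory at once.
# `count_rows_chunked` is a hypothetical helper, not part of readr.
count_rows_chunked <- function(path, chunk_size = 100000) {
  n_rows <- 0
  # the callback runs once per chunk and only updates the counter
  callback <- SideEffectChunkCallback$new(function(chunk, pos) {
    n_rows <<- n_rows + nrow(chunk)
  })
  read_csv_chunked(path, callback, chunk_size = chunk_size, progress = FALSE)
  n_rows
}

# Assumed usage with the file from this issue:
# file.size("ITPD_E_R02.csv") / 1024^3   # size on disk in GiB
# count_rows_chunked("ITPD_E_R02.csv")
```

Comparing the row count with nrow() of the data.table::fread() result would also show whether both readers see the same number of records.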

free memory

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15699        4332        1106        1032       10259        9957
Swap:           8191           9        8182

session info

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 37 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.3

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] data.table_1.14.8 readr_2.1.4       archive_1.1.5    

loaded via a namespace (and not attached):
 [1] fansi_1.0.4       tzdb_0.3.0        utf8_1.2.3        R6_2.5.1          lifecycle_1.0.3  
 [6] magrittr_2.0.3    pillar_1.8.1      rlang_1.0.6       cli_3.6.0         rstudioapi_0.14  
[11] ellipsis_0.3.2    vctrs_0.5.2       tools_4.2.2       glue_1.6.2        hms_1.1.2        
[16] compiler_4.2.2    pkgconfig_2.0.3   CoprManager_0.5.0 tibble_3.1.8

hadley (Member) commented Jul 31, 2023

Somewhat more minimal reprex:

library(archive)

url <- "https://www.usitc.gov/data/gravity/itpd_e/itpd_e_r02.zip"
path <- tempfile()

ua <- "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"
curl::multi_download(url, path, useragent = ua)

trade <- readr::read_csv(path, lazy = TRUE)

But this doesn't crash for me.
