The readr docs link to a blog post that says: "readr 2.0 introduced 'lazy' reading by default. The idea of lazy reading is that instead of reading all the data in a CSV file up front, you read it only on demand."
Following the docs, I made certain that I am using "edition" 2 and that lazy reading is turned on:
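A minimal sketch of such a check, assuming readr >= 2.0 (the exact commands I ran aren't shown above):

library(readr)

# edition_get() reports which readr edition is active (1 or 2);
# local_edition(2) forces the second edition for the current scope.
edition_get()
local_edition(2)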
I created a large file by stacking 1000 iterations of shuffled mtcars using this code:
library(dplyr)
library(readr)

for (i in 1:1000) {
  q <- mtcars %>% sample_frac(1)
  write_csv(q, "mtcars.csv", append = TRUE)
}
It's approximately 1.2M on disk:
(base) balter@expiyes:~/winhome/OneDrive/Documents$ ls -lsh mtcars.csv
1.2M -rwxrwxrwx 1 balter balter 1.2M Jun 15 22:57 mtcars.csv
(base) balter@expiyes:~/winhome/OneDrive/Documents$ ls -ls mtcars.csv
1212 -rwxrwxrwx 1 balter balter 1237281 Jun 15 22:57 mtcars.csv
I then cleared my R environment, ran gc(), and restarted R. I repeated that a few times for good measure. This is the output of my last call to gc():
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 505764 27.1    1122997   60   644085 34.4
Vcells 767716  5.9    8388608   64  1650082 12.6
And this is the memory usage report from RStudio:
Next, I read in the mtcars x 1000 file with df <- readr::read_csv("mtcars.csv", lazy = TRUE). R tells me that my 1.2M file, read in "lazily", is taking up 2.8M in memory:
> object.size(df)
2827576 bytes
This is my new memory usage report:
What I'm seeing is that I read in a file that is 1.2M on disk "lazily", and:
- the object has a size of 2.8M in memory according to R, and
- the object added 19M to the memory RStudio reports as used by R objects.
Lazy reading mostly impacts string columns, which mtcars lacks, so it's not a good example. I'd also suggest using lobstr::obj_size() to measure the size of R objects precisely.
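For example, a sketch of that comparison on a file that actually contains character columns (a hypothetical strings.csv, not the mtcars file from above):

library(readr)
library(lobstr)

# Read the same file eagerly and lazily; with lazy = TRUE the character data
# stays backed by the memory-mapped file until a column is actually touched.
df_eager <- read_csv("strings.csv", lazy = FALSE)
df_lazy  <- read_csv("strings.csv", lazy = TRUE)

obj_size(df_eager)  # full in-memory size
obj_size(df_lazy)   # typically smaller until the columns are materialized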