read_delim_chunked takes much more memory than expected? #1410

Open
timothy-barry opened this issue Jul 10, 2022 · 6 comments
Labels
reprex needs a minimal reproducible example

Comments

@timothy-barry
Contributor

I am using the read_delim_chunked function to process large text files chunk by chunk. My expectation is that memory is released after each chunk is processed. However, this does not seem to be the case: the amount of memory required to read the text file in chunks is the same as the amount required to read it all at once. I assume this is a bug, but maybe my understanding of read_delim_chunked is incorrect. The purpose of reading by chunk is to conserve memory, right? Thanks!
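
For reference, the pattern I'm using looks roughly like this (a sketch only; the file name, delimiter, and chunk size are placeholders, not my real data):

# readr: chunk-by-chunk pattern (sketch; "big_file.txt" and chunk_size are placeholders)
library(readr)

process_chunk <- function(chunk, pos) {
  # Do some per-chunk work, then let the chunk go out of scope; I would
  # expect its memory to be released before the next chunk is read.
  message("Rows starting at ", pos, ": ", nrow(chunk))
}

read_delim_chunked(file = "big_file.txt",
                   callback = process_chunk,
                   delim = " ",
                   chunk_size = 100000)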

@timothy-barry
Contributor Author

After a bit of searching through the issues on this repo, I noticed that at least one other person seems to be encountering this issue as well: #1120 (comment).

@timothy-barry
Contributor Author

timothy-barry commented Jul 11, 2022

Additional note: this seems to be a more pervasive issue than I had realized. I tried loading a sequence of files via readr::read_delim. R ran out of memory despite the fact that (i) each file fits into memory on its own and (ii) I loaded the files one at a time.

# readr: runs out-of-memory
for (f in fs) {
  print(paste0("Loading ", f))
  x <- readr::read_delim(file = f,
                         delim = " ",
                         skip = 2,
                         col_types = c("iii"))
  rm(x); gc()
}

I repeated this experiment with data.table's fread function; everything works as expected.

# data.table: everything works
for (f in fs) {
  print(paste0("Loading ", f))
  x <- data.table::fread(file = f,
                         sep = " ",
                         colClasses = c("integer", "integer", "integer"),
                         skip = 2)
  rm(x); gc()
}

As far as I can tell, the current version of readr seems to suffer from memory leak issues beyond read_delim_chunked alone, unfortunately.

@ben18785

I am having the same issue. The memory use increases almost monotonically even though the individual chunks are small.
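
Roughly how I'm observing it (a sketch; the file name and chunk size are placeholders, and I'm simply reading the Mb column from gc() inside the callback):

# readr: log memory from inside the chunk callback (sketch)
library(readr)

log_memory <- function(chunk, pos) {
  mb <- sum(gc()[, 2])  # current R heap usage in Mb (Ncells + Vcells)
  message("Chunk at row ", pos, ": ", nrow(chunk), " rows, ~", round(mb), " Mb in use")
}

read_delim_chunked(file = "big_file.txt",
                   callback = log_memory,
                   delim = " ",
                   chunk_size = 100000)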

@timothy-barry
Contributor Author

Any updates or workarounds? Can I use edition 1 (via with_edition(1, ...) or local_edition(1)) to resolve this issue, at least for the time being?
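
For example, something like this as a stopgap (untested sketch; the file name and column types are placeholders):

# readr: force the edition 1 parser for a single call (sketch)
library(readr)

x <- with_edition(1, read_delim(file = "big_file.txt",
                                delim = " ",
                                skip = 2,
                                col_types = "iii"))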

@arthurgailes

Having the same problem here.

@hadley
Member

hadley commented Jul 31, 2023

To investigate this issue we'll need a reprex, and some indication of how you're measuring R's memory consumption.
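
Ideally something self-contained along these lines (a sketch only; adjust it to match how you actually hit the problem, and the ps::ps_memory_info() call is just one way of reading the process RSS):

# reprex sketch: generate a file, read it in chunks, report process RSS
library(readr)

path <- tempfile(fileext = ".txt")
write_delim(data.frame(a = 1:1e6, b = 1:1e6, c = 1:1e6), path, delim = " ")

rss_mb <- function() unname(ps::ps_memory_info()["rss"]) / 1024^2

before <- rss_mb()
read_delim_chunked(file = path,
                   callback = function(chunk, pos) NULL,  # discard every chunk
                   delim = " ",
                   col_types = "iii",
                   chunk_size = 100000)
after <- rss_mb()
message("RSS before: ", round(before), " Mb; after: ", round(after), " Mb")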

@hadley added the reprex (needs a minimal reproducible example) label and removed the performance 🚀 label on Jul 31, 2023