read_delim_chunked takes much more memory than expected? #1410

Open
timothy-barry opened this issue Jul 10, 2022 · 6 comments
Labels
reprex needs a minimal reproducible example

Comments

@timothy-barry
Contributor

I am using the read_delim_chunked function to process large text files chunk by chunk. My expectation is that memory is released after each chunk is processed. However, this does not seem to be the case: the amount of memory required to read the text file in chunks is the same as the amount required to read it all at once. I assume this is a bug, but maybe my understanding of read_delim_chunked is incorrect. The purpose of reading by chunk is to conserve memory, right? Thanks!
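
For reference, the pattern I'm using looks roughly like this (a sketch only; the file name, delimiter, and chunk size are placeholders, not my real data):

# readr: chunk-by-chunk pattern (sketch; "big_file.txt" and chunk_size are placeholders)
library(readr)

process_chunk <- function(chunk, pos) {
  # Do some per-chunk work, then let the chunk go out of scope; I would
  # expect its memory to be released before the next chunk is read.
  message("Rows starting at ", pos, ": ", nrow(chunk))
}

read_delim_chunked(file = "big_file.txt",
                   callback = process_chunk,
                   delim = " ",
                   chunk_size = 100000)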

@timothy-barry
Contributor Author

After a bit of searching through the issues on this repo, I noticed that at least one other person seems to be encountering this issue as well: #1120 (comment).

@timothy-barry
Contributor Author

timothy-barry commented Jul 11, 2022

Additional note: this seems to be a more pervasive issue than I had realized. I tried loading a sequence of files via readr::read_delim. R ran out of memory despite the fact that (i) each file fits into memory on its own and (ii) I loaded the files one at a time.

# readr: runs out-of-memory
for (f in fs) {
  print(paste0("Loading ", f))
  x <- readr::read_delim(file = f,
                         delim = " ",
                         skip = 2,
                         col_types = c("iii"))
  rm(x); gc()
}

I repeated this experiment with data.table's fread function; everything works as expected.

# data.table: everything works
for (f in fs) {
  print(paste0("Loading ", f))
  x <- data.table::fread(file = f,
                         sep = " ",
                         colClasses = c("integer", "integer", "integer"),
                         skip = 2)
  rm(x); gc()
}

As far as I can tell, the current version of readr seems to suffer from memory leak issues beyond read_delim_chunked alone, unfortunately.

@ben18785

I am having the same issue. The memory use increases almost monotonically even though the individual chunks are small.
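
Roughly how I'm observing it (a sketch; the file name and chunk size are placeholders, and I'm simply reading the Mb column from gc() inside the callback):

# readr: log memory from inside the chunk callback (sketch)
library(readr)

log_memory <- function(chunk, pos) {
  mb <- sum(gc()[, 2])  # current R heap usage in Mb (Ncells + Vcells)
  message("Chunk at row ", pos, ": ", nrow(chunk), " rows, ~", round(mb), " Mb in use")
}

read_delim_chunked(file = "big_file.txt",
                   callback = log_memory,
                   delim = " ",
                   chunk_size = 100000)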

@timothy-barry
Contributor Author

Any updates or workarounds? Can I use edition 1 (via with_edition(1, ...) or local_edition(1)) to resolve this issue, at least for the time being?
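
For example, something like this as a stopgap (untested sketch; the file name and column types are placeholders):

# readr: force the edition 1 parser for a single call (sketch)
library(readr)

x <- with_edition(1, read_delim(file = "big_file.txt",
                                delim = " ",
                                skip = 2,
                                col_types = "iii"))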

@arthurgailes

Having the same problem here.

@hadley
Member

hadley commented Jul 31, 2023

To investigate this issue we'll need a reprex, and some indication of how you're measuring R's memory consumption.
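
Ideally something self-contained along these lines (a sketch only; adjust it to match how you actually hit the problem, and the ps::ps_memory_info() call is just one way of reading the process RSS):

# reprex sketch: generate a file, read it in chunks, report process RSS
library(readr)

path <- tempfile(fileext = ".txt")
write_delim(data.frame(a = 1:1e6, b = 1:1e6, c = 1:1e6), path, delim = " ")

rss_mb <- function() unname(ps::ps_memory_info()["rss"]) / 1024^2

before <- rss_mb()
read_delim_chunked(file = path,
                   callback = function(chunk, pos) NULL,  # discard every chunk
                   delim = " ",
                   col_types = "iii",
                   chunk_size = 100000)
after <- rss_mb()
message("RSS before: ", round(before), " Mb; after: ", round(after), " Mb")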

@hadley added the reprex (needs a minimal reproducible example) label and removed the performance 🚀 label on Jul 31, 2023