Memory leak when a generator is created inside a function #36

dfalbel · 2021-09-10T19:40:29Z

It looks like the function environment where the generator is created and used is never released, even when the generator itself is no longer in scope.

Here's a reprex:

g <- function() {
  hello <- sample(1:1e7) # large object to easily verify the leak
  generate_abc <- coro::generator(function() {
    for (x in letters[1:3]) {
      coro::yield(x)
    }
  })
  coro::loop(for(x in generate_abc()) {
    y <- x
  })
}

for (x in 1:10) {
  g()
  gc()
  print(lobstr::mem_used())
}
#> 84,001,952 B
#> 124,300,760 B
#> 164,343,576 B
#> 204,386,400 B
#> 244,429,208 B
#> 284,472,048 B
#> 324,514,856 B
#> 364,557,664 B
#> 404,600,472 B
#> 444,643,344 B

randy3k · 2021-10-14T22:24:18Z

I dug a little into the issue. It seems to be related to the implementation of the "get next" code.

gtor <- coro::generator(function() {
  long_vec <- sample(1:1e7) # large object to easily verify the leak
  for (x in long_vec) {
    coro::yield(x)
  }
})

g <- gtor()
# memory leaks when g() is called more than once
g()
#> [1] 1957623
g()
#> [1] 737927
for (i in 1:10) gc(); lobstr::mem_used()
#> 86,089,560 B
g = 0
for (i in 1:10) gc(); lobstr::mem_used()
#> 86,090,328 B

g <- gtor()
# doesn't leak when g() is called only once
g()
#> [1] 117777
for (i in 1:10) gc(); lobstr::mem_used()
#> 85,254,592 B
g = 0
for (i in 1:10) gc(); lobstr::mem_used()
#> 45,469,496 B

lionel- · 2021-10-15T12:12:45Z

Thanks for investigating. I'll look into a quick coro release after rlang 1.0 is out.

lionel- · 2021-12-03T12:20:10Z

The leak occurred through an environment inlined in the body of generator instances. When the JIT compiles a function, it caches the bytecode in a hash table. The inlined environment was included in the constant pool of the bytecode and leaked through that cache. To fix this, we now inline a weak reference to the environment instead.
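To illustrate the weak-reference idea, here is a minimal sketch. It is not coro's actual code: it assumes rlang's `new_weakref()` and `wref_key()` helpers, and the names `env`, `ref`, and `get_payload` are made up for the example. The point is that a call holding only a weak reference does not keep the environment alive once every strong reference is gone.

```r
library(rlang)

# Hypothetical stand-in for a generator's environment: it holds a large object.
env <- new.env()
env$payload <- sample(1e6)

# A weak reference keyed on the environment. It does not protect `env`
# from garbage collection.
ref <- new_weakref(env)

# Inline the weak reference (not the environment itself) into a call.
# The environment is only looked up through the reference at evaluation time.
get_payload <- bquote(rlang::wref_key(.(ref))$payload)

# While `env` is still strongly referenced, the call can reach the payload.
alive <- length(eval(get_payload))

# Drop the only strong reference and force a collection.
rm(env)
invisible(gc())

# Once the environment has been reclaimed, the key reads back as NULL.
collected <- is.null(wref_key(ref))
```

With this layout, anything that caches the call (for instance in a constant pool) retains only the small weak-reference object rather than the environment and everything reachable from it.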

memtools failed to detect the leak because bytecode objects are not currently traversed. I've opened r-lib/memtools#1 to track this.

dfalbel · 2021-12-03T14:20:28Z

Thanks @lionel-! That was tricky, I don't think I would ever have figured this out :)

lionel- · 2021-12-03T15:35:44Z

No worries @dfalbel
coro 1.0.2 is on CRAN. Sorry for the delay in addressing this issue!

randy3k · 2021-12-04T10:40:09Z

Thanks. I just learned something today.

f <- function() {invisible(NULL)}
# put a large vector inline
body(f)[[2]][[2]] <- sample(1e7)

# run f two times to trigger JIT compilation
f()
f()

for (i in 1:10) gc(); lobstr::mem_used()
#> 89,809,128 B
rm(f)
for (i in 1:10) gc(); lobstr::mem_used() # memory leak
#> 90,066,024 B

Created on 2021-12-04 by the reprex package (v2.0.1)

dfalbel mentioned this issue Sep 23, 2021

Leak the fit function environment mlverse/luz#74

Closed

lionel- mentioned this issue Dec 3, 2021

Traverse bytecode objects r-lib/memtools#1

Open

lionel- closed this as completed in d265b14 Dec 3, 2021
