Memory leak when a generator is created inside a function #36

Closed
dfalbel opened this issue Sep 10, 2021 · 6 comments

dfalbel (Contributor) commented Sep 10, 2021

It looks like the function environment in which the generator is created and used is never released, even after the generator itself has gone out of scope.

Here's a reprex:

g <- function() {
  hello <- sample(1:1e7) # large object to easily verify the leak
  generate_abc <- coro::generator(function() {
    for (x in letters[1:3]) {
      coro::yield(x)
    }
  })
  coro::loop(for(x in generate_abc()) {
    y <- x
  })
}

for (x in 1:10) {
  g()
  gc()
  print(lobstr::mem_used())
}
#> 84,001,952 B
#> 124,300,760 B
#> 164,343,576 B
#> 204,386,400 B
#> 244,429,208 B
#> 284,472,048 B
#> 324,514,856 B
#> 364,557,664 B
#> 404,600,472 B
#> 444,643,344 B

randy3k commented Oct 14, 2021

I dug into the issue a little; it seems to be related to the implementation of the "get next" code.

gtor <- coro::generator(function() {
  long_vec <- sample(1:1e7) # large object to easily verify the leak
  for (x in long_vec) {
    coro::yield(x)
  }
})

g <- gtor()
# memory leaks when g() is called more than once
g()
#> [1] 1957623
g()
#> [1] 737927
for (i in 1:10) gc(); lobstr::mem_used()
#> 86,089,560 B
g = 0
for (i in 1:10) gc(); lobstr::mem_used()
#> 86,090,328 B

g <- gtor()
# doesn't leak when g() is called only once
g()
#> [1] 117777
for (i in 1:10) gc(); lobstr::mem_used()
#> 85,254,592 B
g = 0
for (i in 1:10) gc(); lobstr::mem_used()
#> 45,469,496 B

lionel- (Member) commented Oct 15, 2021

Thanks for investigating. I'll look into a quick coro release after rlang 1.0 is out.

lionel- (Member) commented Dec 3, 2021

The leak occurred through an environment inlined in the body of generator instances. When the JIT compiles a function, it caches the bytecode in a hash table. The inlined environment was included in the constant pool of the bytecode and leaked through that cache. To fix this, we now inline a weak reference to the environment instead.
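
To illustrate why a weak reference avoids this kind of leak, here is a minimal sketch using rlang's weak-reference helpers (`rlang::new_weakref()` and `rlang::wref_key()`). It is not coro's actual code: `big_env` and `payload` are hypothetical stand-ins for the environment that was inlined in the generator body.

```r
# Sketch only: shows that a weak reference does not keep its key alive.
# `big_env` and `payload` are illustrative names, not part of coro.
big_env <- new.env()
big_env$payload <- sample(1e6)  # stand-in for state captured by a generator

# Anything that stores `w` (for example a bytecode constant pool) holds the
# environment only weakly.
w <- rlang::new_weakref(big_env)

alive_before <- !is.null(rlang::wref_key(w))

rm(big_env)      # drop the only strong reference
invisible(gc())  # give the collector a chance to reclaim the environment

# wref_key() returns NULL once the key has been collected
alive_after <- !is.null(rlang::wref_key(w))
c(before = alive_before, after = alive_after)
```

A cache that retains `w` therefore no longer pins the environment; the cost is that code reading the reference has to handle the case where the key has already been collected.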

memtools failed to detect the leak because bytecode objects are not currently traversed. I've opened r-lib/memtools#1 to track this.

dfalbel (Contributor, Author) commented Dec 3, 2021

Thanks @lionel-! That was tricky; I don't think I would ever have figured this out :)

lionel- (Member) commented Dec 3, 2021

No worries @dfalbel
coro 1.0.2 is on CRAN. Sorry for the delay in addressing this issue!

randy3k commented Dec 4, 2021

Thanks. I just learned something today.

f <- function() {invisible(NULL)}
# put a large vector inline
body(f)[[2]][[2]] <- sample(1e7)

# run f twice to trigger JIT compilation
f()
f()

for (i in 1:10) gc(); lobstr::mem_used()
#> 89,809,128 B
rm(f)
for (i in 1:10) gc(); lobstr::mem_used() # memory leak
#> 90,066,024 B
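
One way to check that the JIT's bytecode cache is what retains the vector is to repeat the experiment with the JIT switched off. The sketch below assumes that no bytecode is cached when compilation never happens, so the vector should become collectable once `f` is removed. `vcells_mb()` is a hypothetical helper that reads base R's `gc()` table instead of `lobstr::mem_used()`.

```r
# Sketch: the same inlined-vector experiment with JIT compilation disabled.
# Assumption: with no compilation there is no cached bytecode to pin the vector.
old_level <- compiler::enableJIT(0)  # returns the previous JIT level

# Hypothetical helper: Mb of vector memory in use after a full collection
vcells_mb <- function() gc()["Vcells", 2]

f <- function() {invisible(NULL)}
body(f)[[2]][[2]] <- sample(1e7)  # roughly 40 MB integer vector inlined in the body

f()
f()  # the second call would trigger compilation if the JIT were enabled

mb_with_f <- vcells_mb()
rm(f)
mb_without_f <- vcells_mb()

invisible(compiler::enableJIT(old_level))  # restore the previous setting

# A difference close to the size of the vector suggests it was released
mb_with_f - mb_without_f
```

Disabling the JIT is only a diagnostic here, not a recommended workaround, since it slows down all R code in the session.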

Created on 2021-12-04 by the reprex package (v2.0.1)
