Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libexpr: Fix use-after-free of StaticEnv::up #12544

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

xokdvium
Copy link
Contributor

Motivation

#12527
#11286

Context

I'm not very familiar with the ownership model here, so there might be some footguns with this solution. Please take a look at my reasoning in the commit message.


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added with-tests Issues related to testing. PRs with tests have some priority repl The Read Eval Print Loop, "nix repl" command and debugger labels Feb 21, 2025
@kjeremy
Copy link
Contributor

kjeremy commented Feb 21, 2025

This may fix the crash part of #12527 but does it fix the performance?

@xokdvium
Copy link
Contributor Author

I was focused on fixing the correctness and getting rid of memory crimes.

Performance will definitely not improve with this one patch. At the very least it should not regress because refcount increments are way cheaper than allocating a new shared_ptr.

It's not very clear what the ownership model is here, but one thing
is certain: `.up` can't be destroyed before the StaticEnv that refers
to it is.

Changing a non-owning pointer to taking shared ownership of the parent
`StaticEnv` prevents the `.up` from being freed.

I'm not a huge fan of the inverted ownership, where child `StaticEnv`
takes a refcount of the parent, but this seems like the least intrusive
way to fix the use-after-free.

This shouldn't cause any shared_ptr cycles to appear (hopefully).
This is the simplest reproducer I have. It would be great to find
a repro without flakes, but I guess this should be ok for now.
@xokdvium xokdvium force-pushed the debugger-use-after-free branch from 0a4b6e5 to 0d50045 Compare February 21, 2025 14:48
@edolstra
Copy link
Member

Prior to 21071bf StaticEnvs were allocated on the stack and they only had to outlive the recursive bindVars() calls, so there was no risk of use-after-free. I'm not sure why the debugger needs them to be dynamically allocated.

@xokdvium
Copy link
Contributor Author

so there was no risk of use-after-free.

But wouldn't that have lead to an access to a still invalid StaticEnv object that used to live in a stack-frame? Use-after-free in this case is just a symptom of a dangling pointer.

I stepped through the debugger trying to pinpoint why the up StaticEnv gets destroyed before it stops being referred to, but I only got a vague understanding.

ASAN log here #12527 (comment) shows where the lifetime started and ended, but that doesn't really help understand why up is dangling.

@Ericson2314
Copy link
Member

I'm not sure why the debugger needs them to be dynamically allocated.

@xokdvium pointed me to the PR description of #5416

Normally StaticEnvs are thrown away after bindVars, so if there's an exception, there's no way to know what symbols are in scope.

[...]

This PR stores each StaticEnv in a shared_ptr instead of as a stack variable. In debug mode, the shared_ptrs are in turn stored in the Exprs. Then when an exception occurs in an Expr, it can refer to its StaticEnv to see what symbols are in scope.

@xokdvium
Copy link
Contributor Author

From #5416:

During nix script execution, Values are stored in Env structs, and nix Exprs know the offsets needed to find these values. Those offsets originate in StaticEnvs during the bindVars phase, before execution happens. StaticEnvs contain symbols and their offsets.

This PR stores each StaticEnv in a shared_ptr instead of as a stack variable. In debug mode, the shared_ptrs are in turn stored in the Exprs. Then when an exception occurs in an Expr, it can refer to its StaticEnv to see what symbols are in scope.

@edolstra My best guess for why the StaticEnv dies prematurely is that no mapping in exprEnvs exists that shared ownership of the StaticEnv, so at the end of the bindVars newEnv dies, even though some other StaticEnv refers to it by a raw pointer. Hence why this can be reproduced with an empty let.

Sharing ownership of the parent StaticEnv with to its child ensures that it does not die prematurely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
repl The Read Eval Print Loop, "nix repl" command and debugger with-tests Issues related to testing. PRs with tests have some priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants