fix(swingset): clean up promise c-list entries during vat deletion #10268

Open: wants to merge 2 commits into master

@warner warner commented Oct 12, 2024

Previously, when a vat was terminated and we deleted the promise
c-list entries from its old state, the cleanup code failed to
decrement the kpid's refcount properly. This resulted in a leak: those
promises could never be retired.

This commit updates the vat cleanup code to add a new phase, named
`promises`. This executes after `exports` and `imports`, but before
`kv`, and is responsible for both deleting the c-list entries and also
decrementing the refcounts of the corresponding promises. We do this
slowly, like we do exports and imports, because we don't know how many
there might be, and because those promise records might hold
references to other objects (in the resolution data), which could
trigger additional work. However, this work is unlikely to be
significant: the run-queue is usually empty, so these outstanding
promises are probably unresolved, and thus cannot be holding resolution
data.
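
As a rough illustration of the bookkeeping involved, here is a toy model of budget-limited promise cleanup. It is only a sketch: `makeToyKernel`, `addPromiseToClist`, and `cleanupPromises` are hypothetical names, and the in-memory `Map`/`Set` storage stands in for the kernel's real tables, which this does not attempt to reproduce.

```javascript
// Toy model only: these names are illustrative, not the real kernel API.
function makeToyKernel() {
  const refCounts = new Map(); // kpid -> reference count
  const clists = new Map(); // vatID -> Set of kpids in that vat's c-list

  const addPromiseToClist = (vatID, kpid) => {
    if (!clists.has(vatID)) clists.set(vatID, new Set());
    const clist = clists.get(vatID);
    if (clist.has(kpid)) return; // each c-list entry holds exactly one reference
    clist.add(kpid);
    refCounts.set(kpid, (refCounts.get(kpid) || 0) + 1);
  };

  // Delete up to `budget` promise c-list entries of a terminated vat,
  // decrementing each promise's refcount as its entry goes away.
  const cleanupPromises = (vatID, budget) => {
    const clist = clists.get(vatID) || new Set();
    let cleaned = 0;
    for (const kpid of [...clist]) {
      if (cleaned >= budget) break;
      clist.delete(kpid);
      const remaining = refCounts.get(kpid) - 1; // the step the leak skipped
      if (remaining === 0) {
        refCounts.delete(kpid); // nothing references it: it can be retired
      } else {
        refCounts.set(kpid, remaining);
      }
      cleaned += 1;
    }
    return { cleaned, done: clist.size === 0 };
  };

  return { refCounts, addPromiseToClist, cleanupPromises };
}
```

Calling `cleanupPromises` repeatedly with a small budget drains the dead vat's entries a few at a time. A promise that is still referenced from another vat's c-list keeps a nonzero count and is not retired.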

All promises decided by the dead vat are rejected by the kernel
immediately during vat termination, because those rejections are
visible to userspace in other vats. In contrast, freeing the promise
records is not visible to userspace, just as freeing imports
or exports is not visible to userspace, so this cleanup is safe to do
at a leisurely pace, rate-limited by `runPolicy.allowCleanup`.

The docs are updated to reflect the new `runPolicy` API:

  • `budget.promises` is new, and respected by slow cleanup
  • `work.promises` is reported to `runPolicy.didCleanup()`
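
For illustration, a policy that caps promise cleanup and tallies the reported work might look roughly like the sketch below. The object shapes are assumptions made for the example: it assumes `allowCleanup()` returns a budget record with a `promises` field, and that `didCleanup()` receives a flat work record with a `promises` count. `makePromiseLimitedPolicy`, `getPromisesCleaned`, and the `default` field are hypothetical, so check the runPolicy docs for the real signatures.

```javascript
// Sketch only: the budget and work shapes below are assumed, not confirmed.
function makePromiseLimitedPolicy(promisesPerCall) {
  let promisesCleaned = 0;
  return {
    // Assumed to return a budget; `promises` caps work in the new phase.
    allowCleanup() {
      return { default: 5, promises: promisesPerCall };
    },
    // Assumed to receive a record of the work done, including `promises`.
    didCleanup(work) {
      promisesCleaned += work.promises || 0;
      return true; // assumed meaning: keep running
    },
    // Hypothetical helper, so a host can inspect the running total.
    getPromisesCleaned: () => promisesCleaned,
  };
}
```

A host could use the running total to log how much promise cleanup has been done so far.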

I don't intend to add any remediation code: it requires a full
refcount audit to find such promises, and the mainnet kernel has only
ever terminated one vat so far, so I believe there cannot be very many
leaked promises, if any. Once this fix is applied, no new leaks will
occur.

fixes #10261

cloudflare-workers-and-pages bot commented Oct 12, 2024

Deploying agoric-sdk with Cloudflare Pages

Latest commit: 5883856
Status: ✅  Deploy successful!
Preview URL: https://88e926af.agoric-sdk.pages.dev
Branch Preview URL: https://warner-10261-termination-lea.agoric-sdk.pages.dev

@warner warner marked this pull request as ready for review October 12, 2024 02:20
@warner warner requested a review from a team as a code owner October 12, 2024 02:20
Two tests are updated to exercise the cleanup of promise c-list
entries during vat termination. `terminate.test.js` adds some promises
to the c-list and then checks their refcounts after termination, to
demonstrate that bug #10261 is leaking a refcount when it deletes the
dead vat's c-list entry without also decrementing the
refcount. `slow-termination.test.js` adds a number of promises to the
c-list, and expects the budget-limited cleanup to spend a phase on
promises.

Both tests are marked as failing until the code fix lands in the
next commit.

Previously, when a vat was terminated and we deleted the promise
c-list entries from its old state, the cleanup code failed to
decrement the kpid's refcount properly. This resulted in a leak: those
promises could never be retired.

This commit updates the vat cleanup code to add a new phase, named
`promises`. This executes after `exports` and `imports`, but before
`kv`, and is responsible for both deleting the c-list entries and also
decrementing the refcounts of the corresponding promises. We do this
slowly, like we do exports and imports, because we don't know how many
there might be, and because those promise records might hold
references to other objects (in the resolution data), which could
trigger additional work. However, this work is unlikely to be
significant: the run-queue is usually empty, so these outstanding
promises are probably unresolved, and thus cannot be holding resolution
data.

All promises *decided* by the dead vat are rejected by the kernel
immediately during vat termination, because those rejections are
visible to userspace in other vats. In contrast, freeing the promise
records is *not* visible to userspace, just as freeing imports
or exports is not visible to userspace, so this cleanup is safe to do
at a leisurely pace, rate-limited by `runPolicy.allowCleanup`.

The docs are updated to reflect the new `runPolicy` API:

* `budget.promises` is new, and respected by slow cleanup
* `work.promises` is reported to `runPolicy.didCleanup()`

The 'test.failing' marker was removed from the previously updated
tests.

I don't intend to add any remediation code: it requires a full
refcount audit to find such promises, and the mainnet kernel has only
ever terminated one vat so far, so I believe there cannot be very many
leaked promises, if any. Once this fix is applied, no new leaks will
occur.

fixes #10261
@warner warner force-pushed the warner/10261-termination-leaks-promises branch from b0c4f74 to 5883856 Compare October 13, 2024 19:38
Successfully merging this pull request may close these issues.

vat termination leaks promises