Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(swingset): clean up promise c-list entries during vat deletion #10268

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions packages/SwingSet/docs/run-policy.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,16 +68,19 @@ Some vats may grow very large (i.e. large c-lists with lots of imported/exported

To protect the system against these bursts, the run policy can be configured to terminate vats slowly. Instead of doing all the cleanup work immediately, the policy allows the kernel to do a little bit of work each time `controller.run()` is called (e.g. once per block, for kernels hosted inside a blockchain). Internally, before servicing the run-queue, the kernel checks to see if any vats are in the "terminated but not fully deleted" state, and executes a "vat-cleanup crank", to delete some state. Depending upon what the run-policy allows, it may do multiple vat-cleanup cranks in a single `controller.run()`, or just one, or none at all. And depending upon the budget provided to each one, it may only take one vat-cleanup crank to finish the job, or millions. If the policy limits the number of cranks in a single block, and limits the budget of the crank, then the cleanup process will be spread over multiple blocks.

For each terminated vat, cleanup proceeds through five phases:
For each terminated vat, cleanup proceeds through six phases:

* `exports`: delete c-list entries for objects/promises *exported* by the vat
* `imports`: same but for objects/promises *imported* by the vat
* `exports`: delete c-list entries for objects *exported* by the vat
* `imports`: same but for objects *imported* by the vat
* `promises`: same but promises referenced by the vat
* `kv`: delete all other kv entries for this vat, mostly vatstore records
* `snapshots`: delete xsnap heap snapshots, typically one per 200 deliveries (`snapshotInterval`)
* `transcripts`: delete transcript spans, and their associated transcript items

The first two phases, `exports` and `imports`, cause the most activity in other vats. Deleting `exports` can cause objects to be retired, which will deliver `dispatch.retireImports` GC messages to other vats which have imported those objects and used them as keys in a WeakMapStore. Deleting `imports` can cause refcounts to drop to zero, delivering `dispatch.dropImports` into vats which were exporting those objects. Both of these will add `gcKref` "dirt" to the other vat, eventually triggering a BringOutYourDead, which will cause more DB activity. These are generally the phases we must rate-limit to avoid overwhelming the system.

The `promises` phase may cause kernel promise table entries to be deleted. This is unlikely to trigger activity in other vats, because only settled promises can have references to other objects, and the notification of settlement is enqueued to all subscribers as soon as the promise is fulfilled or rejected. However, if some of these notifications were still in the run-queue when the vat got deleted, the GC work might be accelerated slightly.

The other phases cause DB activity (deleting rows), but do not interact with other vats, so it is easier to accomodate more cleanup steps. The budget can be tuned to allow more kv/snapshots/transcripts than exports/imports in a single cleanup run.

There are two RunPolicy methods which control this. The first is `runPolicy.allowCleanup()`. This will be invoked many times during `controller.run()`, each time the kernel tries to decide what to do next (once per step). The return value will enable (or not) a fixed amount of cleanup work. The second is `runPolicy.didCleanup({ cleanups })`, which is called later, to inform the policy of how much cleanup work was actually done. The policy can count the cleanups and switch `allowCleanup()` to return `false` when it reaches a threshold. (We need the pre-check `allowCleanup` method because the simple act of looking for cleanup work is itself a cost that we might not be willing to pay).
Expand All @@ -92,7 +95,7 @@ The limit can be set to `Infinity` to allow unlimited deletion of that particula

Each budget record must include a `{ default }` property, which is used as the default for any phase that is not explicitly mentioned in the budget. This also provides forwards-compatibility for any phases that might be added in the future. So `budget = { default: 5 }` would provide a conservative budget for cleanup, `budget = { default: 5, kv: 50 }` would enable faster deletion of the non-c-list kvstore entries, and `budget = { default: Infinity }` allows unlimited cleanup for all phases.

Note that the cleanup crank ends when no more work is left to be done (which finally allows the vat to be forgotten entirely), or when an individual phase's budget is exceeded. That means multiple phases might see deletion work in a single crank, if the earlier phase finishes its work without exhausting its budget. For example, if the budget is `{ default: 5 }`, but the vat had 4 exports, 4 imports, 4 other kv entries, 4 snapshots, and 4 transcript spans, then all that work would be done in a single crank, because no individual phase would exhaust its budget. The only case that is even worth mentioning is when the end of the `exports` phase overlaps with the start of the `imports` phase, where we might do four more cleanups than usual.
Note that the cleanup crank ends when no more work is left to be done (which finally allows the vat to be forgotten entirely), or when an individual phase's budget is exceeded. That means multiple phases might see deletion work in a single crank, if the earlier phase finishes its work without exhausting its budget. For example, if the budget is `{ default: 5 }`, but the vat had 4 exports, 4 imports, 4 promise entries, 4 other kv entries, 4 snapshots, and 4 transcript spans, then all that work would be done in a single crank, because no individual phase would exhaust its budget. The only case that is even worth mentioning is when the end of the `exports` phase overlaps with the start of the `imports` phase, where we might do four more cleanups than usual.

A `true` return value from `allowCleanup()` is equivalent to `{ default: Infinity }`, which allows unlimited cleanup work. This also happens if `allowCleanup()` is missing entirely, which maintains the old behavior for host applications that haven't been updated to make new policy objects. Note that cleanup is higher priority than any delivery, and is second only to acceptance queue routing.

Expand Down
2 changes: 2 additions & 0 deletions packages/SwingSet/src/kernel/kernel.js
Original file line number Diff line number Diff line change
Expand Up @@ -701,6 +701,7 @@ export default function buildKernel(
total:
work.exports +
work.imports +
work.promises +
work.kv +
work.snapshots +
work.transcripts,
Expand Down Expand Up @@ -1820,6 +1821,7 @@ export default function buildKernel(
{
exports: M.number(),
imports: M.number(),
promises: M.number(),
kv: M.number(),
snapshots: M.number(),
transcripts: M.number(),
Expand Down
20 changes: 18 additions & 2 deletions packages/SwingSet/src/kernel/state/kernelKeeper.js
Original file line number Diff line number Diff line change
Expand Up @@ -967,6 +967,7 @@ export default function makeKernelKeeper(
const work = {
exports: 0,
imports: 0,
promises: 0,
kv: 0,
snapshots: 0,
transcripts: 0,
Expand All @@ -991,6 +992,7 @@ export default function makeKernelKeeper(
const clistPrefix = `${vatID}.c.`;
const exportPrefix = `${clistPrefix}o+`;
const importPrefix = `${clistPrefix}o-`;
const promisePrefix = `${clistPrefix}p`;

// Note: ASCII order is "+,-./", and we rely upon this to split the
// keyspace into the various o+NN/o-NN/etc spaces. If we were using a
Expand Down Expand Up @@ -1043,8 +1045,22 @@ export default function makeKernelKeeper(
}
}

// the caller used enumeratePromisesByDecider() before calling us,
// so they already know the orphaned promises to reject
// The caller used enumeratePromisesByDecider() before calling us,
// so they have already rejected the orphan promises, but those
// kpids are still present in the dead vat's c-list. Clean those
// up now.
remaining = budget.promises ?? budget.default;
for (const k of enumeratePrefixedKeys(kvStore, promisePrefix)) {
const kref = kvStore.get(k) || Fail`getNextKey ensures get`;
const vref = stripPrefix(clistPrefix, k);
vatKeeper.deleteCListEntry(kref, vref);
// that will also delete both db keys
work.promises += 1;
remaining -= 1;
if (remaining <= 0) {
return { done: false, work };
}
}

// now loop back through everything and delete it all
remaining = budget.kv ?? budget.default;
Expand Down
1 change: 1 addition & 0 deletions packages/SwingSet/src/types-external.js
Original file line number Diff line number Diff line change
Expand Up @@ -227,6 +227,7 @@ export {};
*
* @typedef { { exports: number,
* imports: number,
* promises: number,
* kv: number,
* snapshots: number,
* transcripts: number,
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import { Far, E } from '@endo/far';
import { makePromiseKit } from '@endo/promise-kit';

export function buildRootObject(_vatPowers) {
let root;
Expand All @@ -21,11 +22,22 @@ export function buildRootObject(_vatPowers) {
}

// set up 20 "dude exports, bootstrap imports" c-list entries

for (let i = 0; i < 20; i += 1) {
myImports.push(await E(root).sendExport());
}

// also 10 imported promises
for (let i = 0; i < 10; i += 1) {
await E(root).acceptImports(makePromiseKit().promise);
}

// and 10 exported promises
for (let i = 0; i < 10; i += 1) {
const p = E(root).forever();
myImports.push(p);
p.catch(_err => 0); // hush
}

// ask dude to creates 20 vatstore entries (in addition to the
// built-in liveslots stuff)
await E(root).makeVatstore(20);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ async function doSlowTerminate(t, mode) {
defaultManagerType: 'xsnap',
defaultReapInterval: 'never',
snapshotInitial: 2, // same as the default
snapshotInterval: 10, // ensure multiple spans+snapshots
snapshotInterval: 11, // ensure multiple spans+snapshots
bootstrap: 'bootstrap',
bundles: {
dude: {
Expand Down Expand Up @@ -117,13 +117,16 @@ async function doSlowTerminate(t, mode) {
t.is(countCleanups, 0);

// bootstrap adds a fair amount of vat-dude state:
// * we have c-list entries for 20 imports and 20 exports, each of
// which need two kvStore entries, so 80 kvStore total
// * we have c-list entries for 20 object imports and 20 object
// exports, each of which need two kvStore entries, so 80 kvStore
// total
// * c-list entries for 10 promise imports and 10 promise exports,
// * so 40 kvStore total
// * the vat has created 20 baggage entries, all of which go into
// the vatstore, adding 20 kvStore
// * an empty vat has about 29 kvStore entries just to track
// counters, the built-in collection types, baggage itself, etc
// * by sending 40-plus deliveries into an xsnap vat with
// * by sending 60-plus deliveries into an xsnap vat with
// snapInterval=5, we get 8-ish transcript spans (7 old, 1
// current), and each old span generates a heap snapshot record
// Slow vat termination means deleting these entries slowly.
Expand All @@ -148,30 +151,47 @@ async function doSlowTerminate(t, mode) {
.pluck()
.get(vatID);

// 20*2 for imports, 21*2 for exports, 20*1 for vatstore = 102.
// Plus 21 for the usual liveslots stuff, and 6 for kernel stuff
// like vNN.source/options
// 20*2 for imports, 21*2 for exports, 20*2 for promises, 20*1 for
// vatstore = 142. Plus 21 for the usual liveslots stuff, and 6 for
// kernel stuff like vNN.source/options

t.is(remainingKV().length, 129);
t.is(remainingKV().length, 169);
t.false(JSON.parse(kvStore.get('vats.terminated')).includes(vatID));

// we get one span for snapshotInitial (=2), then a span every
// snapshotInterval (=10). Each non-current span creates a
// snapshotInterval (=11). Each non-current span creates a
// snapshot.
t.is(remainingTranscriptSpans(), 6);
t.is(remainingTranscriptItems(), 59);
t.is(remainingSnapshots(), 5);
let expectedRemainingItems = 85;
let expectedRemainingSpans = 8;
let expectedRemainingSnapshots = 7;

const checkTS = () => {
t.is(remainingTranscriptSpans(), expectedRemainingSpans);
t.is(remainingTranscriptItems(), expectedRemainingItems);
t.is(remainingSnapshots(), expectedRemainingSnapshots);
};
checkTS();

const remaining = () =>
remainingKV().length +
remainingSnapshots() +
remainingTranscriptItems() +
remainingTranscriptSpans();

// note: mode=dieHappy means we send one extra message to the vat,
// which adds a single transcript item (but this doesn't happen to trigger an extra span)
// but we've tuned snapshotInterval to avoid this triggering a BOYD
// and save/load snapshot, so it increases our expected transcript
// items, but leaves the spans/snapshots alone
if (mode === 'dieHappy') {
expectedRemainingItems += 1;
}

const kpid = controller.queueToVatRoot('bootstrap', 'kill', [mode]);
await controller.run(noCleanupPolicy);
await commit();

checkTS();

t.is(controller.kpStatus(kpid), 'fulfilled');
t.deepEqual(
controller.kpResolution(kpid),
Expand All @@ -182,7 +202,7 @@ async function doSlowTerminate(t, mode) {
t.true(JSON.parse(kvStore.get('vats.terminated')).includes(vatID));
// no cleanups were allowed, so nothing should be removed yet
t.truthy(kernelStorage.kvStore.get(`${vatID}.options`));
t.is(remainingKV().length, 129);
t.is(remainingKV().length, 169);

// now do a series of cleanup runs, each with budget=5
const clean = async (budget = 5) => {
Expand Down Expand Up @@ -223,8 +243,16 @@ async function doSlowTerminate(t, mode) {
// the non-clist kvstore keys should still be present
t.truthy(kernelStorage.kvStore.get(`${vatID}.options`));

// there are no imports, so this clean(budget.default=5) will delete
// the first five of our 47 other kv entries (20 vatstore plus 27
// there are no remaining imports, so this clean(budget.default=5)
// will delete the first five of our promise c-list entries, each
// with two kv entries
await cleanKV(10, 5); // 5 c-list promises
await cleanKV(10, 5); // 5 c-list promises
await cleanKV(10, 5); // 5 c-list promises
await cleanKV(10, 5); // 5 c-list promises

// that finishes the promises, so the next clean will delete the
// first five of our 47 other kv entries (20 vatstore plus 27
// liveslots overhead

await cleanKV(5, 5); // 5 other kv
Expand All @@ -241,46 +269,47 @@ async function doSlowTerminate(t, mode) {
t.is(remainingKV().length, 22);
await cleanKV(5, 5); // 5 other kv
t.is(remainingKV().length, 17);
t.is(remainingSnapshots(), 5);
checkTS();
await cleanKV(5, 5); // 5 other kv
t.is(remainingKV().length, 12);
await cleanKV(5, 5); // 5 other kv
t.is(remainingKV().length, 7);
await cleanKV(5, 5); // 5 other kv
t.is(remainingKV().length, 2);
t.is(remainingSnapshots(), 5);

// there are two kv left, so this clean will delete those, then all 5 snapshots
checkTS();

// there are two kv left, so this clean will delete those, then 5 of
// the 7 snapshots
await cleanKV(2, 7); // 2 final kv, and 5 snapshots
t.deepEqual(remainingKV(), []);
t.is(kernelStorage.kvStore.get(`${vatID}.options`), undefined);
t.is(remainingSnapshots(), 0);

t.is(remainingTranscriptSpans(), 6);
let ts = 59;
if (mode === 'dieHappy') {
ts = 60;
}
t.is(remainingTranscriptItems(), ts);

// the next clean (with the default budget of 5) will delete the
// five most recent transcript spans, starting with the isCurrent=1
// one (which had 9 or 10 items), leaving just the earliest (which
// had 4, due to `snapshotInitial`)
expectedRemainingSnapshots -= 5;
checkTS();

// the next clean gets the 2 remaining snapshots, and the five most
// recent transcript spans, starting with the isCurrent=1 one (which
// had 9 or 10 items), leaving the earliest (which had 4, due to
// `snapshotInitial`) and the next two (with 13 each, due to
// snapshotInterval plus 2 for BOYD/snapshot overhead ).
let cleanups = await clean();
t.is(cleanups, 5);
t.is(remainingTranscriptSpans(), 1);
t.is(remainingTranscriptItems(), 4);

t.is(cleanups, expectedRemainingSnapshots + 5);
expectedRemainingSnapshots = 0;
expectedRemainingItems = 4 + 13 + 13;
expectedRemainingSpans = 3;
checkTS();
// not quite done
t.true(JSON.parse(kvStore.get('vats.terminated')).includes(vatID));

// the final clean deletes the remaining span, and finishes by
// the final clean deletes the remaining spans, and finishes by
// removing the "still being deleted" bookkeeping, and the .options

cleanups = await clean();
t.is(cleanups, 1);
t.is(remainingTranscriptSpans(), 0);
t.is(remainingTranscriptItems(), 0);
t.is(cleanups, 3);
expectedRemainingItems = 0;
expectedRemainingSpans = 0;
checkTS();
t.is(remaining(), 0);
t.false(JSON.parse(kvStore.get('vats.terminated')).includes(vatID));
}
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import { Far } from '@endo/far';
import { makePromiseKit } from '@endo/promise-kit';

export function buildRootObject(vatPowers, _vatParameters, baggage) {
const hold = [];
Expand All @@ -7,6 +8,7 @@ export function buildRootObject(vatPowers, _vatParameters, baggage) {
dieHappy: completion => vatPowers.exitVat(completion),
sendExport: () => Far('dude export', {}),
acceptImports: imports => hold.push(imports),
forever: () => makePromiseKit().promise,
makeVatstore: count => {
for (let i = 0; i < count; i += 1) {
baggage.init(`key-${i}`, i);
Expand Down
Original file line number Diff line number Diff line change
@@ -1,16 +1,24 @@
import { Far, E } from '@endo/far';
import { makePromiseKit } from '@endo/promise-kit';

export function buildRootObject() {
let dude;
const hold = [];

const self = Far('root', {
async bootstrap(vats, devices) {
const vatMaker = E(vats.vatAdmin).createVatAdminService(devices.vatAdmin);

// create a dynamic vat, send it a message and let it respond, to make
// sure everything is working
// Create a dynamic vat, send it a message and let it respond,
// to make sure everything is working. Give them a promise to
// follow, to check that its refcount is cleaned up.
dude = await E(vatMaker).createVatByName('dude');
await E(dude.root).foo(1);
const p = makePromiseKit().promise;
await E(dude.root).holdPromise(p);
const p2 = E(dude.root).never();
p2.catch(_err => 'hush');
hold.push(p2);
return 'bootstrap done';
},
async phase2() {
Expand Down
27 changes: 27 additions & 0 deletions packages/SwingSet/test/vat-admin/terminate/terminate.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -471,6 +471,23 @@ test.serial('dead vat state removed', async t => {
t.is(kvStore.get('vat.dynamicIDs'), '["v6"]');
t.is(kvStore.get('ko26.owner'), 'v6');
t.is(Array.from(enumeratePrefixedKeys(kvStore, 'v6.')).length > 10, true);
// find the two promises in the new vat's c-list, they should both
// have refCount=2 (one for bootstrap, one for the dynamic vat)
let decidedKPID;
let subscribedKPID;
for (const key of enumeratePrefixedKeys(kvStore, 'v6.')) {
if (key.startsWith('v6.c.kp')) {
const kpid = key.slice('v6.c.'.length);
const refCount = Number(kvStore.get(`${kpid}.refCount`));
const decider = kvStore.get(`${kpid}.decider`);
if (decider === 'v6') {
decidedKPID = kpid;
} else {
subscribedKPID = kpid;
}
t.is(refCount, 2, `${kpid}.refCount=${refCount}, not 2`);
}
}
const beforeDump = debug.dump(true);
t.truthy(beforeDump.transcripts.v6);
t.truthy(beforeDump.snapshots.v6);
Expand All @@ -483,6 +500,16 @@ test.serial('dead vat state removed', async t => {
const afterDump = debug.dump(true);
t.falsy(afterDump.transcripts.v6);
t.falsy(afterDump.snapshots.v6);
// the promise that v6 was deciding will be removed from v6's
// c-list, rejected by the kernel, and the reject notification to
// vat-bootstrap will remove it from that c-list, so the end state
// should be no references, and a completely deleted kernel promise
// table entry
t.is(kvStore.get(`${decidedKPID}.refCount`), undefined);
// the promise that v6 received from vat-bootstrap should be removed
// from v6's c-list, but it is still being exported by
// vat-bootstrap, so the refcount should finish at 1
t.is(kvStore.get(`${subscribedKPID}.refCount`), '1');
});

test.serial('terminate with presence', async t => {
Expand Down
Loading
Loading