protocol: block safety index #70

Open · wants to merge 1 commit into main

Conversation

protolambda (Contributor)

Description

Proposal to enshrine the idea of a "block safety index", such that other (existing and new) features can be built against this in a unified way.

In particular, as we enter the Holocene steady-batch-derivation work and Interop devnet 2, I believe there is a need for a shared block-safety-index feature, to reduce overall protocol complexity by unifying solutions to the same problem.

@sebastianst (Member) left a comment:

Great proposal!

One aspect I don't understand is the advantage of tracking additional info like span batch bounds, on top of just the L2/L1 derivation mapping.

Thankfully, with strict batch ordering we won't have multiple buffered channels any more. But before Holocene, would we also want to track additional derivation pipeline state, like which L1 blocks caused buffered frames/channels in the channel bank?

> where a span-batch may generate pending blocks that can be reorged out
> if later content of the span-batch is found invalid.

This is changing with Holocene (steady batch derivation, aka strict ordering): we do not want the complexity of having to revert data that was tentatively accepted.
Member:

So do you agree that the span batch recovery problem gets easier with Partial Span Batch Validity? If we only forward-invalidate, but do not backward-invalidate within a span batch, the start of the span batch matters less, since it never needs to be reorged out when a later batch turns out to be invalid.
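For illustration, forward-only invalidation could look roughly like this (a minimal sketch in Go; the types and function are hypothetical, not the actual derivation pipeline code):

```go
package safetyindex // hypothetical package, for illustration only

// pendingBlock is a stand-in for a block decoded from a span batch.
type pendingBlock struct {
	Number uint64
	Valid  bool // result of validating the block against its L1 origin / batch rules
}

// forwardInvalidate keeps the longest valid prefix of a span batch.
// With forward-only invalidation, an invalid block drops only itself and
// everything after it; the already-accepted prefix never has to be reorged
// out, so the start of the span batch does not need to be revisited.
func forwardInvalidate(spanBatchBlocks []pendingBlock) []pendingBlock {
	for i, b := range spanBatchBlocks {
		if !b.Valid {
			return spanBatchBlocks[:i]
		}
	}
	return spanBatchBlocks
}
```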

Comment on lines +71 to +75
> we currently cannot be sure it was derived from a certain L1; we have to recreate the derivation state to verify.
> And if the tip was part of a span-batch, we need to find the start of said span-batch.
> So while we can do away with the multi-block pending-safe reorg, we still have to "find" the start of a span-batch.
> If we had an index of L2-block to L1-derived-from, with span-batch bounds info,
> then finding that start would be much faster.
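For illustration, an entry in such an index might look roughly like the following sketch (hypothetical Go types and field names, not the proposed schema):

```go
package safetyindex // hypothetical package, for illustration only

// Entry sketches one record of a block safety index: it maps an L2 block to
// the L1 block it was derived from, and carries span-batch bounds so the start
// of the enclosing span batch can be looked up directly instead of re-derived.
type Entry struct {
	L2BlockHash    [32]byte // hash of the L2 block this entry describes
	L2BlockNumber  uint64
	L1OriginHash   [32]byte // hash of the L1 block the L2 block was derived from
	L1OriginNumber uint64
	SpanBatchStart uint64 // first L2 block number covered by the enclosing span batch
	SpanBatchEnd   uint64 // last L2 block number covered by the enclosing span batch
	LocalSafe      bool   // safe with respect to local L1 data
	CrossSafe      bool   // safe including cross-chain (interop) dependencies
}
```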
Member:

Don't we need to recreate the derivation state on a reorg or restart anyway? How is that different from, say, a channel full of singular batches, where the reorg or node restart may have happened in the middle of deriving singular batches from the channel, so the L2 tip is in the middle of a channel? What I mean is that we always have to recreate the derivation state in a way that gives us clarity on which L1 block a certain L2 block is derived from, and this will get easier with the set of changes we're introducing with Holocene.

This is not to say that the block safety index isn't still very useful.

> This prevents these chains from entering a divergent hardfork path:
> very important to keep the code-base unified and tech-debt low.
>
> The main remaining question is how we bootstrap the data:
Member:

For how long do we actually need to maintain this DB? Is there an expiry duration that satisfies all of the above cases, e.g. the FP challenge window? For most of the mentioned cases I think a soft rollout could work well enough.

> we need to roll it out gradually.
>
> This feature could either be a "soft-rollout" type of thing,
> or a Holocene feature (if it turns out to be tightly integrated into the steady batch derivation work).
Member:

I don't think that we need to make it a Holocene feature. Holocene still works, like the pre-Holocene protocol, with a sync start protocol (that will actually get easier). But it's a helpful feature the moment it's there.

> at 32 bytes per L1 and L2 block hash, some L1 metadata, some L2 metadata, and both local-safe and cross-safe,
> each entry may be about 200 bytes. It does not compress well, as it largely contains blockhash data.
>
> Storing a week of this data, at 2 second blocks, would thus be `7 * 24 * 60 * 60 / 2 * 200 = 60,480,000` bytes, or about 60 MB.
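A quick sanity check of the quoted numbers (assuming the ~200-byte entry size and 2-second block time from the excerpt; this is just illustrative arithmetic):

```go
package safetyindex // hypothetical package, for illustration only

const (
	entrySizeBytes = 200 // assumed size of one index entry
	blockTimeSec   = 2   // assumed L2 block time
	secondsPerDay  = 24 * 60 * 60
)

// storageBytes approximates the index size for a given retention period.
func storageBytes(days int) int {
	blocks := days * secondsPerDay / blockTimeSec
	return blocks * entrySizeBytes
}

// storageBytes(7)   == 60_480_000    -> ~60 MB for one week
// storageBytes(365) == 3_153_600_000 -> ~3 GB for one year
```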
Member:

For what use case would we need more than ~a week of safe db history?

Comment on lines +194 to +195
> Storing a week of this data, at 2 second blocks, would thus be `7 * 24 * 60 * 60 / 2 * 200 = 60,480,000` bytes, or about 60 MB.
> Storing a year of this data would be around ~3 GB.


For the average node of a singular chain, I imagine that not more than a few hours of data would ever be needed to support restarts. Maybe 12 hours to support a sequencer window lapse, but I'm not sure if safety indexing is needed when you simply don't have valid blocks over a large range.

For nodes which participate in fault proofs, I imagine only 3.5 days of data is needed, except in cases where a game starts, then up to 7 days is required. Maybe we could incorporate some holding mechanism in the case of open disputes, and be aggressive otherwise.

Nodes of an interoperating chain theoretically need all possible chain-safety state in order to respond to Executing Message validity. However, in those cases the nodes should use an op-supervisor anyway.

All this is to say, nodes in different environments may have different retention needs, but those needs should always be pretty well constrained. Unless I'm missing some use cases? What node would want a year of data?
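To make the different retention needs concrete, here is a rough configuration sketch based on the durations mentioned above (the role names, values, and the knob itself are all hypothetical):

```go
package safetyindex // hypothetical package, for illustration only

import "time"

// retentionByRole sketches possible default retention windows per node role.
// These are illustrative values taken from the discussion, not an existing config.
var retentionByRole = map[string]time.Duration{
	"default":     12 * time.Hour,     // restarts, sequencer window lapse
	"fault-proof": 7 * 24 * time.Hour, // covers an open dispute game's window
	// interop nodes theoretically need unbounded history,
	// but should rely on op-supervisor instead.
}
```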

protolambda (author):

Explorers / indexers might want extended data, to show past L1<>L2 relations. But yes, 1 week should otherwise be enough. Perhaps we can add an archive flag that writes the pruned data to a separate DB, for archival purposes.
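A minimal sketch of that pruning-with-archive idea (the store interface, flag, and function are hypothetical, not an existing API):

```go
package safetyindex // hypothetical package, for illustration only

// indexEntry is a minimal stand-in for a safety-index record
// (block hashes, L1 origin, and safety info omitted for brevity).
type indexEntry struct {
	L2BlockNumber uint64
}

// store is a hypothetical interface over the safety-index DB.
type store interface {
	OldestEntry() (indexEntry, bool)    // oldest retained entry; false if the DB is empty
	DeleteOldest()                      // drop the oldest entry
	AppendToArchive(e indexEntry) error // write an entry to a separate archive DB
}

// prune drops entries below the retention cutoff (e.g. ~1 week, or the
// dispute-monitoring window). With archive enabled, pruned entries are first
// copied to a separate archive DB, so explorers/indexers keep full history.
func prune(db store, cutoffL2Number uint64, archive bool) error {
	for {
		e, ok := db.OldestEntry()
		if !ok || e.L2BlockNumber >= cutoffL2Number {
			return nil
		}
		if archive {
			if err := db.AppendToArchive(e); err != nil {
				return err
			}
		}
		db.DeleteOldest()
	}
}
```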

Contributor:

With clock extension, games can be in progress for longer than a week (16 days is the worst case, I think, but I may have that wrong), and you typically want to monitor them for some period after they resolve so you have a chance to detect if they resolved incorrectly. Currently dispute-mon and challenger monitor a default window of 28 days. So I'd suggest we keep data for at least that long by default.

Comment on lines +128 to +129
> On top of that: the finality deriver sub-system of the op-node can potentially be completely removed,
> if finality is checked through the op-supervisor API.


There's still the case where you might run a monochain and not want to run the op-supervisor. In that situation, you'd still want some sort of finality deriver.

I'm also thinking about Alt-DA finality, where the finality is pointed at the L1 block where the commitment's challenge is over. For that we'd want a slightly more flexible representation of finality, and I think the safety index serves that well, where the Supervisor would not be suitable.

@axelKingsley:

This is not really a blocker for the design itself, but it's worth noting a chore we should do: in various places in our docs, we talk about the statelessness of our components. That statelessness is already much weaker with things like the SafeDB, but this change would definitely push us far enough that we'd want to amend the documentation.
