protocol: block safety index #70

Open · wants to merge 1 commit into main

Conversation

protolambda (Contributor)

Description

Proposal to enshrine the idea of a "block safety index", such that other (existing and new) features can be built against this in a unified way.

In particular, as we enter the Holocene steady-batch-derivation work and Interop devnet 2, I believe there is a need for a shared block-safety-index feature, to reduce overall protocol complexity by unifying solutions to the same problem.

@sebastianst (Member) left a comment:

Great proposal!

One aspect I don't understand is the advantage of tracking additional info like span batch bounds, on top of just the L2/L1 derivation mapping.

Thankfully, with strict batch ordering we won't have multiple buffered channels any more. But before Holocene, would we also want to track additional derivation pipeline state, like which L1 blocks caused buffered frames/channels in the channel bank?

> where a span-batch may generate pending blocks that can be reorged out
> if later content of the span-batch is found invalid.

This is changing with Holocene (steady batch derivation, aka strict ordering): we do not want the complexity of having to revert data that was tentatively accepted.
Member:

So do you agree that the span batch recovery problem gets easier with Partial Span Batch Validity? If we only forward-invalidate, but do not backward-invalidate within a span batch, the start of the span batch matters less, since it never needs to be reorged out when a later batch turns out to be invalid.
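For illustration, forward-only invalidation could look roughly like this (a minimal sketch in Go; the types and function are hypothetical, not the actual derivation pipeline code):

```go
package safetyindex // hypothetical package, for illustration only

// pendingBlock is a stand-in for a block decoded from a span batch.
type pendingBlock struct {
	Number uint64
	Valid  bool // result of validating the block against its L1 origin / batch rules
}

// forwardInvalidate keeps the longest valid prefix of a span batch.
// With forward-only invalidation, an invalid block drops only itself and
// everything after it; the already-accepted prefix never has to be reorged
// out, so the start of the span batch does not need to be revisited.
func forwardInvalidate(spanBatchBlocks []pendingBlock) []pendingBlock {
	for i, b := range spanBatchBlocks {
		if !b.Valid {
			return spanBatchBlocks[:i]
		}
	}
	return spanBatchBlocks
}
```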

Comment on lines +71 to +75
> we currently cannot be sure it was derived from a certain L1; we have to recreate the derivation state to verify.
> And if the tip was part of a span-batch, we need to find the start of said span-batch.
> So while we can do away with the multi-block pending-safe reorg, we still have to "find" the start of a span-batch.
> If we had an index of L2-block to L1-derived-from, with span-batch bounds info,
> then finding that start would be much faster.
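For illustration, an entry in such an index might look roughly like the following sketch (hypothetical Go types and field names, not the proposed schema):

```go
package safetyindex // hypothetical package, for illustration only

// Entry sketches one record of a block safety index: it maps an L2 block to
// the L1 block it was derived from, and carries span-batch bounds so the start
// of the enclosing span batch can be looked up directly instead of re-derived.
type Entry struct {
	L2BlockHash    [32]byte // hash of the L2 block this entry describes
	L2BlockNumber  uint64
	L1OriginHash   [32]byte // hash of the L1 block the L2 block was derived from
	L1OriginNumber uint64
	SpanBatchStart uint64 // first L2 block number covered by the enclosing span batch
	SpanBatchEnd   uint64 // last L2 block number covered by the enclosing span batch
	LocalSafe      bool   // safe with respect to local L1 data
	CrossSafe      bool   // safe including cross-chain (interop) dependencies
}
```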
Member:

Don't we need to recreate the derivation state on a reorg or restart anyway? How is that different from, say, a channel full of singular batches, where the reorg or node restart may have happened in the middle of deriving singular batches from the channel, so the L2 tip is in the middle of a channel? What I mean is that we always have to recreate the derivation state in a way that gives us clarity on which L1 block a certain L2 block is derived from, and this will get easier with the set of changes we're introducing with Holocene.

This is not to say that the block safety index isn't still very useful.

> This prevents these chains from entering a divergent hardfork path:
> very important to keep the code-base unified and tech-debt low.
>
> The main remaining question is how we bootstrap the data:
Member:

For how long do we actually need to maintain this DB? Is there an expiry duration that satisfies all of the above cases, e.g. the FP challenge window? For most of the mentioned cases I think a soft rollout could work well enough.

> we need to roll it out gradually.
>
> This feature could either be a "soft-rollout" type of thing,
> or a Holocene feature (if it turns out to be tightly integrated into the steady batch derivation work).
Member:

I don't think that we need to make it a Holocene feature. Holocene still works, like the pre-Holocene protocol, with a sync start protocol (that will actually get easier). But it's a helpful feature the moment it's there.

> at 32 bytes per L1 and L2 block hash, some L1 metadata, some L2 metadata, and both local-safe and cross-safe,
> each entry may be about 200 bytes. It does not compress well, as it largely contains blockhash data.
>
> Storing a week of this data, at 2 second blocks, would thus be `7 * 24 * 60 * 60 / 2 * 200 = 60,480,000` bytes, or about 60 MB.
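A quick sanity check of the quoted numbers (assuming the ~200-byte entry size and 2-second block time from the excerpt; this is just illustrative arithmetic):

```go
package safetyindex // hypothetical package, for illustration only

const (
	entrySizeBytes = 200 // assumed size of one index entry
	blockTimeSec   = 2   // assumed L2 block time
	secondsPerDay  = 24 * 60 * 60
)

// storageBytes approximates the index size for a given retention period.
func storageBytes(days int) int {
	blocks := days * secondsPerDay / blockTimeSec
	return blocks * entrySizeBytes
}

// storageBytes(7)   == 60_480_000    -> ~60 MB for one week
// storageBytes(365) == 3_153_600_000 -> ~3 GB for one year
```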
Member:

For what use case would we need more than ~a week of safe db history?

Comment on lines +194 to +195
> Storing a week of this data, at 2 second blocks, would thus be `7 * 24 * 60 * 60 / 2 * 200 = 60,480,000` bytes, or about 60 MB.
> Storing a year of this data would be around ~3 GB.


For the average node of a singular chain, I imagine that not more than a few hours of data would ever be needed to support restarts. Maybe 12 hours to support a sequencer window lapse, but I'm not sure if safety indexing is needed when you simply don't have valid blocks over a large range.

For nodes which participate in fault proofs, I imagine only 3.5 days of data is needed, except in cases where a game starts, then up to 7 days is required. Maybe we could incorporate some holding mechanism in the case of open disputes, and be aggressive otherwise.

Nodes of an interoperating chain theoretically need all possible chain-safety state in order to respond to Executing Message validity. However, in those cases the nodes should use an op-supervisor anyway.

All this is to say, nodes in different environments may have different retention needs, but those needs should always be pretty well constrained. Unless I'm missing some use cases? What node would want a year of data?
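To make the different retention needs concrete, here is a rough configuration sketch based on the durations mentioned above (the role names, values, and the knob itself are all hypothetical):

```go
package safetyindex // hypothetical package, for illustration only

import "time"

// retentionByRole sketches possible default retention windows per node role.
// These are illustrative values taken from the discussion, not an existing config.
var retentionByRole = map[string]time.Duration{
	"default":     12 * time.Hour,     // restarts, sequencer window lapse
	"fault-proof": 7 * 24 * time.Hour, // covers an open dispute game's window
	// interop nodes theoretically need unbounded history,
	// but should rely on op-supervisor instead.
}
```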

protolambda (author):

Explorers / indexers might want extended data, to show past L1<>L2 relations. But yes, 1 week should otherwise be enough. Perhaps we can add an archive flag that writes the pruned data to a separate DB, for archival purposes.
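A minimal sketch of that pruning-with-archive idea (the store interface, flag, and function are hypothetical, not an existing API):

```go
package safetyindex // hypothetical package, for illustration only

// indexEntry is a minimal stand-in for a safety-index record
// (block hashes, L1 origin, and safety info omitted for brevity).
type indexEntry struct {
	L2BlockNumber uint64
}

// store is a hypothetical interface over the safety-index DB.
type store interface {
	OldestEntry() (indexEntry, bool)    // oldest retained entry; false if the DB is empty
	DeleteOldest()                      // drop the oldest entry
	AppendToArchive(e indexEntry) error // write an entry to a separate archive DB
}

// prune drops entries below the retention cutoff (e.g. ~1 week, or the
// dispute-monitoring window). With archive enabled, pruned entries are first
// copied to a separate archive DB, so explorers/indexers keep full history.
func prune(db store, cutoffL2Number uint64, archive bool) error {
	for {
		e, ok := db.OldestEntry()
		if !ok || e.L2BlockNumber >= cutoffL2Number {
			return nil
		}
		if archive {
			if err := db.AppendToArchive(e); err != nil {
				return err
			}
		}
		db.DeleteOldest()
	}
}
```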

Contributor:

With clock extension, games can be in progress for longer than a week (16 days is the worst case, I think, but I may have that wrong), and you typically want to monitor them for some period after they resolve so you have a chance to detect if they resolved incorrectly. Currently dispute-mon and challenger monitor a default window of 28 days. So I'd suggest we keep data for at least that long by default.

Comment on lines +128 to +129
> On top of that: the finality deriver sub-system of the op-node can potentially be completely removed,
> if finality is checked through the op-supervisor API.


There's still the case where you might run a monochain and not want to run the op-supervisor. In that situation, you'd still want some sort of finality deriver.

I'm also thinking about Alt-DA finality, where the finality is pointed at the L1 block where the commitment's challenge is over. For that we'd want a slightly more flexible representation of finality, and I think the safety index serves that well, where the Supervisor would not be suitable.

@axelKingsley:

This is not really a blocker for the design itself, but it's worth noting a chore we should do: in various places in our docs, we talk about the statelessness of our components. That statelessness is already much weaker with things like the SafeDB, but this change would definitely push us far enough that we'd want to amend the documentation.
