[Dynamic Protocol State] `EpochStateContainer` stores epoch active identities #4834

durkmurder · 2023-10-18T14:56:26Z

Context

This PR changes `ActiveIdentities` to store only the participants that can contribute to extending the chain, instead of storing participants for the current and previous/next epoch depending on the phase.

Previously, `ActiveIdentities` stored the current and previous epoch's participants during the staking phase, and the current and next epoch's participants during the setup and commit phases. Storing the identities of the previous epoch separately allows each `ActiveIdentities` to hold only the identities that were introduced in its own epoch. In a nutshell, `ActiveIdentities` holds the dynamic part of the identities that were introduced by the epoch setup event.

I will be updating docs and tests, but I don't expect any major changes to the implementation itself, so feel free to review.
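To make the new layout easier to picture, here is a minimal sketch of the data model. The type names `EpochStateContainer`, `ProtocolStateEntry`, and `DynamicIdentityEntry` come from this PR; the field sets, the `CurrentEpoch` field, the `Identifier` type, and the helper `EpochsWithIdentities` are simplifying assumptions for illustration, not the actual flow-go definitions.

```go
package main

import "fmt"

// Identifier is a simplified stand-in for a node ID (assumption).
type Identifier string

// DynamicIdentity holds the properties of a node that can change within an epoch.
type DynamicIdentity struct {
	Weight  uint64
	Ejected bool
}

// DynamicIdentityEntry associates a node with its dynamic properties.
type DynamicIdentityEntry struct {
	NodeID  Identifier
	Dynamic DynamicIdentity
}

// EpochStateContainer holds only the identities introduced by this epoch's setup event.
type EpochStateContainer struct {
	Counter          uint64
	ActiveIdentities []*DynamicIdentityEntry
}

// ProtocolStateEntry references up to three epochs.
// PreviousEpoch and NextEpoch are pointers because either may be absent.
type ProtocolStateEntry struct {
	PreviousEpoch *EpochStateContainer
	CurrentEpoch  EpochStateContainer // hypothetical field name
	NextEpoch     *EpochStateContainer
}

// EpochsWithIdentities lists the counters of all epochs whose identities are tracked.
func EpochsWithIdentities(e *ProtocolStateEntry) []uint64 {
	var counters []uint64
	if e.PreviousEpoch != nil {
		counters = append(counters, e.PreviousEpoch.Counter)
	}
	counters = append(counters, e.CurrentEpoch.Counter)
	if e.NextEpoch != nil {
		counters = append(counters, e.NextEpoch.Counter)
	}
	return counters
}

func main() {
	// Staking phase: a previous epoch exists, the next epoch is not set up yet.
	entry := &ProtocolStateEntry{
		PreviousEpoch: &EpochStateContainer{Counter: 4},
		CurrentEpoch:  EpochStateContainer{Counter: 5},
	}
	fmt.Println(EpochsWithIdentities(entry))
}
```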


• revised goDoc of protocol state implementation
• changed `ProtocolStateEntry.PreviousEpoch` to pointer (consistent with `ProtocolStateEntry.NextEpoch`)
• There should be no significant algorithmic changes. Though, I have switched the order of the if and else branches when processing an Epoch Setup event. For me, this is much more in line with the intuitive order of documentation.

state/protocol/protocol_state/updater.go:
• notable updates and revisions of goDoc
• tried to address concerns around inconsistent handling of invariances
• updated code to work with `ProtocolStateEntry.PreviousEpoch` being potentially a nil pointer

storage/badger/protocol_state_test.go:
• detailed goDoc revisions for test `assertRichProtocolStateValidity`

AlexHentschel

This greatly improved code readability and mental structure for me. Amazing work. 👏

There are some minor concerns about a couple of places in the code. I have created PR #4854 (targeting your branch) that addresses the first of my concerns but not the second one. Furthermore, the PR updates, polishes, and extends a lot of goDocs.

storage/badger/protocol_state_test.go

model/flow/protocol_state.go

model/flow/protocol_state_test.go

state/protocol/protocol_state/updater.go

AlexHentschel · 2023-10-20T08:30:30Z

state/protocol/protocol_state/updater.go

@@ -187,17 +150,20 @@ func (u *Updater) ProcessEpochCommit(epochCommit *flow.EpochCommit) error {
 // No errors are expected during normal operations.
 func (u *Updater) UpdateIdentity(updated *flow.DynamicIdentityEntry) error {


⚠️❓

I have a hard time convincing myself that this code is correct. Just as a brief sketch of my thoughts:

~~at the point when we are leaving an epoch, the Identity Table (EpochStateContainer.ActiveIdentities) essentially snapshots the latest state when these nodes were active.~~

~~In my opinion, changes of the dynamic identities essentially should only carry forward (causality) but not backwards.~~

~~Hence, ProtocolStateEntry.PreviousEpoch should never be modified. But we do here:~~

flow-go/state/protocol/protocol_state/updater.go
Lines 153 to 156 in e765750

	prevEpochIdentity, foundInPrev := u.prevEpochIdentitiesLookup[updated.NodeID]
	if foundInPrev {
		prevEpochIdentity.Dynamic = updated.Dynamic
	}

Changing the dynamic properties of node X affects the current and next epoch. However, as the weight might change at epoch boundaries, we cannot simply overwrite the weight for the current and next epoch with the same value. I think the following code is an incorrect simplification:

flow-go/state/protocol/protocol_state/updater.go
Lines 157 to 164 in e765750

	currentEpochIdentity, foundInCurrent := u.currentEpochIdentitiesLookup[updated.NodeID]
	if foundInCurrent {
		currentEpochIdentity.Dynamic = updated.Dynamic
	}
	nextEpochIdentity, foundInNext := u.nextEpochIdentitiesLookup[updated.NodeID]
	if foundInNext {
		nextEpochIdentity.Dynamic = updated.Dynamic
	}

This is not addressed in my PR.

As I understand it, we will address this in a subsequent PR.

Update:

We discussed this point and decided that we really should continue to modify the identities also from past epochs.

We are essentially creating a snapshot of the identity table as of a certain block. By modifying an identity, we only carry this change forward.

Furthermore, we need to be able to modify "our current value" for a node X that participated in the last epoch but not in the current. Otherwise, we would not be able to eject X in the current epoch despite it misbehaving.

Conclusion: current approach is correct.
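The ejection argument can be illustrated with a minimal sketch. It mirrors the three lookups from the excerpts above, but the `Updater` struct, its field names, the string node IDs, and the boolean return value are simplified assumptions, not the flow-go implementation.

```go
package main

import "fmt"

// DynamicIdentity and DynamicIdentityEntry are simplified stand-ins for the flow-go types.
type DynamicIdentity struct {
	Weight  uint64
	Ejected bool
}

type DynamicIdentityEntry struct {
	NodeID  string
	Dynamic DynamicIdentity
}

// Updater holds one lookup per epoch (hypothetical, simplified).
type Updater struct {
	prev, current, next map[string]*DynamicIdentityEntry
}

// UpdateIdentity overwrites the dynamic properties in every epoch that knows the node.
// It returns false if the node is unknown in all three epochs.
func (u *Updater) UpdateIdentity(updated DynamicIdentityEntry) bool {
	found := false
	for _, lookup := range []map[string]*DynamicIdentityEntry{u.prev, u.current, u.next} {
		// Indexing a nil map is safe in Go and simply reports "not found".
		if entry, ok := lookup[updated.NodeID]; ok {
			entry.Dynamic = updated.Dynamic
			found = true
		}
	}
	return found
}

func main() {
	// Node X participated in the previous epoch but not in the current one.
	x := &DynamicIdentityEntry{NodeID: "X", Dynamic: DynamicIdentity{Weight: 100}}
	u := &Updater{
		prev:    map[string]*DynamicIdentityEntry{"X": x},
		current: map[string]*DynamicIdentityEntry{},
	}
	u.UpdateIdentity(DynamicIdentityEntry{NodeID: "X", Dynamic: DynamicIdentity{Ejected: true}})
	fmt.Println("X ejected:", x.Dynamic.Ejected)
}
```

If the previous epoch's lookup were skipped, the update for X would find no entry at all, which is exactly the case described above.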

This will be addressed in a subsequent PR: we will replace the (dynamic) weight with an enum representing the participation state (joining, active, leaving).

Currently, we maintain a dynamic weight but only differentiate between zero weight (not active) and positive (active).

It is not clear whether the protocol would benefit from dynamically changing trust weight in the future. Reasoning:

For consensus participants (consensus nodes, collector nodes), we use their initial weight anyway. The practical benefits are massive, because this enables light clients that don't need to download and locally check every block header.

Leader selection is also pre-determined for an entire epoch based on initial weight.

For the other roles, Verification Nodes [VNs] and Execution Nodes [ENs], we currently don't have meaningful ways for weight-based load balancing. ENs need to execute every block anyway, and our chunk-assignment algorithm generates uniform load across all VNs. It is not clear whether weighted protocol variants even exist (and even if such algorithms exist [open research topic], a large amount of complicated software changes would likely be needed).

Reworked TODO in [Dynamic Protocol State] Remaining work and ToDos #4649 (second item in High priority) to reflect our latest approach to this challenge
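As a rough illustration of the planned direction, the participation state could look like the sketch below. The type name `ParticipationState`, the constant names, and the `IsActive` helper are hypothetical; the actual design is left to the subsequent PR.

```go
package main

import "fmt"

// ParticipationState is a hypothetical replacement for the dynamic weight.
type ParticipationState int

const (
	Joining ParticipationState = iota // admitted for the upcoming epoch
	Active                            // participant of the current epoch
	Leaving                           // participant of the previous epoch
)

// String returns a human-readable name for logging.
func (s ParticipationState) String() string {
	switch s {
	case Joining:
		return "joining"
	case Active:
		return "active"
	case Leaving:
		return "leaving"
	default:
		return fmt.Sprintf("unknown(%d)", int(s))
	}
}

// IsActive corresponds to today's "positive weight" case; the other states map to weight zero.
func (s ParticipationState) IsActive() bool {
	return s == Active
}

func main() {
	for _, s := range []ParticipationState{Joining, Active, Leaving} {
		fmt.Printf("%s: active=%t\n", s, s.IsActive())
	}
}
```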


state/protocol/protocol_state/updater.go

jordanschalm · 2023-10-20T16:38:01Z

state/protocol/protocol_state/updater.go

+	//   - Per convention, the system smart contracts should list the IdentitySkeletons in canonical order. This is useful for
+	//     most efficient construction of the full active Identities for an epoch.


The system contract does not list identities in canonical order. Conceptually, the identity table is an unordered set. We assume it is ordered a particular way at various points in the flow-go code to simplify various implementation details, like the DKG.

We enforce that the identity list used within flow-go is ordered by ordering it during the conversion from cadence to Go:

flow-go/model/convert/service_event.go
Line 638 in 51a572b

	participants = participants.Sort(order.Canonical)

Can we update our smart contract to do this? Either way, we would need to work on it in the future.

I don't think we should have the smart contract do sorting. It's a resource-constrained environment and it is not critical that the set be sorted within the system contract.

It seems fine to me for the protocol layer to translate the event into a representation that is easier for it to work with.
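A minimal sketch of such a translation step is shown below. It assumes that canonical order means ascending byte order of the node IDs; `Identifier`, `IdentitySkeleton`, and `SortCanonical` are simplified stand-ins rather than the flow-go types and functions.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Identifier is a simplified 32-byte node ID (assumption).
type Identifier [32]byte

// IdentitySkeleton carries only the node ID in this sketch.
type IdentitySkeleton struct {
	NodeID Identifier
}

// SortCanonical returns a copy of the input ordered by ascending NodeID bytes.
// The input is left untouched, since the publisher treats it as an unordered set.
func SortCanonical(in []IdentitySkeleton) []IdentitySkeleton {
	out := make([]IdentitySkeleton, len(in))
	copy(out, in)
	sort.Slice(out, func(i, j int) bool {
		return bytes.Compare(out[i].NodeID[:], out[j].NodeID[:]) < 0
	})
	return out
}

func main() {
	// Participants as emitted by the publisher, in arbitrary order.
	unordered := []IdentitySkeleton{
		{NodeID: Identifier{3}},
		{NodeID: Identifier{1}},
		{NodeID: Identifier{2}},
	}
	for _, p := range SortCanonical(unordered) {
		fmt.Println(p.NodeID[0])
	}
}
```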

Thanks Jordan for flagging that my comment in the code was incorrect. I have updated it in 49934ce to read:

flow-go/state/protocol/protocol_state/updater.go
Lines 92 to 98 in 49934ce

	// sanity checking SAFETY-CRITICAL INVARIANT (II):
	// - Per convention, the `flow.EpochSetup` event should list the IdentitySkeletons in canonical order. This is useful
	// for most efficient construction of the full active Identities for an epoch. We enforce this here at the gateway
	// to the protocol state, when we incorporate new information from the EpochSetup event.
	// - Note that the system smart contracts manage the identity table as an unordered set! For the protocol state, we desire a fixed
	// ordering to simplify various implementation details, like the DKG. Therefore, we order identities in `flow.EpochSetup` during
	// conversion from cadence to Go in the function `convert.ServiceEvent(flow.ChainID, flow.Event)` in package `model/convert`

Thereby, the documentation precisely describes the current state, which I think is all that is needed for this PR (?)

My thoughts on where ordering could be implemented in the future

On the one hand, I see (weak) benefits of not having an ordering requirement in the system smart contracts. Conceptually, a set is sufficient for the current functionality of the system smart contracts. Not requiring an ordering is one less detail that engineers have to know about and that could accidentally be broken.

For similar reasons, a block proposer is not required to include seals in an ordered manner. Instead, we re-order them when ingesting the block, which improves modularity of the code by removing inter-dependencies.

On the other hand, in the future, the system smart contracts will also need to have the canonical ordering. This is because we want to eventually implement performance-dependent rewards for consensus nodes, which in turn requires a smart contract to decode the signer indices for blocks to determine which node contributed to a QC (the basis for the subsequent reward payout at the end of the epoch). To decode the signer indices, the smart contract needs to know the ordering.

We then have two choices: (i) the system smart contract maintains an un-ordered set and we re-order the identities at every block (a huge waste of computation) or (ii) the system smart contract maintains the identities in canonical ordering.

Btw, note that to decode QC signer identities, we also need a notion of identities for each epoch. In other words, we would no longer be able top the current epoch's identity table at the end of the staking phase 😅
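To see why decoding depends on the ordering, consider the sketch below. It assumes, purely for illustration, that signer indices are a bit vector in which bit i (most significant bit first) refers to the i-th node of the ordered identity list; the function `DecodeSigners` and the string node IDs are hypothetical.

```go
package main

import "fmt"

// DecodeSigners returns the nodes whose bit is set in the bit vector.
// Assumption: bit i, counted from the most significant bit of the first byte,
// refers to the i-th entry of the ordered identity list.
func DecodeSigners(ordered []string, bits []byte) ([]string, error) {
	if len(bits)*8 < len(ordered) {
		return nil, fmt.Errorf("bit vector too short: %d bits for %d nodes", len(bits)*8, len(ordered))
	}
	var signers []string
	for i, id := range ordered {
		if bits[i/8]&(1<<(7-uint(i%8))) != 0 {
			signers = append(signers, id)
		}
	}
	return signers, nil
}

func main() {
	bits := []byte{0xA0} // binary 1010_0000: first and third node signed

	canonical, _ := DecodeSigners([]string{"a", "b", "c"}, bits)
	shuffled, _ := DecodeSigners([]string{"c", "a", "b"}, bits)

	// The same bit vector credits different nodes under a different ordering.
	fmt.Println(canonical, shuffled)
}
```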

requires a smart contract to decode the signer indices for blocks to determine which node contributed to a QC

Or, we could implement an API on the injected Block Cadence type that does this?

could implement an API on the injected Block Cadence type that does this?

We could. At which point do we change the publisher of the data (system smart contract), vs layering on logic on the consumers (protocol state, dynamic rewards logic)? If we have only one consumer, I don't feel it makes much of a difference for the implementation because the ordering has to live either on the consumer or the publisher -- but putting the ordering into the consumer improves modularity. With two consumers, we are starting to duplicate the ordering logic. To me, that would be the point where I think it would be more beneficial to put the ordering in the publisher.

Thanks for the insightful discussion on this point. I feel for the scope of this PR, the comment is addressed (?)

state/protocol/protocol_state/updater.go

jordanschalm · 2023-10-20T18:44:55Z

utils/unittest/fixtures.go

-			}
-		}
+		currentEpochParticipants := entry.CurrentEpochIdentityTable.Filter(func(identity *flow.Identity) bool {
+			_, found := entry.CurrentEpochSetup.Participants.ByNodeID(identity.NodeID)


May not matter for a fixture, but ByNodeID does linear search so this is going to be $O(n^2)$

Right, but I feel it's more expressive, and it's worth paying the cost in such a case (especially in tests).
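For reference, if the quadratic cost ever became a concern, a linear-time variant could build a lookup set once and filter against it. The sketch below uses simplified stand-in types; `Identity` and `FilterByMembership` are hypothetical names and not part of flow-go.

```go
package main

import "fmt"

// Identity is a simplified stand-in carrying only the node ID.
type Identity struct {
	NodeID string
}

// FilterByMembership keeps the identities from table whose NodeID appears in participants.
// Building the lookup set once makes the filter O(n + m) instead of O(n * m).
func FilterByMembership(table, participants []Identity) []Identity {
	lookup := make(map[string]struct{}, len(participants))
	for _, p := range participants {
		lookup[p.NodeID] = struct{}{}
	}
	var out []Identity
	for _, id := range table {
		if _, found := lookup[id.NodeID]; found {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	table := []Identity{{"a"}, {"b"}, {"c"}}
	participants := []Identity{{"c"}, {"a"}}
	fmt.Println(FilterByMembership(table, participants))
}
```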

jordanschalm

Looks good to me, ignoring our discussion about removing Weight altogether from this morning, which we'll address in a separate PR.

The one change I'd like to make sure we include is using sentinels to denote when a service event is invalid: InvalidServiceEventError. That way we can handle that failure path in the upper-level logic later on.
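A minimal sketch of the sentinel pattern follows. The names `InvalidServiceEventError`, `NewInvalidServiceEventErrorf`, and `IsInvalidServiceEventError` appear in this PR, but the bodies below and the `handle` helper are assumptions written for illustration, not the flow-go implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidServiceEventError marks a service event that violates protocol rules.
type InvalidServiceEventError struct {
	err error
}

func NewInvalidServiceEventErrorf(msg string, args ...interface{}) error {
	return InvalidServiceEventError{err: fmt.Errorf(msg, args...)}
}

func (e InvalidServiceEventError) Error() string { return e.err.Error() }
func (e InvalidServiceEventError) Unwrap() error { return e.err }

// IsInvalidServiceEventError reports whether err, or any error it wraps, is the sentinel.
func IsInvalidServiceEventError(err error) bool {
	var target InvalidServiceEventError
	return errors.As(err, &target)
}

// handle mimics upper-level logic (hypothetical): an invalid service event selects
// the fallback path, while any other error is passed on as an unexpected exception.
func handle(err error) (fallback bool, exception error) {
	if err == nil {
		return false, nil
	}
	if IsInvalidServiceEventError(err) {
		return true, nil
	}
	return false, err
}

func main() {
	// The sentinel is still recognized after being wrapped with additional context.
	err := fmt.Errorf("processing epoch setup: %w",
		NewInvalidServiceEventErrorf("counter %d is not consecutive", 7))
	fallback, exception := handle(err)
	fmt.Println(fallback, exception)
}
```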

…och-refactoring-attempt

codecov-commenter · 2023-10-23T08:21:36Z

Codecov Report

Attention: 60 lines in your changes are missing coverage. Please review.

Comparison is base (de3c468) 49.69% compared to head (c638874) 55.83%.

Additional details and impacted files

@@                        Coverage Diff                         @@
##           feature/dynamic-protocol-state    #4834      +/-   ##
==================================================================
+ Coverage                           49.69%   55.83%   +6.13%     
==================================================================
  Files                                 480      946     +466     
  Lines                               47751    87792   +40041     
==================================================================
+ Hits                                23729    49017   +25288     
- Misses                              22253    35092   +12839     
- Partials                             1769     3683    +1914

Flag	Coverage Δ
unittests	`55.83% <61.53%> (+6.13%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
state/protocol/inmem/convert.go	`66.54% <100.00%> (ø)`
storage/badger/protocol_state.go	`66.98% <100.00%> (ø)`
state/protocol/protocol_state/updater.go	`97.72% <93.47%> (ø)`
state/protocol/badger/mutator.go	`66.94% <0.00%> (ø)`
model/flow/protocol_state.go	`48.59% <71.01%> (ø)`
utils/unittest/fixtures.go	`0.00% <0.00%> (ø)`

... and 465 files with indirect coverage changes



AlexHentschel · 2023-10-23T22:02:52Z

state/protocol/badger/mutator.go

+					if protocol.IsInvalidServiceEventError(err) {
+						// we have observed an invalid service event, which triggers epoch fallback mode
+						updater.SetInvalidStateTransitionAttempted()
+						return dbUpdates, nil
+					}


⚠️

I am a little worried about this:
At this point, we don't know that the updater's internal state is a valid Protocol State. For all we know, it could be complete garbage. There is no atomicity requirement on the StateUpdater as far as I can tell, in that it either applies the update entirely or not at all.

Suggestion:

I think it is OK to treat all protocol state mutations in a block as one atomic state transition. It either succeeds in its entirety or, in case of a failure, there is no update to the protocol state (except setting InvalidStateTransitionAttempted to true).

I think it is straightforward to extend the protocol state Updater to "drop all modifications if InvalidStateTransitionAttempted" and document this properly throughout the code base. In a nutshell, within the Build method I would check state.InvalidStateTransitionAttempted first; if it is set, we could just return a copy of the parentState with only InvalidStateTransitionAttempted set to true.

While the code changes are relatively small, we need to properly test this and add detailed documentation to the Updater, the corresponding interface, and ideally also the places where we call the Updater
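The suggestion could be sketched roughly as follows. The `ProtocolState` struct, its `CurrentEpochCounter` field, and the `AdvanceEpoch` mutation are hypothetical placeholders; only the idea of discarding the working copy inside `Build` is taken from the comment above.

```go
package main

import "fmt"

// ProtocolState is a heavily simplified stand-in for the protocol state.
type ProtocolState struct {
	CurrentEpochCounter             uint64
	InvalidStateTransitionAttempted bool
}

// Updater accumulates the changes of one block on a working copy of the parent state.
type Updater struct {
	parentState ProtocolState // state before this block; never modified
	state       ProtocolState // working copy that receives all mutations
}

func NewUpdater(parent ProtocolState) *Updater {
	return &Updater{parentState: parent, state: parent}
}

// AdvanceEpoch stands in for any mutation applied while processing a block.
func (u *Updater) AdvanceEpoch() { u.state.CurrentEpochCounter++ }

func (u *Updater) SetInvalidStateTransitionAttempted() {
	u.state.InvalidStateTransitionAttempted = true
}

// Build treats all mutations of the block as one atomic transition: if an invalid
// transition was attempted, every modification is dropped and only the flag is set.
func (u *Updater) Build() ProtocolState {
	if u.state.InvalidStateTransitionAttempted {
		// Value copy; a real state holding pointers or slices would need a deep copy.
		result := u.parentState
		result.InvalidStateTransitionAttempted = true
		return result
	}
	return u.state
}

func main() {
	u := NewUpdater(ProtocolState{CurrentEpochCounter: 5})
	u.AdvanceEpoch()                       // partially applied change
	u.SetInvalidStateTransitionAttempted() // a later event turned out to be invalid
	fmt.Printf("%+v\n", u.Build())
}
```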

Further thoughts:

If the updater signals via an error that it encountered an invalid state transition, should the updater maybe already set its own InvalidStateTransitionAttempted flag? We would still raise the error to signal the failure to the caller. However, the caller would no longer be required to explicitly call updater.SetInvalidStateTransitionAttempted() (I'd leave that method, to allow Updater-external logic to set this flag for other reasons).

I think we should not do this as part of the current PR. The PR is already big enough and we keep layering on changes which makes it hard to keep track of without re-reviewing the entire PR. Added todo to #4649


AlexHentschel

I think this PR is ready to be merged. Great work!

There are a few remaining comments, but I think those should all be addressed in subsequent PRs. I went through the PR comments that are still open and added the respective items as todos to #4649

AlexHentschel · 2023-10-23T22:44:54Z

state/protocol/protocol_state/updater.go

 	if u.state.InvalidStateTransitionAttempted {
-		return nil // won't process new events if we are in EECC
+		return nil // won't process new events if we are in epoch fallback mode.
 	}


regarding these checks:

flow-go/state/protocol/protocol_state/updater.go
Lines 210 to 212 in 49934ce

	if u.state.InvalidStateTransitionAttempted {
		return fmt.Errorf("invalid state transition has been attempted, no transition is allowed")
	}

flow-go/state/protocol/protocol_state/updater.go
Lines 70 to 72 in 49934ce

	if u.state.InvalidStateTransitionAttempted {
		return nil // won't process new events if we are in epoch fallback mode.
	}

flow-go/state/protocol/protocol_state/updater.go
Lines 156 to 158 in 49934ce

	if u.state.InvalidStateTransitionAttempted {
		return nil // won't process new events if we are going to enter epoch fallback mode
	}

I would be inclined to move these checks to the very beginning of the respective methods. Otherwise, we would probably continue encountering errors once InvalidStateTransitionAttempted has been set, because later events will no longer make sense after the protocol state has stopped updating.
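A sketch of the intended ordering, under simplifying assumptions, is given below. The `Updater` fields and the counter validation are hypothetical; the point is only that the fallback check runs before any validation that could fail.

```go
package main

import "fmt"

// EpochSetup is a simplified stand-in for the service event.
type EpochSetup struct {
	Counter uint64
}

// Updater is a simplified stand-in with hypothetical fields.
type Updater struct {
	currentEpochCounter             uint64
	nextEpochSetup                  *EpochSetup
	invalidStateTransitionAttempted bool
}

// ProcessEpochSetup checks the fallback flag before any validation, so that events
// arriving after the state stopped updating are ignored instead of producing errors.
func (u *Updater) ProcessEpochSetup(setup *EpochSetup) error {
	if u.invalidStateTransitionAttempted {
		return nil // epoch fallback mode: ignore all further service events
	}
	if setup.Counter != u.currentEpochCounter+1 {
		return fmt.Errorf("invalid epoch setup: counter %d does not follow %d",
			setup.Counter, u.currentEpochCounter)
	}
	u.nextEpochSetup = setup
	return nil
}

func main() {
	u := &Updater{currentEpochCounter: 5, invalidStateTransitionAttempted: true}
	// This event would fail validation, but it is ignored in fallback mode.
	fmt.Println(u.ProcessEpochSetup(&EpochSetup{Counter: 99}))
}
```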

AlexHentschel · 2023-10-23T22:53:30Z

state/protocol/protocol_state/updater.go

+		// sanity checking invariant (I):
+		currentEpochDynamicProperties, found := activeIdentitiesLookup[nextEpochIdentitySkeleton.NodeID]
+		if found && currentEpochDynamicProperties.Dynamic.Ejected { // invariance violated
+			return protocol.NewInvalidServiceEventErrorf("node %v is ejected in current epoch %d but readmitted by EpochSetup event for epoch %d", nextEpochIdentitySkeleton.NodeID, u.parentState.CurrentEpochSetup.Counter, epochSetup.Counter)
+		}

-	nextEpochIdentities := make(flow.DynamicIdentityEntryList, 0, len(currentEpochIdentities))
-	currentEpochIdentitiesLookup := currentEpochIdentities.Lookup()
-	// For an `identity` participating in the upcoming epoch, we effectively perform steps 2 and 3 from above within a single loop.
-	for _, identity := range epochSetup.Participants {
-		// Step 2: node is _not_ participating in the current epoch, but joining in the upcoming epoch.
-		// The node is allowed to join the network already in this epoch's Setup Phase, but has weight 0.
-		if _, found := currentEpochIdentitiesLookup[identity.NodeID]; !found {
-			currentEpochIdentities = append(currentEpochIdentities, &flow.DynamicIdentityEntry{
-				NodeID: identity.NodeID,
-				Dynamic: flow.DynamicIdentity{
-					Weight:  0,
-					Ejected: false,
-				},
-			})
+		// sanity checking invariant (II):
+		if idx > 0 && !order.IdentifierCanonical(prevNodeID, nextEpochIdentitySkeleton.NodeID) {
+			return protocol.NewInvalidServiceEventErrorf("epoch setup event lists active participants not in canonical ordering")
 		}
+		prevNodeID = nextEpochIdentitySkeleton.NodeID


I took the liberty to reorder these and the respective in-code documentation above such that the documentation and implementation are in the same order 👉 c638874
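For readers unfamiliar with invariant (II), the check amounts to verifying that consecutive node IDs are strictly ascending. The sketch below assumes canonical order is ascending byte order; `isStrictlyAscending` is a hypothetical stand-in for `order.IdentifierCanonical`, and `CheckCanonicalOrder` is an illustrative helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// Identifier is a simplified 32-byte node ID (assumption).
type Identifier [32]byte

// isStrictlyAscending reports whether a sorts strictly before b.
// Hypothetical stand-in for `order.IdentifierCanonical`.
func isStrictlyAscending(a, b Identifier) bool {
	return bytes.Compare(a[:], b[:]) < 0
}

// CheckCanonicalOrder returns an error naming the first index that violates the order.
// Because the comparison is strict, duplicates are rejected as well.
func CheckCanonicalOrder(ids []Identifier) error {
	for idx := 1; idx < len(ids); idx++ {
		if !isStrictlyAscending(ids[idx-1], ids[idx]) {
			return fmt.Errorf("identifier at index %d is not in canonical order", idx)
		}
	}
	return nil
}

func main() {
	ordered := []Identifier{{1}, {2}, {3}}
	unordered := []Identifier{{1}, {3}, {2}}
	fmt.Println(CheckCanonicalOrder(ordered))
	fmt.Println(CheckCanonicalOrder(unordered))
}
```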

durkmurder added 4 commits October 18, 2023 08:46

WIP. Updated prev epoch to contain active identities. Updated ActiveIdentities to contain only epoch related identities
ff90836

Updated storage layer assertions and process of reconstructing identity table
d8046b5

Changed how state updater processes epoch setup event. Updated tests
641dbca

Changed how identities are applied to protocol state. Fixed some tests
dd04ca4

durkmurder requested review from jordanschalm and AlexHentschel October 18, 2023 14:56

durkmurder assigned jordanschalm and AlexHentschel Oct 18, 2023

Merge branch 'yurii/4649-todos-and-refactoring-part-1' of https://github.com/onflow/flow-go into yurii/4649-prev-epoch-refactoring-attempt
e765750

AlexHentschel mentioned this pull request Oct 20, 2023

[Dynamic Protocol State] Changing structure of participants in EpochSetup #4726 (Merged)

Alexander Hentschel added 2 commits October 20, 2023 01:15

minor comment revision
d9ae36d

AlexHentschel mentioned this pull request Oct 20, 2023

suggestions for PR #4834 #4854 (Merged)

AlexHentschel reviewed Oct 20, 2023

durkmurder added 2 commits October 20, 2023 16:15

Fixed tests
15e3e33

Merge pull request #4854 from onflow/alex/4649-prev-epoch-refactoring-attempt_-_suggestions
0dd823a

jordanschalm reviewed Oct 20, 2023

jordanschalm approved these changes Oct 20, 2023

Base automatically changed from yurii/4649-todos-and-refactoring-part-1 to feature/dynamic-protocol-state October 23, 2023 08:07

Merge branch 'feature/dynamic-protocol-state' into yurii/4649-prev-epoch-refactoring-attempt
d3aa8d4

durkmurder and others added 8 commits October 23, 2023 12:25

Added test for BuildIdentityTable
95dbf1e

Changed order of arguments in BuildIdentityTable
df2bfab

Linted
25624f2

Apply suggestions from code review
f92636d
Co-authored-by: Jordan Schalm <[email protected]>

Changed expected error types of ProcessEpochSetup and ProcessEpochCommit. Updated tests, error handling, docs
0172dd1

Merge branch 'yurii/4649-prev-epoch-refactoring-attempt' of https://github.com/onflow/flow-go into yurii/4649-prev-epoch-refactoring-attempt
d51fcfa

updated goDoc of method BuildIdentityTable to completely describe all inputs
f6d519b

updated code-internal documentation to precisely explain the identity ordering assumptions in the system smart contracts vs the core protocol layer.
49934ce

AlexHentschel reviewed Oct 23, 2023

AlexHentschel mentioned this pull request Oct 23, 2023

[Dynamic Protocol State] Remaining work and ToDos #4649 (Closed, 19 tasks)

minor re-ordering of the logic and in-code documentation of function `Updater.ProcessEpochSetup()` such that the documentation and implementation are in the same order
c638874

AlexHentschel approved these changes Oct 23, 2023

durkmurder merged commit e210519 into feature/dynamic-protocol-state Oct 24, 2023
36 checks passed

durkmurder deleted the yurii/4649-prev-epoch-refactoring-attempt branch October 24, 2023 08:00
