Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynamic Protocol State] Changing structure of participants in EpochSetup #4726

Conversation

durkmurder
Copy link
Member

@durkmurder durkmurder commented Sep 19, 2023

#4649

Context

This PR handles a few todos from #4649, specifically:

  • Epoch setup event should only provide IdentitySkeletonList
  • Collection cluster API operates with IdentitySkeletonList

It's a huge PR since it contains changes to the flow.IdentityList which were driven by a need to have same functionality for flow.IdentityList and flow.IdentitySkeletonList so instead of duplicating each utility function for flow.IdentityList and flow.Identity I've introduced a generic type which implements most of the operations. The downside of this approach was lots of changes all over the code base.

The most important changes to pay attention on are:

  • Collection cluster committee
  • Bootstrapping process
  • Changes to model/flow/filter/identity.go and how we filter nodes in some cases, replaced notion of weight with initial weight for some cases
  • Introduced generic identity list in model/flow/identity_list.go
  • Updated how we fill data for identities when processing epoch setup event: state/protocol/protocol_state/updater.go

@durkmurder durkmurder changed the base branch from master to yurii/4719-refactor-protocol-state-entry September 19, 2023 13:38
Base automatically changed from yurii/4719-refactor-protocol-state-entry to feature/dynamic-protocol-state September 28, 2023 10:57
@github-actions
Copy link
Contributor

github-actions bot commented Sep 28, 2023

FVM Benchstat comparison

This branch with compared with the base branch onflow:feature/dynamic-protocol-state commit c8d336d

The command (for i in {1..7}; do go test ./fvm ./engine/execution/computation --bench . --tags relic -shuffle=on --benchmem --run ^$; done) was used.

Collapsed results for better readability

old.txtnew.txt
time/opdelta
ComputeBlock/16/cols/128/txes/1/max-concurrency-25.59s ± 0%5.52s ± 0%~(p=1.000 n=1+1)
ComputeBlock/16/cols/128/txes/2/max-concurrency-25.85s ± 0%5.79s ± 0%~(p=1.000 n=1+1)
 
us/txdelta
ComputeBlock/16/cols/128/txes/1/max-concurrency-22.73k ± 0%2.69k ± 0%~(p=1.000 n=1+1)
ComputeBlock/16/cols/128/txes/2/max-concurrency-22.85k ± 0%2.83k ± 0%~(p=1.000 n=1+1)
 
alloc/opdelta
ComputeBlock/16/cols/128/txes/1/max-concurrency-21.65GB ± 0%1.65GB ± 0%~(p=1.000 n=1+1)
ComputeBlock/16/cols/128/txes/2/max-concurrency-21.70GB ± 0%1.70GB ± 0%~(p=1.000 n=1+1)
 
allocs/opdelta
ComputeBlock/16/cols/128/txes/1/max-concurrency-221.9M ± 0%21.9M ± 0%~(p=1.000 n=1+1)
ComputeBlock/16/cols/128/txes/2/max-concurrency-222.6M ± 0%22.6M ± 0%~(p=1.000 n=1+1)
 

@codecov-commenter
Copy link

codecov-commenter commented Oct 3, 2023

Codecov Report

Attention: 330 lines in your changes are missing coverage. Please review.

Comparison is base (07c8e12) 52.88% compared to head (244af7b) 55.82%.

Additional details and impacted files
@@                        Coverage Diff                         @@
##           feature/dynamic-protocol-state    #4726      +/-   ##
==================================================================
+ Coverage                           52.88%   55.82%   +2.94%     
==================================================================
  Files                                 756      946     +190     
  Lines                               67853    87837   +19984     
==================================================================
+ Hits                                35885    49038   +13153     
- Misses                              29267    35118    +5851     
- Partials                             2701     3681     +980     
Flag Coverage Δ
unittests 55.82% <50.74%> (+2.94%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
cmd/bootstrap/cmd/clusters.go 72.22% <100.00%> (ø)
cmd/bootstrap/cmd/constraints.go 68.42% <100.00%> (ø)
cmd/bootstrap/cmd/finalize.go 62.18% <100.00%> (ø)
cmd/bootstrap/cmd/seal.go 67.27% <100.00%> (ø)
...nsensus/hotstuff/committees/consensus_committee.go 69.33% <100.00%> (ø)
engine/access/ingestion/engine.go 59.80% <100.00%> (ø)
engine/access/rpc/backend/backend_accounts.go 57.89% <100.00%> (ø)
engine/access/rpc/backend/backend_events.go 69.67% <100.00%> (ø)
engine/access/rpc/backend/backend_scripts.go 67.24% <100.00%> (ø)
engine/access/rpc/backend/node_communicator.go 60.71% <ø> (ø)
... and 61 more

... and 170 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -151,7 +151,7 @@ generate-fvm-env-wrappers:
generate-mocks: install-mock-generators
mockery --name '(Connector|PingInfoProvider)' --dir=network/p2p --case=underscore --output="./network/mocknetwork" --outpkg="mocknetwork"
mockgen -destination=storage/mocks/storage.go -package=mocks github.com/onflow/flow-go/storage Blocks,Headers,Payloads,Collections,Commits,Events,ServiceEvents,TransactionResults
mockgen -destination=module/mocks/network.go -package=mocks github.com/onflow/flow-go/module Local,Requester
#mockgen -destination=module/mocks/network.go -package=mocks github.com/onflow/flow-go/module Local,Requester
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mockgen doesn't support generics and it's deprecated I think we should replace it with mockery but not as part of this PR. I've manually updated generated code to compile.

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some preliminary suggestions on documentation

model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
model/flow/identity_list.go Outdated Show resolved Hide resolved
Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second round of comments:
Most are documentation suggestions. Though, I think we might need some more work related to the logic of BuildIdentityTable. I think the logic is not correct right now. See my comments #4726 (comment) and #4726 (comment) for details.

model/flow/identity_list.go Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove

// (c) 2019 Dapper Labs - ALL RIGHTS RESERVED
(also in other files if you randomly come across this copyright statement) Flow becoming its own organization, which we would also like to be reflected in code. Removing this copyright notice is aligned with Chirag and our legal team.

Comment on lines 10 to 13
// Adapt adapts a filter for a specific identity type.
//
// Converts flow.IdentityFilter[flow.IdentitySkeleton] to flow.IdentityFilter[flow.Identity].
func Adapt(f flow.IdentityFilter[flow.IdentitySkeleton]) flow.IdentityFilter[flow.Identity] {
Copy link
Member

@AlexHentschel AlexHentschel Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: updated my suggestion in agreement with Jordan's comment to keep the name Adapt. I am happy, if we update the documentation to the following:

Suggested change
// Adapt adapts a filter for a specific identity type.
//
// Converts flow.IdentityFilter[flow.IdentitySkeleton] to flow.IdentityFilter[flow.Identity].
func Adapt(f flow.IdentityFilter[flow.IdentitySkeleton]) flow.IdentityFilter[flow.Identity] {
// Adapt takes an IdentityFilter on the domain of IdentitySkeletons
// and adapts the filter to the domain of full Identities. In other words, it converts
// flow.IdentityFilter[flow.IdentitySkeleton] to flow.IdentityFilter[flow.Identity].
func Adapt(f flow.IdentityFilter[flow.IdentitySkeleton]) flow.IdentityFilter[flow.Identity] {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I prefer Adapt. This function is plumbing; it isn't functionally important, and it isn't important to communicate in exhaustive detail what it is doing at the callsite. Sometimes a vague name like Adapt is beneficial because no-one is going to assume they understand what it does based solely on the name.

Copy link
Member

@AlexHentschel AlexHentschel Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good reasoning.

edit: I also misunderstood the filter a little bit ... the adapt function works a lot more broadly, in that it can translate any filter acting on IdentitySkeleton to a filter working on the domain of full identities.

Not sure, I think there might be a mathematical term for this .. just wildly guessing "logical function continuation to product domain" or "domain extension" . In the absence of a better term, Adapt is probably fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, Adapt is a general purpose function to adapt filters from smaller subset to larger one. Why adapt? Because it adapts interfaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 😅
thank you, just leaned a lot through this discussion

model/flow/filter/identity.go Outdated Show resolved Hide resolved
model/flow/filter/identity.go Outdated Show resolved Hide resolved
model/flow/protocol_state.go Outdated Show resolved Hide resolved
model/flow/protocol_state.go Outdated Show resolved Hide resolved
targetEpochIdentitySkeletons IdentityList, // TODO: change to `IdentitySkeletonList`
adjacentEpochIdentitySkeletons IdentityList, // TODO: change to `IdentitySkeletonList`
targetEpochIdentitySkeletons IdentitySkeletonList,
adjacentEpochIdentitySkeletons IdentitySkeletonList,
) (IdentityList, error) {
// produce a unique set for current and previous epoch participants
allEpochParticipants := targetEpochIdentitySkeletons.Union(adjacentEpochIdentitySkeletons)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️🕷️❓

I am wondering if this code is maybe outdated. Or I am misunderstanding something. Context:

  • targetEpochDynamicIdentities lists only the active identities for the target epoch (same convention for targetEpochIdentitySkeletons)
  • adjacentEpochIdentitySkeletons describes the identities of the adjacent epoch

We check this condition:

if len(allEpochParticipants) != len(targetEpochDynamicIdentities) {
return nil, fmt.Errorf("invalid number of identities in protocol state: expected %d, got %d", len(allEpochParticipants), len(targetEpochDynamicIdentities))
}

which only holds if adjacentEpochIdentitySkeletonstargetEpochDynamicIdentities.

Not sure whether my analysis is correct, but there is a problem with the unit test, which I suspect is the reason the tests didn't catch this:

  1. In most unit tests, we compare the output of flow.NewRichProtocolStateEntry with the output of flow.BuildIdentityTable. However, method NewRichProtocolStateEntry calls internally also BuildIdentityTable. So if there is a bug in BuildIdentityTable, the comparison still passes despite the output being wrong. We do not have tests that check BuildIdentityTable by itself, which I think would be needed.
  2. We are implicitly relying on internal implementation details of unittest.ProtocolStateFixture, which I don't think is reliable (as the current situation shows). Many people are using and potentially updating these fixtures and these fixtures are not necessarily always updated to reflect the latest code behaviour.
    • for example, the fixture still assigned identities across epoch boundaries to ActiveIdentities:
      ActiveIdentities: flow.DynamicIdentityEntryListFromIdentities(allIdentities),
      which in my opinion conflicts with the convention
      // ActiveIdentities contains the dynamic identity properties for the nodes that
      // are active in this epoch. Active means that these nodes are authorized to contribute to
      // extending the chain. Nodes are listed in `Identities` if and only if
      // they are part of the EpochSetup event for the respective epoch.
      // The dynamic identity properties can change from block to block. Each non-deferred
      // identity-mutating operation is applied independently to the `ActiveIdentities`
      // of the relevant epoch's EpochStateContainer separately.
      // Identities are always sorted in canonical order.
      //
      // Context: In comparison, nodes that are joining in the next epoch or left as of this
      // epoch are only allowed to listen to the network but not actively contribute. Such
      // nodes are _not_ part of `Identities`.
      ActiveIdentities DynamicIdentityEntryList
      where we say that only the active identities (only identities from current Epoch setup event) should be listed
    • Essentially, you are treating the fixtures as a reference implementation for a non-trivial constructor expecting that it constructs a relatively non-trivial test case. I think this is beyond the scope of fixtures and should rather be part of the specific unit tests.

Suggestion:

  • add unit tests for BuildIdentityTable:
    • construct inputs individually with fixtures, covering various different cases, including:
      • adjacentEpochIdentitySkeletons = ∅,
      • adjacentEpochIdentitySkeletonstargetEpochIdentitySkeletons,
      • adjacentEpochIdentitySkeletons = targetEpochIdentitySkeletons,
      • adjacentEpochIdentitySkeletonstargetEpochIdentitySkeletons,
      • adjacentEpochIdentitySkeletonstargetEpochIdentitySkeletons = ∅
  • once you have those tests, you can compare the output of NewRichProtocolStateEntry to the return value of BuildIdentityTable, which your previous tests ensure is correct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this doesn't seem right to me either

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future record, this is addressed (or at least worked on) in the subsequent PR #4834 (and further suggested amendments in PR #4854)

model/flow/protocol_state.go Outdated Show resolved Hide resolved
@@ -384,7 +384,7 @@ func buildIdentityTable(
return nil, fmt.Errorf("identites in protocol state are not in canonical order: expected %s, got %s", allEpochParticipants[i].NodeID, identity.NodeID)
}
result = append(result, &Identity{
IdentitySkeleton: allEpochParticipants[i].IdentitySkeleton,
IdentitySkeleton: *allEpochParticipants[i],
DynamicIdentity: identity.Dynamic,
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️

this code does not really make sense to me given my understanding. In my opinion, we should be constructing the IdentityList for the identities listed in allEpochParticipants. But as I understand the code, we are somehow assuming that the targetEpochDynamicIdentities already contains their dynamic portion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future record, this is addressed (or at least worked on) in the subsequent PR #4834 (and further suggested amendments in PR #4854)

consensus/integration/nodes_test.go Outdated Show resolved Hide resolved
@@ -488,9 +488,9 @@ func findAllExecutionNodes(
// If neither preferred nor fixed nodes are defined, then all execution node matching the executor IDs are returned.
// e.g. If execution nodes in identity table are {1,2,3,4}, preferred ENs are defined as {2,3,4}
// and the executor IDs is {1,2,3}, then {2, 3} is returned as the chosen subset of ENs
func chooseExecutionNodes(state protocol.State, executorIDs flow.IdentifierList) (flow.IdentityList, error) {
func chooseExecutionNodes(state protocol.State, executorIDs flow.IdentifierList) (flow.IdentitySkeletonList, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to convert the return value to the SkeletonList type here?

Either way works for this use-case, but it seems unnecessary to do the conversion work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favour of Yurii's approach here, as I think it is functionally more general. It is straight forward to extend a filter operating on IdentitySkeleton to full Identity (see this comment). The other way around, you can't just take a filter operating on the domain of full Identity and try to feed in IdentitySkeleton, because the filter might inspect dynamic properties that aren't present in the skeleton.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought: returning identity skeletons also indicates that the algorithm is only considering the epoch-static information, but not the ejected flag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main motivation was to expose the bare minimum of information that is needed. By returning flow.IdentitySkeletonList I explicitly state that I was operating with epoch-static information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you both 👍. We're choosing the datatype that reflects which portion of the identity table to use. Makes sense.

engine/collection/test/cluster_switchover_test.go Outdated Show resolved Hide resolved
engine/execution/ingestion/engine_test.go Show resolved Hide resolved
engine/execution/execution_test.go Outdated Show resolved Hide resolved
engine/verification/utils/unittest/helper.go Show resolved Hide resolved
Comment on lines 10 to 13
// Adapt adapts a filter for a specific identity type.
//
// Converts flow.IdentityFilter[flow.IdentitySkeleton] to flow.IdentityFilter[flow.Identity].
func Adapt(f flow.IdentityFilter[flow.IdentitySkeleton]) flow.IdentityFilter[flow.Identity] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I prefer Adapt. This function is plumbing; it isn't functionally important, and it isn't important to communicate in exhaustive detail what it is doing at the callsite. Sometimes a vague name like Adapt is beneficial because no-one is going to assume they understand what it does based solely on the name.

targetEpochIdentitySkeletons IdentityList, // TODO: change to `IdentitySkeletonList`
adjacentEpochIdentitySkeletons IdentityList, // TODO: change to `IdentitySkeletonList`
targetEpochIdentitySkeletons IdentitySkeletonList,
adjacentEpochIdentitySkeletons IdentitySkeletonList,
) (IdentityList, error) {
// produce a unique set for current and previous epoch participants
allEpochParticipants := targetEpochIdentitySkeletons.Union(adjacentEpochIdentitySkeletons)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this doesn't seem right to me either

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing PR. Got through all of it. Looking forward to the next iteration around the convention of EpochStateContainer.ActiveIdentities. I think the code is overall structured very well and makes a lot of sense. It very nicely enables the differentiation between epoch-static identity checks (implemented via Filters operating on IdentitySkeleton) vs checks on the dynamic identity including ejection status (implemented via Filters operating on full Identity).

model/flow/filter/identity.go Outdated Show resolved Hide resolved
@@ -488,9 +488,9 @@ func findAllExecutionNodes(
// If neither preferred nor fixed nodes are defined, then all execution node matching the executor IDs are returned.
// e.g. If execution nodes in identity table are {1,2,3,4}, preferred ENs are defined as {2,3,4}
// and the executor IDs is {1,2,3}, then {2, 3} is returned as the chosen subset of ENs
func chooseExecutionNodes(state protocol.State, executorIDs flow.IdentifierList) (flow.IdentityList, error) {
func chooseExecutionNodes(state protocol.State, executorIDs flow.IdentifierList) (flow.IdentitySkeletonList, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favour of Yurii's approach here, as I think it is functionally more general. It is straight forward to extend a filter operating on IdentitySkeleton to full Identity (see this comment). The other way around, you can't just take a filter operating on the domain of full Identity and try to feed in IdentitySkeleton, because the filter might inspect dynamic properties that aren't present in the skeleton.

@@ -488,9 +488,9 @@ func findAllExecutionNodes(
// If neither preferred nor fixed nodes are defined, then all execution node matching the executor IDs are returned.
// e.g. If execution nodes in identity table are {1,2,3,4}, preferred ENs are defined as {2,3,4}
// and the executor IDs is {1,2,3}, then {2, 3} is returned as the chosen subset of ENs
func chooseExecutionNodes(state protocol.State, executorIDs flow.IdentifierList) (flow.IdentityList, error) {
func chooseExecutionNodes(state protocol.State, executorIDs flow.IdentifierList) (flow.IdentitySkeletonList, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought: returning identity skeletons also indicates that the algorithm is only considering the epoch-static information, but not the ejected flag.

@@ -150,8 +150,8 @@ func NewMessageHub(log zerolog.Logger,
ownOutboundProposals: ownOutboundProposals,
ownOutboundTimeouts: ownOutboundTimeouts,
clusterIdentityFilter: filter.And(
filter.In(currentCluster),
filter.Not(filter.HasNodeID(me.NodeID())),
filter.Adapt(filter.In(currentCluster)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My subjective opinion is that the following would be more explicit

Suggested change
filter.Adapt(filter.In(currentCluster)),
filter.HasNodeID[flow.Identity](currentCluster.NodeIDs()...),

If I understand correctly, this is what we are functionally doing.

I think consensus additionally excludes ejected nodes, we could do the same, as we are filtering full identities

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with the PR being merged into our feature branch with the comments addressed except for those around EpochStateContainer.ActiveIdentities. We can iterate over EpochStateContainer.ActiveIdentities in the follow-up PR #4834.

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my comments outside those we're addressing in #4834 are addressed.

@durkmurder durkmurder merged commit de3c468 into feature/dynamic-protocol-state Oct 23, 2023
36 checks passed
@durkmurder durkmurder deleted the yurii/4649-todos-and-refactoring-part-1 branch October 23, 2023 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants