-
Notifications
You must be signed in to change notification settings - Fork 234
Update CheckQuorum and DropIgnoredMessage in ccfraft.tla #7374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
CheckQuorum(i) == | ||
\* Check node is a leader (and therefore has not completed retirement) | ||
/\ leadershipState[i] = Leader | ||
\* There must be an active config which has a replica which is not this node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, there must be backups, if only on the horizon, for it to make sense to check quorum at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, more that the condition will always succeed (thus not triggering checkquorum) unless there is at least one backup. But I'll update the comment to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they are the same? "there must be backups" <=> "there is at least one backup"? On the horizon as in, in active configs (but not necessarily committed yet).
@cjen1-msft when Model Checking, we pass --disable-check-quorum, which goes through here, here and ultimately here. The simulation is doing some CheckQuorum though. |
Co-authored-by: Eddy Ashton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes two instances where the TLA+ specification over-approximated CCF's behavior, causing false liveness counterexamples. The changes align the specification more closely with the actual implementation by restricting when CheckQuorum can trigger and when messages can be dropped.
- Adds a constraint to prevent CheckQuorum from triggering in single-node configurations
- Prevents DropIgnoredMessage from dropping RequestVoteRequest messages from unknown nodes
I found two places where our spec overapproximates the behaviours of CCF.
First CheckQuorum can always trigger on a leader, so in the case of a single nodeconfiguration it can trigger in the spec but never in practise.
Next the DropIgnoredMessage was allowed to drop messages if they were ignored. We specifically don't ignore RequestVote messages from unknown replicas, while this is allowed by the spec.
Both of these were causing false liveness counterexamples in testing of the prevote changes.