Skip to content

Conversation

cjen1-msft
Copy link
Contributor

I found two places where our spec overapproximates the behaviours of CCF.

First CheckQuorum can always trigger on a leader, so in the case of a single nodeconfiguration it can trigger in the spec but never in practise.

Next the DropIgnoredMessage was allowed to drop messages if they were ignored. We specifically don't ignore RequestVote messages from unknown replicas, while this is allowed by the spec.

Both of these were causing false liveness counterexamples in testing of the prevote changes.

CheckQuorum(i) ==
\* Check node is a leader (and therefore has not completed retirement)
/\ leadershipState[i] = Leader
\* There must be an active config which has a replica which is not this node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, there must be backups, if only on the horizon, for it to make sense to check quorum at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, more that the condition will always succeed (thus not triggering checkquorum) unless there is at least one backup. But I'll update the comment to reflect that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are the same? "there must be backups" <=> "there is at least one backup"? On the horizon as in, in active configs (but not necessarily committed yet).

@achamayou achamayou added the run-long-verification Run Long Verification jobs label Oct 20, 2025
@achamayou
Copy link
Member

@cjen1-msft when Model Checking, we pass --disable-check-quorum, which goes through here, here and ultimately here.

The simulation is doing some CheckQuorum though.

@cjen1-msft cjen1-msft marked this pull request as ready for review October 20, 2025 15:07
@cjen1-msft cjen1-msft requested a review from a team as a code owner October 20, 2025 15:07
@Copilot Copilot AI review requested due to automatic review settings October 20, 2025 15:07
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes two instances where the TLA+ specification over-approximated CCF's behavior, causing false liveness counterexamples. The changes align the specification more closely with the actual implementation by restricting when CheckQuorum can trigger and when messages can be dropped.

  • Adds a constraint to prevent CheckQuorum from triggering in single-node configurations
  • Prevents DropIgnoredMessage from dropping RequestVoteRequest messages from unknown nodes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-long-verification Run Long Verification jobs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants