
Conversation

cjen1-msft
Contributor

We have CheckQuorum to ensure that a leader steps down when it is no longer a good leader; this acts as a liveness probe of the system at large.

Our CheckQuorum condition is that the leader has a committing quorum in any configuration.
Our commit condition is that the leader has a committing quorum in every configuration.
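
To make the distinction concrete, here is a minimal sketch of the two conditions, using hypothetical Configuration and live-node types rather than the actual raft.h structures:

#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical types for illustration; the real raft.h structures differ.
using NodeId = std::size_t;
struct Configuration
{
  std::set<NodeId> nodes;
};

// A configuration has a committing quorum if a majority of its nodes are live.
bool has_quorum(const Configuration& config, const std::set<NodeId>& live)
{
  std::size_t live_count = 0;
  for (const auto& node : config.nodes)
  {
    if (live.count(node) > 0)
    {
      live_count++;
    }
  }
  return live_count >= config.nodes.size() / 2 + 1;
}

// Old CheckQuorum condition: a committing quorum in ANY active configuration.
bool quorum_in_any(
  const std::vector<Configuration>& configs, const std::set<NodeId>& live)
{
  return std::any_of(configs.begin(), configs.end(), [&](const auto& c) {
    return has_quorum(c, live);
  });
}

// Commit condition (and the fixed CheckQuorum): a quorum in EVERY configuration.
bool quorum_in_every(
  const std::vector<Configuration>& configs, const std::set<NodeId>& live)
{
  return std::all_of(configs.begin(), configs.end(), [&](const auto& c) {
    return has_quorum(c, live);
  });
}

In the scenario described below, the partitioned leader sees only itself as live: that is a quorum of the pending single-node configuration but not of the original three-node one, so the any-of check keeps it as leader while the every-of check makes it step down.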

This PR includes a scenario (check_quorum_2) which demonstrates the difference between these two conditions.

Suppose we have a 3-node cluster with n0 initially the leader, then retire both n1 and n2 but do not replicate this change to them.
Then we partition n0.

Under the current CheckQuorum condition n0 still counts as a good leader, yet it is unable to commit, while the remainder of the cluster elects a replacement leader and continues to function.

The change to raft.h makes the condition explicit and fixes this behaviour.

Note

The other direction of this bug would be more severe (going from n0 to n0,n1,n2, check_quorum_3), with n0 being inaccessible but staying alive due to CheckQuorum.
However, the backups are unable to elect themselves without a vote from n0, so this isn't a problem in practice.

backup_ack_timeout_count++;
auto search = all_other_nodes.find(node.first);
if (
  search == all_other_nodes.end() ||
Member

This is always adding one to the live nodes in the config; is that right? In an atomic reconfiguration situation, the next configuration may not include the next primary, so including it in the count there does not seem obviously correct.

Contributor Author

As discussed: all_other_nodes contains every node, in any configuration, other than the primary. So if a node in a configuration is not in all_other_nodes, it must be the primary, and hence is live.

I've added a comment to this effect.
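
To make that counting argument concrete, here is a minimal sketch of a per-configuration live-node count, using hypothetical names (other_node_is_live standing in for whatever per-backup liveness bookkeeping raft.h actually keeps); it is not the real implementation:

#include <cstddef>
#include <map>
#include <set>

using NodeId = std::size_t; // hypothetical, as in the sketch above

// all_other_nodes-style map: every node in any configuration except the
// primary, mapped here to a hypothetical "is live" flag.
std::size_t count_live_in_config(
  const std::set<NodeId>& config_nodes,
  const std::map<NodeId, bool>& other_node_is_live)
{
  std::size_t live = 0;
  for (const auto& node : config_nodes)
  {
    auto search = other_node_is_live.find(node);
    if (search == other_node_is_live.end())
    {
      // Not tracked as an "other" node, so this must be the primary itself,
      // which is trivially live: this is the "always adding one" above.
      live++;
    }
    else if (search->second)
    {
      live++;
    }
  }
  return live;
}

In this sketch, if the primary is not a member of a given configuration it simply never appears in config_nodes, so nothing is added for it there, which matches the reply above.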

Comment on lines 11 to 17
swap_nodes,2,out,1,2

disconnect_node,0

periodic_one,0,100
periodic_one,1,100
dispatch_all
Member

Add comments and asserts if possible. We retire nodes 1 and 2, but 0 fails to replicate this update to the other nodes.

Contributor Author

Updated this, let me know if you think there are enough. More are easy enough to add :)

@cjen1-msft cjen1-msft marked this pull request as ready for review October 20, 2025 14:28
@cjen1-msft cjen1-msft requested a review from a team as a code owner October 20, 2025 14:28
@Copilot Copilot AI review requested due to automatic review settings October 20, 2025 14:28
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR addresses a bug in the CheckQuorum implementation where the leader's quorum check condition was inconsistent with the commit condition. The change ensures that a leader only remains active if it has a committing quorum in every active configuration, not just any configuration. This prevents scenarios where a leader believes it has quorum but is actually unable to commit entries.

Key Changes:

  • Modified CheckQuorum logic in raft.h to require quorum in all configurations instead of any configuration
  • Added test scenarios demonstrating the old incorrect behavior and validating the fix
  • Introduced new test assertions (assert_config, assert_missing_config) to verify configuration state in tests

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

src/consensus/aft/raft.h: Changed CheckQuorum from OR logic (quorum in any config) to AND logic (quorum in every config) using std::all_of
src/consensus/aft/test/driver.h: Added assert_config and assert_missing_config helper methods for validating node configurations in tests
src/consensus/aft/test/driver.cpp: Registered the new assertion commands in the test driver command dispatcher
tests/raft_scenarios/check_quorum_0_012: Test scenario demonstrating the transition from a single node to a three-node cluster with CheckQuorum behavior
tests/raft_scenarios/check_quorum_012_0: Test scenario demonstrating retirement of nodes while partitioned, verifying correct leader stepdown

@cjen1-msft
Contributor Author

Getting trace validation to work required reducing the constraints on IsBecomeFollower, for the following reason:

Actions from raft.h trace:

  • receive append_entries_request (no change)
  • become follower (update term, follower, rollback uncommittable)
  • <Don't emit any event> Roll back to match incoming append entries
  • execute_append_entries_sync (apply changes)
  • send_append_entries

Required actions from Traceccfraft.tla's point of view:

  • IsReceiveAppendEntriesRequest (UpdateTerm & ExecuteAppendEntries => update term, follower, rollback uncommittable, apply changes)
  • IsBecomeFollower (no change)
  • IsExecuteAppendEntries (no change)
  • IsSendAppendEntriesResponse (no change)

So if IsBecomeFollower asserts that the spec state matches the raft.h state and that it is unchanged, then IsReceiveAppendEntriesRequest's UpdateTerm will conflict with this.

Hence, to fix this specific issue, we should not roll back the log or increment the term until the message is sent.
We could also argue that we shouldn't constrain the membershipState, as it may not be stable yet.
