Skip to content

fix(server): Reconcile bucket set#7260

Open
dranikpg wants to merge 1 commit intodragonflydb:mainfrom
dranikpg:fix-inconsistent-bucketset
Open

fix(server): Reconcile bucket set#7260
dranikpg wants to merge 1 commit intodragonflydb:mainfrom
dranikpg:fix-inconsistent-bucketset

Conversation

@dranikpg
Copy link
Copy Markdown
Contributor

@dranikpg dranikpg commented May 4, 2026

Fixes #7245

Repeat on change callbacks if the insertion bucket set changed while calling them

@dranikpg
Copy link
Copy Markdown
Contributor Author

dranikpg commented May 4, 2026

augment review

@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented May 4, 2026

🤖 Augment PR Summary

Summary: Updates DbSlice::AddOrFindInternal to re-run change callbacks until CVCUponInsert reports a stable target bucket set.

Why: Prevents missed pre-insert bucket serialization/notifications when callbacks can block and concurrent activity changes the insertion bucket set (fixes #7245).

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/server/db_slice.cc
if (!change_cb_.empty()) {
auto bucket_set = db.prime.CVCUponInsert(key);
CallChangeCallbacks(cntx.db_index, bucket_set);
for (bool consistent = false; !consistent;) {
Copy link
Copy Markdown

@augmentcode augmentcode Bot May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/server/db_slice.cc:688 — This retry loop has no bound, so if CVCUponInsert(key) keeps changing due to concurrent activity while callbacks block/yield, AddOrFindInternal could potentially livelock and never proceed to the actual insert. Consider adding some guard/guarantee (or documenting the convergence invariant) to ensure termination.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually all possible buckets will be visited, which implies no suspension will occur and the operation will finish

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that as a comment

@dranikpg dranikpg marked this pull request as ready for review May 6, 2026 06:33
Copilot AI review requested due to automatic review settings May 6, 2026 06:33
Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a fiber-preemption race in DbSlice::AddOrFindInternal where change callbacks (used by snapshotting/migration flows) can yield, allowing concurrent inserts to grow/split the underlying DashTable and invalidate the originally computed insertion bucket set.

Changes:

  • Replaces a post-callback DCHECK on CVCUponInsert() stability with a retry loop that re-runs change callbacks until the computed bucket set remains consistent across the callback.
  • Adds clarifying comments around why callbacks are invoked on the computed insertion bucket set.

Comment thread src/server/db_slice.cc
Comment on lines +688 to +694
for (bool consistent = false; !consistent;) {
auto bucket_set = db.prime.CVCUponInsert(key);
CallChangeCallbacks(cntx.db_index, bucket_set);

// Set of possible insertion buckets must be the same after possibly blocking call
DCHECK(bucket_set == db.prime.CVCUponInsert(key));
// Repeat the change callbacks if the target set changed
consistent = (bucket_set == db.prime.CVCUponInsert(key));
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FATAL Check failed: bucket_set == db.prime.CVCUponInsert(key) during test_cluster_migrations_sequence

2 participants