[Pebble Storage] Refactor indexing new blocks #6360
Conversation
b541fbe to 148c877
I don't think the approach here provides any concurrency safety. We need to hold the lock until the write batch is committed.
To address these cases in general, I think we need to define and acquire the lock at the level where the write batch is instantiated and committed. For example, one lock protecting access to the entire block insertion procedure.
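A minimal sketch of that structure, assuming pebble's write-batch API; `insertWithLock` and the `index` callback are illustrative names, not the PR's actual code:

```go
package example

import (
	"sync"

	"github.com/cockroachdb/pebble"
)

// indexingLock protects the entire block-insertion procedure. It is acquired
// before the write batch is instantiated and only released after the batch
// has been committed, so no other goroutine can act on stale index reads
// in between.
var indexingLock sync.Mutex

// insertWithLock runs the given indexing function (all reads plus the writes
// that depend on them) against a fresh batch, holding the lock across Commit.
func insertWithLock(db *pebble.DB, index func(batch *pebble.Batch) error) error {
	indexingLock.Lock()
	defer indexingLock.Unlock() // released only after Commit has returned

	batch := db.NewBatch() // write batch instantiated under the lock
	defer batch.Close()

	if err := index(batch); err != nil {
		return err
	}
	return batch.Commit(pebble.Sync) // committed while still holding the lock
}
```

The trade-off is that all block insertions are serialized on one mutex, which is the simple, coarse-grained option suggested above.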
// IndexNewBlock will add a parent-child index for the new block.
//   - Each block has a parent; we use this parent-child relationship to build a reverse index
//     for looking up the children of a given block. This is useful for fork recovery,
//     where we want to find all the pending children of the latest finalized block.
//
// It is concurrency-safe to call this function from multiple goroutines, as it will acquire a lock
Suppose we have two threads A and B concurrently writing blocks with a common parent (which initially has no children). The sequence diagram would look like:
sequenceDiagram
par
A ->> DB: starts write batch
and
B ->> DB: starts write batch
end
A -> DB: acquire lock
activate A
A -> DB: read Children (nil)
A -> DB: write Children [c1]
A -> DB: release lock
deactivate A
B -> DB: acquire lock
activate B
B -> DB: read Children (nil)
B -> DB: write Children [c2]
B -> DB: release lock
deactivate B
par
A ->> DB: commit write batch
and
B ->> DB: commit write batch
end
We should end up with children `[c1, c2]`, but instead we will end up with either `[c1]` or `[c2]`.
The lock added here does not provide any guarantee that this function will operate correctly when called in parallel, because the lock is released before the write is committed.
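In code, the flawed ordering from the diagram looks roughly like this (a sketch against pebble's API; the key layout and function name are illustrative, not the PR's actual storage code):

```go
package example

import (
	"errors"
	"sync"

	"github.com/cockroachdb/pebble"
)

// indexChildUnsafe releases the lock before the batch is committed, so two
// goroutines indexing c1 and c2 under the same parent can both read an empty
// child list and then each commit a batch containing only its own child.
func indexChildUnsafe(db *pebble.DB, lock *sync.Mutex, parentKey, childID []byte) error {
	batch := db.NewBatch()
	defer batch.Close()

	lock.Lock()
	existing, closer, err := db.Get(parentKey) // read current children (nil for a new parent)
	if err != nil && !errors.Is(err, pebble.ErrNotFound) {
		lock.Unlock()
		return err
	}
	updated := append(append([]byte{}, existing...), childID...) // append the new child
	if closer != nil {
		closer.Close()
	}
	setErr := batch.Set(parentKey, updated, nil) // stage the write in the batch
	lock.Unlock()                                // released here: too early
	if setErr != nil {
		return setErr
	}

	// By the time this commit lands, another goroutine may already have
	// committed a batch based on the same stale read, losing one child.
	return batch.Commit(pebble.Sync)
}
```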
Suppose we have some database operation `P` consisting of a set of reads `R` and a set of writes `W`, where the writes are dependent on the values read in `R`. In general, to safely synchronize `P`, I think we need to:
- acquire `Lock_P`, the lock protecting access to `P`
- perform all reads in `R`, and memorize the values
- instantiate the write batch
- perform all writes in `W`
- commit the write batch
- release `Lock_P`
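A sketch of those steps in Go, using a single child-index update as the concrete `P` (the key layout and function name are illustrative, not the PR's actual code):

```go
package example

import (
	"errors"
	"sync"

	"github.com/cockroachdb/pebble"
)

var lockP sync.Mutex // Lock_P: protects the whole operation P

// indexChildSafe appends a child ID to its parent's child index, following
// the numbered steps above.
func indexChildSafe(db *pebble.DB, parentKey, childID []byte) error {
	// 1. acquire Lock_P
	lockP.Lock()
	defer lockP.Unlock() // 6. release Lock_P, after the commit below

	// 2. perform all reads in R and memorize the values
	children, closer, err := db.Get(parentKey)
	if err != nil && !errors.Is(err, pebble.ErrNotFound) {
		return err
	}
	if closer != nil {
		children = append([]byte{}, children...) // copy: the value is only valid until Close
		closer.Close()
	}

	// 3. instantiate the write batch
	batch := db.NewBatch()
	defer batch.Close()

	// 4. perform all writes in W, which depend on the values read in R
	if err := batch.Set(parentKey, append(children, childID...), nil); err != nil {
		return err
	}

	// 5. commit the write batch
	return batch.Commit(pebble.Sync)
}
```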
Nice diagram. And thanks for the feedback.
I used the callback to release the lock after the batch update is committed.
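Roughly, the pattern is the following; the `Batch` type and `OnCommitSucceed` method here are stand-ins for the storage layer's actual callback mechanism, not its real API:

```go
package example

import "sync"

// Batch stands in for the storage write batch; the real batch is assumed to
// let callers register callbacks that run once the commit has succeeded.
type Batch struct {
	callbacks []func()
	// staged writes omitted
}

// OnCommitSucceed registers a callback invoked after a successful Commit.
func (b *Batch) OnCommitSucceed(fn func()) {
	b.callbacks = append(b.callbacks, fn)
}

// Commit flushes the staged writes, then runs the registered callbacks.
func (b *Batch) Commit() error {
	// ... write the staged updates to the database here ...
	for _, fn := range b.callbacks {
		fn()
	}
	return nil
}

var indexingLock sync.Mutex

// indexNewBlock acquires the lock, stages its reads and writes, and defers
// the unlock to a commit callback, so the lock is only released once the
// batch has actually been committed.
func indexNewBlock(batch *Batch) error {
	indexingLock.Lock()
	batch.OnCommitSucceed(func() {
		indexingLock.Unlock() // released only after the commit lands
	})
	// ... read the current child index and stage the updated value ...
	return nil
}
```

In a real implementation the lock also has to be released if the commit fails or the batch is discarded, otherwise it would be held forever.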
Also added a test case to reproduce and verify the fix.
Consolidate and merge all changes to
Addressing review comments
This PR makes indexing new blocks concurrency-safe.
I ran the concurrency test 100 times and it passed: