Fix the deadlock during statements gossiping #9868
Conversation
```rust
let keys: Vec<_> = {
	let index = self.index.read();
	index.entries.keys().cloned().collect()
};

let mut result = Vec::with_capacity(keys.len());
for h in keys {
	let encoded = self.db.get(col::STATEMENTS, &h).map_err(|e| Error::Db(e.to_string()))?;
```
Note that by the time we read from the DB, the statement might have been removed.
So this operation doesn't return a view of the statement store at a single point in time. Instead it returns most statements: a statement could be removed and another added in the meantime.
But it can also be good, and more efficient than before.
For instance, if a user always overrides one statement in one channel, then even though the store always holds exactly one statement for that user, this function might sometimes return none. But again, that can be acceptable.
> But it can also be good, and more efficient than before.

It could be good, but benchmarking revealed a regression. I suppose that without the lock, concurrent access to the DB slows things down.
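For reference, a minimal sketch of the two access patterns being compared here, assuming the store pairs an in-memory index behind an RwLock with a key-value DB. The `Db` trait, `Hash` alias, and `Error` enum below are simplified stand-ins, and std's `RwLock` is used for self-containment; only the shape of the two variants is meant to match the snippet above.

```rust
// Sketch only: `Db`, `Hash`, `Index`, and `Error` are simplified stand-ins
// for the statement-store types; the two methods mirror the patterns
// discussed in the review thread.
use std::collections::HashMap;
use std::sync::RwLock;

type Hash = [u8; 32];

#[derive(Debug)]
enum Error {
    Db(String),
}

// Minimal stand-in for the key-value backend (`self.db` in the diff).
trait Db {
    fn get(&self, key: &Hash) -> Result<Option<Vec<u8>>, String>;
}

struct Index {
    entries: HashMap<Hash, ()>,
}

struct Store<D> {
    index: RwLock<Index>,
    db: D,
}

impl<D: Db> Store<D> {
    // Variant A: hold the read lock for the whole iteration.
    // Gives a point-in-time view, but the lock stays held across every DB
    // read, so a writer (e.g. the per-second gossip path) waits longer.
    fn statements_locked(&self) -> Result<Vec<Vec<u8>>, Error> {
        let index = self.index.read().unwrap();
        let mut result = Vec::with_capacity(index.entries.len());
        for h in index.entries.keys() {
            if let Some(encoded) = self.db.get(h).map_err(Error::Db)? {
                result.push(encoded);
            }
        }
        Ok(result)
    }

    // Variant B: snapshot the keys, drop the lock, then read the DB.
    // No lock is held during DB access, but entries removed in the meantime
    // come back as `None` and are skipped, so the result is not a consistent
    // point-in-time view (the behaviour described in the comment above).
    fn statements_snapshot(&self) -> Result<Vec<Vec<u8>>, Error> {
        let keys: Vec<Hash> = {
            let index = self.index.read().unwrap();
            index.entries.keys().cloned().collect()
        };

        let mut result = Vec::with_capacity(keys.len());
        for h in keys {
            if let Some(encoded) = self.db.get(&h).map_err(Error::Db)? {
                result.push(encoded);
            }
        }
        Ok(result)
    }
}
```

Per the reply above, the snapshot variant benchmarked slower in this setting, presumably because of the many concurrent `db.get()` calls, even though it holds the lock for a much shorter time.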
```rust
Block::Hash: From<BlockHash>,
Client: ProvideRuntimeApi<Block>
	+ HeaderBackend<Block>
	+ sc_client_api::ExecutorProvider<Block>
```
Not related to the subject, but this bound should be removed, as we don't need this trait. It only complicates the test setup.
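As an aside, a tiny, purely illustrative example (unrelated to the actual Substrate traits) of why trimming an unused bound helps test setup: every bound on `Client` is one more trait a test double has to implement, even when the code under test never calls it.

```rust
// Illustrative only: `Clone` below plays the role of the unneeded
// `ExecutorProvider` bound; the function body never uses it, so requiring
// it just forces every caller (and every test double) to provide it.
use std::fmt::Debug;

// Before: an extra bound that the body never exercises.
fn dump_before<T: Debug + Clone>(items: &[T]) {
    for item in items {
        println!("{item:?}");
    }
}

// After: only the bound that is actually used remains.
fn dump_after<T: Debug>(items: &[T]) {
    for item in items {
        println!("{item:?}");
    }
}

fn main() {
    dump_before(&[1, 2, 3]);
    dump_after(&["a", "b", "c"]);
}
```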
```rust
}

/// Perform periodic store maintenance
pub fn maintain(&self) {
```
Not related to the current deadlock, but it's better to remove the unnecessary reads. It also makes the log more precise, since the index may change between maintenance and logging. Keeping the write lock during the DB commit is not necessary.
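A minimal sketch of the shape this comment suggests, with `purge_expired`, the `Db` trait, and the expiry bookkeeping all made up for illustration: the write lock is held only while the in-memory index is mutated, the numbers used for logging are captured under that same lock, and the DB commit happens after the lock is released.

```rust
// Sketch only, not the actual implementation.
use std::collections::HashMap;
use std::sync::RwLock;

type Hash = [u8; 32];

struct Index {
    // hash -> expiry timestamp (illustrative bookkeeping)
    entries: HashMap<Hash, u64>,
}

impl Index {
    // Remove expired entries from the in-memory index and return their hashes.
    fn purge_expired(&mut self, now: u64) -> Vec<Hash> {
        let expired: Vec<Hash> = self
            .entries
            .iter()
            .filter(|(_, expiry)| **expiry <= now)
            .map(|(h, _)| *h)
            .collect();
        for h in &expired {
            self.entries.remove(h);
        }
        expired
    }
}

// Minimal stand-in for the key-value backend.
trait Db {
    fn delete(&self, keys: &[Hash]);
}

struct Store<D> {
    index: RwLock<Index>,
    db: D,
}

impl<D: Db> Store<D> {
    /// Perform periodic store maintenance.
    pub fn maintain(&self, now: u64) {
        // Hold the write lock only while mutating the in-memory index,
        // and capture the numbers used for logging under that same lock.
        let (purged, remaining) = {
            let mut index = self.index.write().unwrap();
            let purged = index.purge_expired(now);
            let remaining = index.entries.len();
            (purged, remaining)
        };

        // Commit to the DB without holding the write lock.
        self.db.delete(&purged);

        println!(
            "Statement store maintenance: purged {}, remaining {}",
            purged.len(),
            remaining
        );
    }
}
```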
/cmd prdoc --audience node_dev --bump patch |
All GitHub workflows were cancelled due to the failure of one of the required jobs.
Backport #9868 into `stable2509` from AndreiEres. See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot. Co-authored-by: Andrei Eres <[email protected]> Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Backport #9868 into `stable2506` from AndreiEres. See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot. Co-authored-by: Andrei Eres <[email protected]> Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Description
During statement store benchmarking we experienced deadlock-like behavior, which we traced to statement propagation: statements were propagated every second, locking the index, which likely caused the deadlock. After the fix, the observed behavior no longer occurs.
Even though it is possible to let the DB and the index go out of sync for read operations and release the locks earlier, which should be harmless, doing so leads to a regression, presumably because of concurrent access across many db.get() calls. Checked with the benchmarks in #9884.
Integration
This PR should not affect downstream projects.
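For context, the classic shape of this kind of deadlock. This is an illustration of the general pattern, not code from the PR, and the timings and types are made up: with a writer-fair RwLock such as parking_lot's, re-acquiring a read lock on the same thread deadlocks as soon as a writer, here standing in for the per-second propagation task, gets queued in between.

```rust
// Illustrative only: not the statement-store code. parking_lot's RwLock is
// fair to writers, so once a writer is waiting, new readers block too; a
// recursive read on the same thread then deadlocks.
//
// [dependencies]
// parking_lot = "0.12"
use parking_lot::RwLock;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    let index = Arc::new(RwLock::new(Vec::<u64>::new()));

    // Reader thread: takes a read lock, then re-enters and takes it again.
    let reader = {
        let index = Arc::clone(&index);
        thread::spawn(move || {
            let _outer = index.read();
            // Give the writer time to queue up behind the outer read lock.
            thread::sleep(Duration::from_millis(100));
            // Because a writer is now waiting, this second read also blocks
            // -> deadlock.
            let _inner = index.read();
        })
    };

    // "Propagation" thread: periodically takes the write lock.
    let writer = {
        let index = Arc::clone(&index);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            let mut guard = index.write(); // queued behind the outer read
            guard.push(1);
        })
    };

    reader.join().unwrap();
    writer.join().unwrap();
    println!("finished (never reached: the threads above deadlock)");
}
```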