
Conversation

@AndreiEres (Contributor) commented Sep 29, 2025

Description

During statement store benchmarking we observed deadlock-like behavior, which we traced to statement propagation: statements were propagated every second, locking the index, which likely caused the deadlock. After the fix, the observed behavior no longer occurs.

Even though it is possible to desynchronize the DB and the index for read operations and release the locks earlier, which should be harmless, doing so leads to regressions, most likely because of concurrent access across many `db.get()` calls. This was checked with the benchmarks in #9884.
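
To illustrate the trade-off, here is a minimal, self-contained sketch of the two lock scopes, using hypothetical `Store`, `index`, and `db` stand-ins with std locks (the real code appears to use parking_lot-style locks); it is not the actual polkadot-sdk implementation:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-ins for the statement store's index and DB.
struct Store {
    index: RwLock<Vec<u64>>, // statement hashes known to the index
    db: HashMap<u64, Vec<u8>>, // encoded statements keyed by hash
}

impl Store {
    // Variant A: the index read lock is held across all DB reads. The
    // result stays consistent with the index, but the lock is held longer,
    // blocking writers for the whole duration.
    fn read_holding_lock(&self) -> Vec<Vec<u8>> {
        let index = self.index.read().unwrap();
        index.iter().filter_map(|h| self.db.get(h).cloned()).collect()
    }

    // Variant B: snapshot the keys and drop the lock before the DB reads,
    // so other threads can take the index lock while the reads run.
    fn read_after_snapshot(&self) -> Vec<Vec<u8>> {
        let keys: Vec<u64> = self.index.read().unwrap().clone();
        keys.iter().filter_map(|h| self.db.get(h).cloned()).collect()
    }
}
```

As noted above, releasing the lock earlier on read paths was measured to regress in the #9884 benchmarks, presumably because many `db.get()` calls then run concurrently.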

Integration

This PR should not affect downstream projects.

Comment on lines 765 to 772
let keys: Vec<_> = {
let index = self.index.read();
index.entries.keys().cloned().collect()
};

let mut result = Vec::with_capacity(keys.len());
for h in keys {
let encoded = self.db.get(col::STATEMENTS, &h).map_err(|e| Error::Db(e.to_string()))?;
Contributor

Note that by the time we read the DB, the statement might have been removed. So this operation doesn't return a view of the statement store at a single point in time; instead, it returns most statements. A statement could have been removed and another added in the meantime.

But it can also be good, and more efficient than before.

For instance, suppose a user always overrides one statement in one channel. While the store always holds one statement for this user, this function might sometimes return none. But again, that can be acceptable.
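
For illustration, a hypothetical continuation of the loop above (not the actual code) showing how a key taken from the index snapshot can miss in the DB:

```rust
for h in keys {
    let encoded = self.db.get(col::STATEMENTS, &h).map_err(|e| Error::Db(e.to_string()))?;
    match encoded {
        // Still present at read time: include it in the result.
        Some(bytes) => result.push(bytes),
        // Removed between the key snapshot and this read: skip it. This is
        // why the function returns "most statements" rather than a
        // point-in-time view of the store.
        None => continue,
    }
}
```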

Contributor Author

> But it can also be good, and more efficient than before.

It could be good, but benchmarking revealed a regression. I suppose that without the lock, concurrent access to the DB slows things down.

@AndreiEres AndreiEres force-pushed the AndreiEres/fix-statement-store-deadlock branch 2 times, most recently from c2e4e5d to faa4bf0 on September 30, 2025 15:35
Block::Hash: From<BlockHash>,
Client: ProvideRuntimeApi<Block>
+ HeaderBackend<Block>
+ sc_client_api::ExecutorProvider<Block>
Contributor Author

Not related to the subject, but this should be removed, as we don't need this trait; it only complicates the test setup.

}

/// Perform periodic store maintenance
pub fn maintain(&self) {
Contributor Author

Not related to the current deadlock, but it's better to remove unnecessary reads. It also makes the log more precise, since the index can change between maintenance and logging. Keeping the write lock during the DB commit is not necessary.
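
A minimal sketch of the locking pattern described in this comment, assuming hypothetical `index.maintain()` and `db.commit()` helpers and a parking_lot-style lock; this is not the actual implementation:

```rust
/// Perform periodic store maintenance.
pub fn maintain(&self) {
    // Compute the maintenance changes and capture the purged count while
    // holding the write lock, then drop the lock before touching the DB.
    let (commit, purged) = {
        let mut index = self.index.write();
        index.maintain() // hypothetical: returns (DB transaction, purged count)
    }; // write lock released here, before the DB commit

    if let Err(e) = self.db.commit(commit) {
        log::warn!("Error committing statement store maintenance: {e:?}");
    }
    // Log from the captured count; the index may already have changed again.
    log::trace!("Statement store maintenance purged {purged} statements");
}
```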

@AndreiEres (Contributor Author)

/cmd prdoc --audience node_dev --bump patch

@AndreiEres AndreiEres added the T0-node label (This PR/Issue is related to the topic “node”) Sep 30, 2025
@AndreiEres AndreiEres requested a review from gui1117 September 30, 2025 16:23
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/18137050908
Failed job name: test-linux-stable

@AndreiEres AndreiEres force-pushed the AndreiEres/fix-statement-store-deadlock branch from 26918d9 to 4bfda69 on September 30, 2025 16:55
@AndreiEres AndreiEres added this pull request to the merge queue Sep 30, 2025
Merged via the queue into master with commit ed4eebb Sep 30, 2025
244 of 246 checks passed
@AndreiEres AndreiEres deleted the AndreiEres/fix-statement-store-deadlock branch September 30, 2025 22:59
@gui1117 gui1117 added the A4-backport-stable2506 (must be backported to the stable2506 release branch), A4-backport-unstable2507 (must be backported to the unstable2507 release branch), and A4-backport-stable2509 (must be backported to the stable2509 release branch) labels Sep 30, 2025
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Sep 30, 2025
@paritytech-release-backport-bot

Successfully created backport PR for stable2506:

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Sep 30, 2025
@paritytech-release-backport-bot

Successfully created backport PR for unstable2507:

@paritytech-release-backport-bot

Successfully created backport PR for stable2509:

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Sep 30, 2025
EgorPopelyaev pushed a commit that referenced this pull request Oct 1, 2025
Backport #9868 into `stable2509` from AndreiEres.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.


Co-authored-by: Andrei Eres <[email protected]>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
EgorPopelyaev pushed a commit that referenced this pull request Oct 3, 2025
Backport #9868 into `stable2506` from AndreiEres.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.


Co-authored-by: Andrei Eres <[email protected]>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
bee344 pushed a commit that referenced this pull request Oct 7, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025