@AndreiEres AndreiEres commented Jul 24, 2025

Description

The polkadot_parachain_collation_expired metric is an indicator of parachain block confidence. However, it has a critical issue: not every drop should be counted.

Lookahead collators intentionally build collations on a relay chain block and its forks, so drops of fork-based collations are expected behaviour. If we count them, the drop metrics paint a worse picture than reality. To improve tracking accuracy, we should exclude these legitimate drops.

A minor issue is also present in the expiry mechanism: it doesn't take into account that a collation may have moved to a different stage, e.g., from "fetched" to "backed", so it can record a drop of a fetched collation after that collation has already advanced.

To solve these issues we should:

  • Track relay parent finalization.
  • Record expiration metrics only when the relay parent was finalized.
  • Exclude drops of fork-based collations from the metrics.
  • Send metrics only for collations that were either finalized or dropped.
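
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CollationTracker`, `Tracked`, `Stage`, `Report`, and the `u64`/`u32` stand-ins for hashes and block numbers are all hypothetical names chosen for the example. It also assumes, as a simplification, that a collation whose relay parent is finalized but which was never included counts as a drop at its latest stage.

```rust
use std::collections::HashMap;

// Placeholder types for this sketch; the real node uses its own hash and block number types.
type Hash = u64;
type BlockNumber = u32;

/// Latest stage a collation has reached (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Stage {
    Fetched,
    Backed,
    Included,
}

/// What would be reported to the metrics backend (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
enum Report {
    Finalized,
    /// Dropped, labelled with the latest stage rather than the first one.
    Dropped(Stage),
}

struct Tracked {
    relay_parent: Hash,
    relay_parent_number: BlockNumber,
    stage: Stage,
}

#[derive(Default)]
struct CollationTracker {
    /// Keyed by candidate hash.
    collations: HashMap<Hash, Tracked>,
}

impl CollationTracker {
    fn track(&mut self, candidate: Hash, relay_parent: Hash, relay_parent_number: BlockNumber) {
        let tracked = Tracked { relay_parent, relay_parent_number, stage: Stage::Fetched };
        self.collations.insert(candidate, tracked);
    }

    /// Moving a collation to a new stage overwrites the old one,
    /// so a later drop is never attributed to a stale stage.
    fn advance(&mut self, candidate: Hash, stage: Stage) {
        if let Some(c) = self.collations.get_mut(&candidate) {
            c.stage = stage;
        }
    }

    /// Called when finality advances. `finalized_hashes` holds the hashes of the
    /// finalized blocks up to `finalized_number` that are still of interest.
    fn on_finalized(
        &mut self,
        finalized_number: BlockNumber,
        finalized_hashes: &[Hash],
    ) -> Vec<(Hash, Report)> {
        let mut reports = Vec::new();
        self.collations.retain(|candidate, c| {
            if c.relay_parent_number > finalized_number {
                return true; // relay parent not decided yet, keep tracking
            }
            if finalized_hashes.contains(&c.relay_parent) {
                let report = if c.stage == Stage::Included {
                    Report::Finalized
                } else {
                    Report::Dropped(c.stage)
                };
                reports.push((*candidate, report));
            }
            // Relay parent on an abandoned fork: forget the collation without a report.
            false
        });
        reports.sort_by_key(|(candidate, _)| *candidate);
        reports
    }
}
```

Deferring every report until the relay parent's fate is known is what keeps fork-based collations out of the drop count: they are simply forgotten rather than reported.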

@AndreiEres AndreiEres changed the title [WIP] Fix collation metrics [WIP] Collation metrics: don't report colation drops build on relay chain forks Aug 14, 2025
@AndreiEres AndreiEres changed the title [WIP] Collation metrics: don't report colation drops build on relay chain forks [WIP] Collation metrics: exclude drops of fork-based collations to improve metrics accuracy Aug 14, 2025
@AndreiEres AndreiEres changed the title [WIP] Collation metrics: exclude drops of fork-based collations to improve metrics accuracy Collation metrics: exclude drops of fork-based collations to improve metrics accuracy Aug 15, 2025
@AndreiEres AndreiEres added the T0-node This PR/Issue is related to the topic “node”. label Aug 15, 2025
@AndreiEres commented

/cmd prdoc --audience node_dev --bump patch

@AndreiEres AndreiEres marked this pull request as ready for review August 15, 2025 13:54
@Sajjon Sajjon left a comment

Did not get it all, but found some wrong doc comment use.

@sandreim sandreim left a comment

Thanks @AndreiEres

/// Returns true if the collation was included in a block before (or in) last finalized.
pub fn is_possibly_finalized(&self, last_finalized: BlockNumber) -> bool {
    self.included_at
        .map(|included_at| included_at <= last_finalized)

If we see a collation included, it doesn't really mean it is finalized, because we don't even know whether it is on the fork that is getting finalized. However, this should still work fine, because the collation should eventually be backed/included, as it was clearly backed offchain. The only issue is when the relay parent expires.

Please add some docs about the limitations of the measurement we are doing.
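
For illustration, here is one way the limitation could be written into the doc comment. This is only a sketch built around the quoted hunk: the `Collation` struct, the `u32` block number alias, and the `unwrap_or(false)` fallback for a collation that was never seen included are assumptions, since the hunk above is cut off before the end of the function.

```rust
// Stand-in for the node's block number type (assumption for this sketch).
type BlockNumber = u32;

/// Hypothetical container; only the `included_at` field appears in the quoted hunk.
struct Collation {
    /// Number of the relay chain block in which the collation was seen included, if any.
    included_at: Option<BlockNumber>,
}

impl Collation {
    /// Returns true if the collation was included in a block before (or in) last finalized.
    ///
    /// Limitation: only block numbers are compared. The including block may sit on a
    /// fork that is never finalized, so a `true` result means "possibly finalized",
    /// not "finalized".
    pub fn is_possibly_finalized(&self, last_finalized: BlockNumber) -> bool {
        self.included_at
            .map(|included_at| included_at <= last_finalized)
            // Assumed fallback: a collation never seen included is not finalized.
            .unwrap_or(false)
    }
}
```

Because the check compares numbers only, an inclusion seen at height 5 on a losing fork still returns true once block 5 of the winning fork is finalized.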

@AndreiEres AndreiEres enabled auto-merge October 1, 2025 07:10
@AndreiEres AndreiEres requested a review from Sajjon October 1, 2025 07:11
@Sajjon Sajjon left a comment

Lgtm!

@AndreiEres AndreiEres added this pull request to the merge queue Oct 1, 2025
Merged via the queue into master with commit d51c532 Oct 1, 2025
245 of 247 checks passed
@AndreiEres AndreiEres deleted the AndreiEres-fix-collation-metrics branch October 1, 2025 08:32
@AndreiEres AndreiEres added A4-backport-stable2503 Pull request must be backported to the stable2503 release branch A4-backport-stable2506 Pull request must be backported to the stable2506 release branch A4-backport-unstable2507 Pull request must be backported to the unstable2507 release branch A4-backport-stable2509 Pull request must be backported to the stable2509 release branch labels Oct 1, 2025
@paritytech-release-backport-bot

Created backport PR for stable2503:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-9319-to-stable2503
git worktree add --checkout .worktree/backport-9319-to-stable2503 backport-9319-to-stable2503
cd .worktree/backport-9319-to-stable2503
git reset --hard HEAD^
git cherry-pick -x d51c532f07c7e3307718b95f5a1f8859e14949a0
git push --force-with-lease

@paritytech-release-backport-bot

Created backport PR for stable2506:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-9319-to-stable2506
git worktree add --checkout .worktree/backport-9319-to-stable2506 backport-9319-to-stable2506
cd .worktree/backport-9319-to-stable2506
git reset --hard HEAD^
git cherry-pick -x d51c532f07c7e3307718b95f5a1f8859e14949a0
git push --force-with-lease

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Oct 1, 2025
…metrics accuracy (#9319)

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
(cherry picked from commit d51c532)
@paritytech-release-backport-bot

Successfully created backport PR for unstable2507:

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Oct 1, 2025
…metrics accuracy (#9319)

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
(cherry picked from commit d51c532)
@paritytech-release-backport-bot

Successfully created backport PR for stable2509:

@AndreiEres AndreiEres removed A4-backport-stable2503 Pull request must be backported to the stable2503 release branch A4-backport-stable2506 Pull request must be backported to the stable2506 release branch labels Oct 1, 2025
EgorPopelyaev added a commit that referenced this pull request Oct 1, 2025
Backport #9319 into `stable2509` from AndreiEres.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Co-authored-by: Andrei Eres <[email protected]>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Egor_P <[email protected]>
bee344 pushed a commit that referenced this pull request Oct 7, 2025
…metrics accuracy (#9319)

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
…metrics accuracy (#9319)

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Labels

A4-backport-stable2509 Pull request must be backported to the stable2509 release branch A4-backport-unstable2507 Pull request must be backported to the unstable2507 release branch T0-node This PR/Issue is related to the topic “node”.

4 participants