Skip to content

feat: retry heights when adding proposal#1724

Closed
rach-id wants to merge 4 commits intofeature/recoveryfrom
rachid/recovery/retry-on-add-commitment
Closed

feat: retry heights when adding proposal#1724
rach-id wants to merge 4 commits intofeature/recoveryfrom
rachid/recovery/retry-on-add-commitment

Conversation

@rach-id
Copy link
Copy Markdown
Member

@rach-id rach-id commented Apr 11, 2025

Description

Retries heights when adding a proposal while having a longer catchup frequency of 30s.

Also, retries heights the moment the propagation reactor starts processing in case it received any commitment during block/state sync.

PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use
    unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments

@rach-id rach-id self-assigned this Apr 11, 2025
@rach-id rach-id requested a review from a team as a code owner April 11, 2025 19:03
@rach-id rach-id added this to core/app Apr 11, 2025
@rach-id rach-id requested review from evan-forbes and rootulp and removed request for a team April 11, 2025 19:03
@github-project-automation github-project-automation Bot moved this to Needs Triage in core/app Apr 11, 2025
@rach-id rach-id moved this from Needs Triage to In Progress in core/app Apr 11, 2025
@celestia-bot celestia-bot requested a review from tzdybal April 11, 2025 19:03
@rach-id rach-id marked this pull request as draft April 11, 2025 19:03
@rach-id rach-id requested review from Copilot and removed request for rootulp April 11, 2025 19:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

consensus/propagation/reactor.go:91

  • Changing the ticker frequency from 2 seconds to 30 seconds is a significant update; ensure that this change aligns with the intended catchup behavior and that tests appropriately cover the new timing.
ticker := time.NewTicker(30 * time.Second)

consensus/propagation/have_wants.go:35

  • [nitpick] Commenting out the error log may hide useful diagnostic information when a have part for an unknown proposal is received; consider including a debug log or a brief comment explaining the rationale for silencing this error.
//blockProp.Logger.Error("received have part for unknown proposal", "peer", peer, "height", height, "round", round)

Comment on lines +27 to +34
func (blockProp *Reactor) retryWants(height int64, round int32) {
if !blockProp.started.Load() {
return
}
data := blockProp.unfinishedHeights()
peers := blockProp.getPeers()
for _, prop := range data {
height, round := prop.compactBlock.Proposal.Height, prop.compactBlock.Proposal.Round

// don't re-request parts for any round on the current height
if height == currentHeight {
continue
for {
blockProp.Logger.Debug("retrying", "height", height, "round", round)
Copy link

Copilot AI Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new retryWants function uses an infinite loop with a sleep delay, which may lead to unbounded retries if the missing parts never resolve. Consider adding a maximum retry limit or timeout to prevent potential resource exhaustion.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will address afterwards

// this call simulates getting a commitment for a proposal of a higher
// height
n2.retryWants(2)
n2.retryWants(prop.Height, prop.Round)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be changed to better tests where we try:

  • catchup multiple heights at the same time
  • retry the same height

@rach-id rach-id closed this Apr 30, 2025
@github-project-automation github-project-automation Bot moved this from In Progress to Done in core/app Apr 30, 2025
@rach-id rach-id deleted the rachid/recovery/retry-on-add-commitment branch July 7, 2025 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants