Skip to content

Conversation

@karknu
Copy link
Contributor

@karknu karknu commented Nov 28, 2025

Description

Enforce a maximum limit on the number of times we will attempt to promote a peer to warm. Localroot peers, bootstrap relays and manually configured public root peers are exempt from this limit.

Implements #5252

Checklist

Quality

  • Commit sequence makes sense and have useful messages, see ref.
  • New tests are added and existing tests are updated.
  • Self-reviewed the PR.

Maintenance

  • Linked an issue or added the PR to the current sprint of ouroboros-network project.
  • Added labels.
  • Updated changelog files.
  • The documentation has been properly updated, see ref.

@github-project-automation github-project-automation bot moved this to In Progress in Ouroboros Network Nov 28, 2025
@karknu karknu force-pushed the karknu/max_recon_main branch from 9e31084 to 395a527 Compare November 28, 2025 09:46
@karknu karknu added the outbound-governor Issues / PRs related to outbound-governor label Nov 28, 2025
@karknu karknu marked this pull request as ready for review November 28, 2025 10:35
@karknu karknu requested a review from a team as a code owner November 28, 2025 10:35
Copy link
Collaborator

@coot coot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a maximum reconnection attempt limit for peer connections in the Ouroboros Network peer selection governor. Peers that fail to connect repeatedly are now forgotten after exceeding a configurable retry limit, with exceptions for localroot peers, bootstrap relays, and manually configured public root peers which can retry indefinitely.

Key changes:

  • Adds policyMaxConnectionRetries field (default: 5) to PeerSelectionPolicy
  • Implements retry limit enforcement in the cold peer promotion failure handler
  • Adds a forgotten boolean flag to promotion failure trace events to indicate when a peer is removed
  • Updates all trace pattern matches and JSON serialization to handle the new parameter

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
ouroboros-network/lib/Ouroboros/Network/PeerSelection/Governor/Types.hs Adds policyMaxConnectionRetries field and forgotten parameter to trace constructors
ouroboros-network/lib/Ouroboros/Network/PeerSelection/Governor/EstablishedPeers.hs Implements retry limit logic to forget peers after max failures, exempting localroots and bootstrap peers
ouroboros-network/lib/Ouroboros/Network/Diffusion/Policies.hs Sets default policyMaxConnectionRetries value to 5
ouroboros-network/orphan-instances/Ouroboros/Network/OrphanInstances.hs Updates JSON serialization to include forgotten field
dmq-node/src/DMQ/Diffusion/PeerSelection.hs Adds policyMaxConnectionRetries to DMQ policy
cardano-diffusion/tests/lib/Test/Cardano/Network/PeerSelection/MockEnvironment.hs Updates test trace patterns for new forgotten parameter
cardano-diffusion/tests/lib/Test/Cardano/Network/PeerSelection.hs Updates test trace pattern matches
cardano-diffusion/tests/lib/Test/Cardano/Network/Diffusion/Testnet.hs Updates test trace pattern matches
ouroboros-network/changelog.d/20251128_094205_karl.fb.knutsson_max_recon_main.md Adds changelog entry
dmq-node/changelog.d/20251128_104246_karl.fb.knutsson_max_recon_main.md Adds changelog entry
cardano-diffusion/changelog.d/20251128_104055_karl.fb.knutsson_max_recon_main.md Adds changelog entry

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment on lines 518 to 521
unForgetAble = LocalRootPeers.member peeraddr localRootPeers ||
(memberExtraPeers peeraddr (PublicRootPeers.getExtraPeers publicRootPeers))
(publicRootPeers', knownPeers'', forgotten) =
if unForgetAble || failCount < policyMaxConnectionRetries
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The variable name unForgetAble has inconsistent capitalization. Consider using unforgettable or unForgettable for better consistency with Haskell naming conventions (camelCase for local variables).

Suggested change
unForgetAble = LocalRootPeers.member peeraddr localRootPeers ||
(memberExtraPeers peeraddr (PublicRootPeers.getExtraPeers publicRootPeers))
(publicRootPeers', knownPeers'', forgotten) =
if unForgetAble || failCount < policyMaxConnectionRetries
unForgettable = LocalRootPeers.member peeraddr localRootPeers ||
(memberExtraPeers peeraddr (PublicRootPeers.getExtraPeers publicRootPeers))
(publicRootPeers', knownPeers'', forgotten) =
if unForgettable || failCount < policyMaxConnectionRetries

Copilot uses AI. Check for mistakes.
Comment on lines 518 to 531
unForgetAble = LocalRootPeers.member peeraddr localRootPeers ||
(memberExtraPeers peeraddr (PublicRootPeers.getExtraPeers publicRootPeers))
(publicRootPeers', knownPeers'', forgotten) =
if unForgetAble || failCount < policyMaxConnectionRetries
then ( publicRootPeers
, KnownPeers.setConnectTimes (Map.singleton peeraddr (delay `addTime` now))
knownPeers'
, False
)
else ( PublicRootPeers.difference differenceExtraPeers publicRootPeers
(Set.singleton peeraddr)
, KnownPeers.delete (Set.singleton peeraddr) knownPeers'
, True
)
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new behavior of forgetting peers after policyMaxConnectionRetries failed connection attempts lacks test coverage. Consider adding a test that verifies:

  1. Peers that are not localroots or bootstrap peers are forgotten after the maximum number of connection failures
  2. Localroot peers and bootstrap peers are never forgotten regardless of connection failure count
  3. The forgotten flag is correctly set to True in the trace when a peer is forgotten

Copilot uses AI. Check for mistakes.
Enforce a maximum limit on the number of times we will attempt to
promote a peer to warm. Localroot peers, bootstrap relays and manually
configured public root peers are exempt from this limit.
@karknu karknu force-pushed the karknu/max_recon_main branch from 4344590 to 72a8bdb Compare December 1, 2025 08:46
@coot coot moved this from In Progress to In Review in Ouroboros Network Dec 8, 2025
@coot coot moved this from In Review to In Progress in Ouroboros Network Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

outbound-governor Issues / PRs related to outbound-governor

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

3 participants