
Connection Pool Slots#1266

Open
LFRezende wants to merge 9 commits into master from feat/poc-connectionPoolSlots-discoveredFactory

Conversation

@LFRezende

Motivation

The current status of Hathor's P2P network doesn't distinguish the different types of connections established between two peers: the base code only limits the total number of connections established with a given full node, regardless of type. A malicious attacker could jeopardize the system by creating multiple connections to a given node, jamming its ability to respond and/or create new connections.

This PR implements Connection Pool Slots, subdividing the existing pool into four main groups:

  • Outgoing Connections Slot (full node --> peer)
  • Incoming Connections Slot (peer --> full node)
  • Discovered Connections Slot: Outgoing connections established in bootstrap and/or by DNS query
  • Check Entrypoints Slot: when the outgoing slot is full, overflowing connections are transferred to the Check Entrypoints Slot, which analyzes the protocols and their entrypoints, checking whether they are legitimate peers.
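The four groups above could be sketched as an enum plus per-slot limits. This is an illustrative model only; the names and the `SLOT_LIMITS` dict are stand-ins, not the PR's exact identifiers (the limit values mirror the settings quoted later in this thread):

```python
from enum import Enum, auto

class ConnectionType(Enum):
    """Illustrative sketch of the four connection pool slot types."""
    OUTGOING = auto()           # full node --> peer
    INCOMING = auto()           # peer --> full node
    DISCOVERED = auto()         # outgoing, via bootstrap / DNS discovery
    CHECK_ENTRYPOINTS = auto()  # overflow from the outgoing slot

# Hypothetical per-slot connection limits, one per slot type:
SLOT_LIMITS: dict[ConnectionType, int] = {
    ConnectionType.OUTGOING: 40,
    ConnectionType.INCOMING: 40,
    ConnectionType.DISCOVERED: 10,
    ConnectionType.CHECK_ENTRYPOINTS: 5,
}
```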

Acceptance Criteria

  • Outgoing, Incoming, Discovered, and Check Entrypoint slots each with a set limit of connections
  • Check Entrypoint slot with a queue to store entrypoints for future connection attempts

Checklist

If this review is approved, the code should be production-ready.

@github-actions

github-actions bot commented May 2, 2025

🐰 Bencher Report

Branch: feat/poc-connectionPoolSlots-discoveredFactory
Testbed: ubuntu-22.04

| Benchmark | Latency | Lower Boundary (Limit %) | Upper Boundary (Limit %) |
|---|---|---|---|
| sync-v2 (up to 20000 blocks) | 1.57 m (-8.14% vs. baseline 1.71 m) | 1.54 m (97.97%) | 2.05 m (76.55%) |

🐰 View full continuous benchmarking report in Bencher

@codecov

codecov bot commented May 3, 2025

Codecov Report

❌ Patch coverage is 70.49180% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.60%. Comparing base (735f4e0) to head (2591ea8).

Files with missing lines Patch % Lines
hathor/p2p/connection_slot.py 65.55% 24 Missing and 7 partials ⚠️
hathor/p2p/manager.py 67.60% 17 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1266      +/-   ##
==========================================
- Coverage   85.67%   85.60%   -0.08%     
==========================================
  Files         439      440       +1     
  Lines       33618    33780     +162     
  Branches     5282     5322      +40     
==========================================
+ Hits        28802    28916     +114     
- Misses       3801     3838      +37     
- Partials     1015     1026      +11     
Flag Coverage Δ
test-lib 85.60% <70.49%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@LFRezende LFRezende moved this to In Progress (Done) in Hathor Network May 3, 2025
@LFRezende LFRezende self-assigned this May 3, 2025
@LFRezende LFRezende moved this from In Progress (Done) to Parking Lot in Hathor Network Jan 28, 2026
@msbrogli msbrogli force-pushed the master branch 2 times, most recently from eb416fa to 21d7909 on February 12, 2026 23:12
@LFRezende LFRezende force-pushed the feat/poc-connectionPoolSlots-discoveredFactory branch from e3e0f0e to 5d17004 on February 14, 2026 05:01
PEER_MAX_CHECK_PEER_CONNECTIONS: int = 5

# Queue size for each connection slot:
QUEUE_SIZE: int = 100
Member

Maybe we should add P2P_ prefix to all these new settings.

Author

Name changed for these settings:

    # Max Number of each connection slot (int):
    P2P_PEER_MAX_INCOMING_CONNECTIONS: int = 40
    P2P_PEER_MAX_OUTGOING_CONNECTIONS: int = 40
    P2P_PEER_MAX_DISCOVERED_PEERS_CONNECTIONS: int = 10
    P2P_PEER_MAX_CHECK_PEER_CONNECTIONS: int = 5

    # Queue size for each connection slot:
    P2P_QUEUE_SIZE: int = 100

Author

One new question, though: The following setting:

    # Maximum number of entrypoints that we accept that a peer broadcasts
    PEER_MAX_ENTRYPOINTS: int = 30

Wouldn't it cap the maximum number of incoming connections at thirty, making

    # Max Number of each connection slot (int):
    P2P_PEER_MAX_INCOMING_CONNECTIONS: int = 40

irrelevant?

Member

These are (very) different settings. PEER_MAX_ENTRYPOINTS allows each peer id to have up to 30 entrypoints; while P2P_PEER_MAX_INCOMING_CONNECTIONS allows a maximum of 40 connections from different peers.

type: HathorProtocol.ConnectionType
max_slot_connections: int
queue_size_entrypoints: int
entrypoint_set: set[PeerAddress | None]
Member

Why this? Is it to manage the fact that a peer might have more than one entrypoint? If so, why not keep it separate in the Peer class? The p2p manager has access to both and can handle reconnections easily.

Author

@LFRezende LFRezende Feb 23, 2026

You mean why have both the queue and the entrypoint set?
At the time, the intent was to have both so we can know whether a protocol is revisiting the function or not.

The animation of https://github.com/HathorNetwork/sec-rfcs/blob/feat/connection-pool-slots/text/000x-connection-pool-slots.md explains it well:

When we receive a connection from "add_connection", we grab the entrypoints and add them both to the queue, to be analyzed later by check_entrypoints if needed, and the entrypoints set.

The reason for having both, as said in the sec-rfc, is:


Note on Check Entrypoints Type
Since this is the only type to store entrypoints in queues, the Slot class works such that, if this is the type, all arriving connections to this slot are stored in the connections set.

When the connection matures to READY, a check is rolled to see whether this connection has its peer entrypoint in the entrypoints_set of the slot. If it does, then this connection comes from a previously dequeued entrypoint from the entrypoint_queue of the slot, so we simply disconnect it. However, if this is not the case, then we grab the entrypoints provided by the connection protocol, add them to the queue and the entrypoint set, and then disconnect the protocol.

The disconnection of the protocol mentioned above, regardless of its nature, always leads to a free spot in the set. This triggers the queue to pop one of the entrypoints, which will then be used to connect to a peer.

This is the best opportunity to explain a question that the reader may have: why does the slot have two data structures to store the same entrypoints?

In the process of popping an entrypoint from the queue and establishing a connection to its peer - a dequeued connection - a new and different connection may arrive while the dequeued connection is still in flight. If this occurs when the slot is almost full (only one vacant spot), this new connection will take the place "reserved" for the dequeued connection. When the slot is full, we simply discard the connection, so the dequeued entrypoint is lost and never analyzed.

To prevent this, when the connection arrives and the slot is nearly full, a check is made to see if the length of the entrypoint set is the same as the entrypoint queue. If it is, then no dequeued connection is being processed. If they don't match, then at least one spot must be reserved for a dequeued connection that is coming.


If too convoluted or unnecessary, I can refactor it. What do you think?
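The queue/set reservation logic described above could be modeled roughly as follows. This is a simplified sketch, not the PR's Slot class; the class name, string-typed entrypoints, and `has_free_spot_for` helper are all illustrative:

```python
from collections import deque

class CheckEntrypointsSlot:
    """Simplified model of the check-entrypoints slot: a queue of
    entrypoints to try, mirrored by a set used both to detect revisits
    and to reserve a spot for an in-flight dequeued connection."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.connections: set[str] = set()
        self.entrypoint_queue: deque[str] = deque()
        self.entrypoint_set: set[str] = set()

    def has_free_spot_for(self, entrypoint: str) -> bool:
        # A dequeued connection is in flight whenever the set is larger
        # than the queue; keep one spot reserved for it in that case.
        in_flight = len(self.entrypoint_set) > len(self.entrypoint_queue)
        limit = self.max_connections - (1 if in_flight else 0)
        # Entrypoints already seen (present in the set) are revisits.
        return entrypoint not in self.entrypoint_set and len(self.connections) < limit
```

The key point is the length comparison: when queue and set lengths match, no dequeued connection is pending and the full limit is available; when they differ, one spot stays reserved.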

Comment on lines 113 to 114
if protocol in self.connection_slot:
return False
Member

Would this only happen in case of a bug? If yes, it should be an assert.

Author

Not necessarily a bug; an assert would abort the function. It was intended as a way to avoid reconnecting, or adding the same protocol twice to the same slot.

If not necessary, I can delete it.
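The two options under discussion (early-return guard vs. assert) amount to the following toy sketch; the `Slot` class and string-typed protocols here are illustrative, not the PR's code:

```python
class Slot:
    def __init__(self) -> None:
        self.connection_slot: set[str] = set()

    def add_guarded(self, protocol: str) -> bool:
        # Option 1: tolerate double-adds by returning early.
        if protocol in self.connection_slot:
            return False
        self.connection_slot.add(protocol)
        return True

    def add_asserting(self, protocol: str) -> None:
        # Option 2: treat a double-add as a programming error.
        assert protocol not in self.connection_slot, "protocol already in slot"
        self.connection_slot.add(protocol)
```

The guard makes double-adds a normal, recoverable outcome; the assert makes them a crash-worthy invariant violation, which is the distinction the reviewer is probing.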


# The connection must be turned into CHECK_ENTRYPOINTS.
# Will return to on_peer_connect and slot it into check_entrypoints.
protocol.connection_type = HathorProtocol.ConnectionType.CHECK_ENTRYPOINTS
Member

Why are you changing the connection type?

Author

In previous meetings, we talked about when we would fire a "check entrypoint" type of connection to check other entrypoints.

The main criterion we had established at the time was: if the outgoing slot is full, we need to check why this is happening. Hence, if the slot is full, we establish a check-entrypoint type of connection.

Therefore, in the outgoing case, if it becomes full, we turn it into a check-entrypoints kind of connection.

A useful resource for understanding the ideas at the time of the connection-pool-slots is:

https://github.com/HathorNetwork/sec-rfcs/blob/feat/connection-pool-slots/text/000x-connection-pool-slots.md

It has not been revised yet and is outdated, but the check-entrypoints portion is up to date, and there are animations to better illustrate the inner workings.
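The retagging described above can be reduced to a small routing rule. This is a self-contained sketch with stand-in names (`Protocol`, `route`), not the PR's `on_peer_connect` implementation:

```python
from enum import Enum, auto

class ConnectionType(Enum):
    OUTGOING = auto()
    CHECK_ENTRYPOINTS = auto()

class Protocol:
    def __init__(self) -> None:
        self.connection_type = ConnectionType.OUTGOING

def route(protocol: Protocol, outgoing_slot_full: bool) -> ConnectionType:
    # When the outgoing slot is full, retag the connection so the manager
    # re-routes it into the check-entrypoints slot instead of dropping it.
    if protocol.connection_type is ConnectionType.OUTGOING and outgoing_slot_full:
        protocol.connection_type = ConnectionType.CHECK_ENTRYPOINTS
    return protocol.connection_type
```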

@github-project-automation github-project-automation bot moved this from Parking Lot to In Review (WIP) in Hathor Network Feb 18, 2026
@LFRezende LFRezende moved this from In Review (WIP) to In Progress (WIP) in Hathor Network Feb 19, 2026

@override
async def discover_and_connect(self, connect_to_endpoint: Callable[[PeerEndpoint], None]) -> None:
async def discover_and_connect(self, connect_to_endpoint: Callable[..., None]) -> None:
Member

Why this change?

Author

typing.Callable cannot express optional or keyword arguments. For example, we cannot write Callable[[PeerEndpoint, bool], None] and still pass the parameter as discovery_call=True. We can either tell the linter to ignore the typing problem or relax the signature to accept generic arguments (...).

I'll talk to Glevco to see what's the best approach.
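A third option worth noting: a callback Protocol keeps static checking while allowing the extra keyword argument. This is a sketch; the `ConnectCallback` name and the `discovery_call` keyword are assumptions drawn from the discussion above, and `PeerEndpoint` is a stand-in for hathor's class:

```python
from typing import Protocol

class PeerEndpoint:  # stand-in for hathor.p2p's PeerEndpoint
    ...

class ConnectCallback(Protocol):
    # typing.Callable cannot express keyword or optional parameters,
    # but a Protocol with an explicit __call__ signature can:
    def __call__(self, endpoint: PeerEndpoint, *, discovery_call: bool = False) -> None: ...

def connect(endpoint: PeerEndpoint, *, discovery_call: bool = False) -> None:
    pass

cb: ConnectCallback = connect  # type-checks, unlike Callable[[PeerEndpoint], None]
```

This avoids both the lint suppression and the fully untyped `Callable[..., None]` escape hatch.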


@override
async def discover_and_connect(self, connect_to_endpoint: Callable[[PeerEndpoint], None]) -> None:
async def discover_and_connect(self, connect_to_endpoint: Callable[..., None]) -> None:
Member

Why this change?

Author

See comment above.

Comment on lines -26 to +24
async def discover_and_connect(self, connect_to_endpoint: Callable[[PeerEndpoint], None]) -> None:
async def discover_and_connect(self, connect_to_endpoint: Callable[..., None]) -> None:
Member

Why this change?

Author

See comment above.

Comment on lines +63 to +66
max_outgoing: int = settings.P2P_PEER_MAX_OUTGOING_CONNECTIONS
max_incoming: int = settings.P2P_PEER_MAX_INCOMING_CONNECTIONS
max_discovered: int = settings.P2P_PEER_MAX_DISCOVERED_PEERS_CONNECTIONS
max_check_ep: int = settings.P2P_PEER_MAX_CHECK_PEER_CONNECTIONS
Member

Get these from __init__(). The builder should inject these values from settings (or elsewhere).
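The suggested injection could look like the following sketch; the class and parameter names are illustrative, not the PR's actual `ConnectionsManager` signature:

```python
class ConnectionsManager:
    # Limits are injected by the builder instead of being read from
    # settings at class scope, which keeps the class testable with
    # custom values and free of a hard settings dependency.
    def __init__(self, *, max_outgoing: int, max_incoming: int,
                 max_discovered: int, max_check_ep: int) -> None:
        self.max_outgoing = max_outgoing
        self.max_incoming = max_incoming
        self.max_discovered = max_discovered
        self.max_check_ep = max_check_ep

# The builder reads settings once and passes them in:
manager = ConnectionsManager(max_outgoing=40, max_incoming=40,
                             max_discovered=10, max_check_ep=5)
```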

connection_status: ConnectionResult

if protocol in self.connection_slot:
return ConnectionRejected("Protocol already in Slot.")
Member

Maybe the error message should be simply "already connected".

Comment on lines +131 to +132
connection_status = ConnectionAllowed(f"Type {self.type} added, slot length: {len(self.connection_slot)}")
return connection_status
Member

Why a success message?

return ConnectionChanged("Outgoing -> Check Entrypoints")

# Check_EP is disconnected too, as we only queue endpoints of ready/valid peers.
protocol.disconnect(reason="Connection Slot is full. Try again later.")
Member

You should call disconnect from the p2p manager. So this method can be easily tested.

Comment on lines +565 to +571
while dequeued_ep:
for peer in self.verified_peer_storage.values():
if dequeued_ep in peer.info.entrypoints:
dequeued_ep = self.check_entrypoints_slot.remove_connection(protocol, True, dequeued_ep)
break
if dequeued_ep and dequeued_ep not in peer.info.entrypoints:
self.connect_to_endpoint(entrypoint=dequeued_ep.with_id(None))
Member

I'm really cautious with while loops. Why is it needed?
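One way to address this concern is to make the loop explicitly bounded. The sketch below is an assumed restructuring, not the PR's code; `drain_queue`, its parameters, and the string-typed entrypoints are all illustrative:

```python
from collections import deque
from typing import Callable

def drain_queue(queue: deque[str], known: set[str],
                connect: Callable[[str], None], max_attempts: int = 100) -> None:
    # Bounded drain: pop at most max_attempts entrypoints, skip those
    # belonging to already-verified peers, and connect to the rest.
    for _ in range(max_attempts):
        if not queue:
            break
        ep = queue.popleft()
        if ep not in known:
            connect(ep)
```

A `for` over a fixed range cannot spin forever even if the skip condition is buggy, which is usually the worry with open-ended `while` loops.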

PEER_ID = PeerIdState
READY = ReadyState

class ConnectionType(Enum):
Member

I see no value in keeping them "inside HathorProtocol". I guess it just makes imports more complicated because of cycles. I'd say a hathor/p2p/types.py is the best place for them to live. The same would apply to the other classes in this file, but that's legacy code.


Comment on lines 553 to 565
if protocol.connection_type == HathorProtocol.ConnectionType.OUTGOING:
# Here, it can happen that the protocol changes to Check Entrypoints.
protocol_connected = self.outgoing_slot.add_connection(protocol)
# The check is done so discovered connections are not added doubly.

if protocol.connection_type == HathorProtocol.ConnectionType.INCOMING:
protocol_connected = self.incoming_slot.add_connection(protocol)

if protocol.connection_type == HathorProtocol.ConnectionType.DISCOVERED:
protocol_connected = self.discovered_slot.add_connection(protocol)

if protocol.connection_type == HathorProtocol.ConnectionType.CHECK_ENTRYPOINTS:
protocol_connected = self.check_entrypoints_slot.add_connection(protocol)
Member

hathor-core already works only for python 3.11+. No worries!

@github-project-automation github-project-automation bot moved this from In Progress (WIP) to In Review (WIP) in Hathor Network Feb 23, 2026
@LFRezende LFRezende moved this from In Review (WIP) to Todo in Hathor Network Feb 23, 2026
@LFRezende LFRezende moved this from Todo to In Progress (WIP) in Hathor Network Feb 23, 2026
LFRezende and others added 8 commits February 24, 2026 14:20
Subdivide the p2p connection pool into four slot types:
- OUTGOING: connections our node initiates
- INCOMING: connections other nodes initiate to us
- DISCOVERED: connections from peer discovery (DNS/bootstrap)
- CHECK_ENTRYPOINTS: temporary connections to verify peer entrypoints

Add Slot class with connection management, queue support, and limits.
Add HathorDiscoveredFactory for discovered connections.
Add ConnectionType and ConnectionState enums to HathorProtocol.
Add per-slot settings (max connections, queue size).
Add tests for slot updates, limits, check_ep, and overflow.
- Replace fragile `not protocol.connection_type` with explicit enum
  check in get_connection_to_drop, including all outbound types
  (OUTGOING, DISCOVERED, CHECK_ENTRYPOINTS)
- Separate PEER_MAX_INCOMING_CONNECTIONS (slot limit) from
  PEER_MAX_ENTRYPOINTS (peer broadcast limit) to avoid repurposing
- Restore original PEER_MAX_ENTRYPOINTS (30) for entrypoint validation
- Use keyword args for HathorSettings in tests for resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@LFRezende LFRezende force-pushed the feat/poc-connectionPoolSlots-discoveredFactory branch from bd6e54a to f4058b7 on February 24, 2026 17:23
…n_peer_connect and changing connection type mechanism for check_ep

Labels

None yet

Projects

Status: In Progress (WIP)


2 participants