
Connection Pool Slots#1266

Open
LFRezende wants to merge 9 commits into master from feat/poc-connectionPoolSlots-discoveredFactory

Conversation

@LFRezende

Motivation

The current status of Hathor's P2P network doesn't distinguish the different types of connections established between two peers: the base code only limits the total number of connections established with a given full node, regardless of type. A malicious attacker could jeopardize the system by creating multiple connections to a given node, jamming its ability to respond and/or create new connections.

This PR implements Connection Pool Slots, subdividing the existing pool into four main groups:

  • Outgoing Connections Slot (full node --> peer)
  • Incoming Connections Slot (peer --> full node)
  • Discovered Connections Slot: Outgoing connections established in bootstrap and/or by DNS query
  • Check Entrypoints Slot: when the outgoing slot is full, overflowing connections are transferred to the Check Entrypoints Slot, which analyzes the protocols and their entrypoints, checking whether they are legitimate peers.
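The four groups above could be sketched as an enum plus per-slot limits. This is an illustrative model only; the names and the `SLOT_LIMITS` dict are stand-ins, not the PR's exact identifiers (the limit values mirror the settings quoted later in this thread):

```python
from enum import Enum, auto

class ConnectionType(Enum):
    """Illustrative sketch of the four connection pool slot types."""
    OUTGOING = auto()           # full node --> peer
    INCOMING = auto()           # peer --> full node
    DISCOVERED = auto()         # outgoing, via bootstrap / DNS discovery
    CHECK_ENTRYPOINTS = auto()  # overflow from the outgoing slot

# Hypothetical per-slot connection limits, one per slot type:
SLOT_LIMITS: dict[ConnectionType, int] = {
    ConnectionType.OUTGOING: 40,
    ConnectionType.INCOMING: 40,
    ConnectionType.DISCOVERED: 10,
    ConnectionType.CHECK_ENTRYPOINTS: 5,
}
```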

Acceptance Criteria

  • Outgoing, Incoming, Discovered, and Check Entrypoint slots each with a set limit of connections
  • Check Entrypoint slot with a queue to store entrypoints for future connection attempts

Checklist

If this review is approved, the code should be production-ready.

@github-actions

github-actions bot commented May 2, 2025

🐰 Bencher Report

Branch: feat/poc-connectionPoolSlots-discoveredFactory
Testbed: ubuntu-22.04

| Benchmark | Latency | Lower Boundary (Limit %) | Upper Boundary (Limit %) |
|---|---|---|---|
| sync-v2 (up to 20000 blocks) | 1.57 m (-8.14% vs. baseline 1.71 m) | 1.54 m (97.97%) | 2.05 m (76.55%) |

🐰 View full continuous benchmarking report in Bencher

@codecov

codecov bot commented May 3, 2025

Codecov Report

❌ Patch coverage is 70.49180% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.60%. Comparing base (735f4e0) to head (2591ea8).

Files with missing lines Patch % Lines
hathor/p2p/connection_slot.py 65.55% 24 Missing and 7 partials ⚠️
hathor/p2p/manager.py 67.60% 17 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1266      +/-   ##
==========================================
- Coverage   85.67%   85.60%   -0.08%     
==========================================
  Files         439      440       +1     
  Lines       33618    33780     +162     
  Branches     5282     5322      +40     
==========================================
+ Hits        28802    28916     +114     
- Misses       3801     3838      +37     
- Partials     1015     1026      +11     
Flag Coverage Δ
test-lib 85.60% <70.49%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@LFRezende LFRezende moved this to In Progress (Done) in Hathor Network May 3, 2025
@LFRezende LFRezende self-assigned this May 3, 2025
@LFRezende LFRezende moved this from In Progress (Done) to Parking Lot in Hathor Network Jan 28, 2026
@msbrogli msbrogli force-pushed the master branch 2 times, most recently from eb416fa to 21d7909 on February 12, 2026 23:12
@LFRezende LFRezende force-pushed the feat/poc-connectionPoolSlots-discoveredFactory branch from e3e0f0e to 5d17004 on February 14, 2026 05:01
PEER_MAX_CHECK_PEER_CONNECTIONS: int = 5

# Queue size for each connection slot:
QUEUE_SIZE: int = 100
Member

Maybe we should add P2P_ prefix to all these new settings.

Author

Name changed for these settings:

    # Max Number of each connection slot (int):
    P2P_PEER_MAX_INCOMING_CONNECTIONS: int = 40
    P2P_PEER_MAX_OUTGOING_CONNECTIONS: int = 40
    P2P_PEER_MAX_DISCOVERED_PEERS_CONNECTIONS: int = 10
    P2P_PEER_MAX_CHECK_PEER_CONNECTIONS: int = 5

    # Queue size for each connection slot:
    P2P_QUEUE_SIZE: int = 100

Author

One new question, though: The following setting:

    # Maximum number of entrypoints that we accept that a peer broadcasts
    PEER_MAX_ENTRYPOINTS: int = 30

Wouldn't it cap the maximum number of incoming connections at thirty, making

    # Max Number of each connection slot (int):
    P2P_PEER_MAX_INCOMING_CONNECTIONS: int = 40

irrelevant?

Member

These are (very) different settings. PEER_MAX_ENTRYPOINTS allows each peer id to have up to 30 entrypoints; while P2P_PEER_MAX_INCOMING_CONNECTIONS allows a maximum of 40 connections from different peers.

type: HathorProtocol.ConnectionType
max_slot_connections: int
queue_size_entrypoints: int
entrypoint_set: set[PeerAddress | None]
Member

Why this? Is it to manage the fact that a peer might have more than one entrypoint? If so, why not keep it separate in the Peer class? The p2p manager has access to both and can handle reconnections easily.

Author

@LFRezende LFRezende Feb 23, 2026

You mean why have both the queue and the entrypoint set?
At the time, the intent was to have both so we can know whether a protocol is revisiting the function or not.

The animation of https://github.com/HathorNetwork/sec-rfcs/blob/feat/connection-pool-slots/text/000x-connection-pool-slots.md explains it well:

When we receive a connection from "add_connection", we grab the entrypoints and add them both to the queue, to be analyzed later by check_entrypoints if needed, and the entrypoints set.

The reason for having both, as said in the sec-rfc, is:


Note on Check Entrypoints Type
Since this is the only type to store entrypoints in queues, the Slot class works such that, if this is the type, all arriving connections to this slot are stored in the connections set.

When the connection matures to READY, a check is rolled to see whether this connection has its peer entrypoint in the entrypoints_set of the slot. If it does, then this connection comes from a previously dequeued entrypoint from the entrypoint_queue of the slot, so we simply disconnect it. However, if this is not the case, then we grab the entrypoints provided by the connection protocol, add them to the queue and the entrypoint set, and then disconnect the protocol.

The disconnection of the protocol mentioned above, regardless of its nature, always leads to a free spot in the set. This triggers the queue to pop one of the entrypoints, which will then be used to connect to a peer.

This is the best opportunity to explain a question that the reader may have: why does the slot have two data structures to store the same entrypoints?

In the process of popping an entrypoint from the queue and establishing a connection to its peer - a dequeued connection - a new and different connection may arrive while the dequeued connection is still in flight. If this occurs when the slot is almost full (only one vacant spot), this new connection will take the place "reserved" for the dequeued connection. When the slot is full, we simply discard the connection, so the dequeued entrypoint is lost and never analyzed.

To prevent this, when the connection arrives and the slot is nearly full, a check is made to see if the length of the entrypoint set is the same as the entrypoint queue. If it is, then no dequeued connection is being processed. If they don't match, then at least one spot must be reserved for a dequeued connection that is coming.


If too convoluted or unnecessary, I can refactor it. What do you think?
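The queue/set reservation logic described above could be modeled roughly as follows. This is a simplified sketch, not the PR's Slot class; the class name, string-typed entrypoints, and `has_free_spot_for` helper are all illustrative:

```python
from collections import deque

class CheckEntrypointsSlot:
    """Simplified model of the check-entrypoints slot: a queue of
    entrypoints to try, mirrored by a set used both to detect revisits
    and to reserve a spot for an in-flight dequeued connection."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.connections: set[str] = set()
        self.entrypoint_queue: deque[str] = deque()
        self.entrypoint_set: set[str] = set()

    def has_free_spot_for(self, entrypoint: str) -> bool:
        # A dequeued connection is in flight whenever the set is larger
        # than the queue; keep one spot reserved for it in that case.
        in_flight = len(self.entrypoint_set) > len(self.entrypoint_queue)
        limit = self.max_connections - (1 if in_flight else 0)
        # Entrypoints already seen (present in the set) are revisits.
        return entrypoint not in self.entrypoint_set and len(self.connections) < limit
```

The key point is the length comparison: when queue and set lengths match, no dequeued connection is pending and the full limit is available; when they differ, one spot stays reserved.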

Comment on lines 113 to 114
if protocol in self.connection_slot:
return False
Member

Would this only happen in case of a bug? If yes, it should be an assert.

Author

Not necessarily a bug; an assert would abort the function. It was intended as a way to avoid reconnecting, or adding the same protocol twice to the same slot.

If not necessary, I can delete it.
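The two options under discussion (early-return guard vs. assert) amount to the following toy sketch; the `Slot` class and string-typed protocols here are illustrative, not the PR's code:

```python
class Slot:
    def __init__(self) -> None:
        self.connection_slot: set[str] = set()

    def add_guarded(self, protocol: str) -> bool:
        # Option 1: tolerate double-adds by returning early.
        if protocol in self.connection_slot:
            return False
        self.connection_slot.add(protocol)
        return True

    def add_asserting(self, protocol: str) -> None:
        # Option 2: treat a double-add as a programming error.
        assert protocol not in self.connection_slot, "protocol already in slot"
        self.connection_slot.add(protocol)
```

The guard makes double-adds a normal, recoverable outcome; the assert makes them a crash-worthy invariant violation, which is the distinction the reviewer is probing.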


# The connection must be turned into CHECK_ENTRYPOINTS.
# Will return to on_peer_connect and slot it into check_entrypoints.
protocol.connection_type = HathorProtocol.ConnectionType.CHECK_ENTRYPOINTS
Member

Why are you changing the connection type?

Author

In previous meetings, we talked about when we would fire a "check entrypoint" type of connection to check other entrypoints.

The main criterion we had established at the time was: if the outgoing slot is full, we need to check why this is happening. Hence, if the slot is full, we establish a check-entrypoint type of connection.

Therefore, in the outgoing case, if it becomes full, we turn it into a check-entrypoints kind of connection.

A useful resource for understanding the ideas at the time of the connection-pool-slots is:

https://github.com/HathorNetwork/sec-rfcs/blob/feat/connection-pool-slots/text/000x-connection-pool-slots.md

It has not been revised yet and is outdated, but the check-entrypoints portion is up to date, and there are animations to better illustrate the inner workings.
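The retagging described above can be reduced to a small routing rule. This is a self-contained sketch with stand-in names (`Protocol`, `route`), not the PR's `on_peer_connect` implementation:

```python
from enum import Enum, auto

class ConnectionType(Enum):
    OUTGOING = auto()
    CHECK_ENTRYPOINTS = auto()

class Protocol:
    def __init__(self) -> None:
        self.connection_type = ConnectionType.OUTGOING

def route(protocol: Protocol, outgoing_slot_full: bool) -> ConnectionType:
    # When the outgoing slot is full, retag the connection so the manager
    # re-routes it into the check-entrypoints slot instead of dropping it.
    if protocol.connection_type is ConnectionType.OUTGOING and outgoing_slot_full:
        protocol.connection_type = ConnectionType.CHECK_ENTRYPOINTS
    return protocol.connection_type
```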

@github-project-automation github-project-automation bot moved this from Parking Lot to In Review (WIP) in Hathor Network Feb 18, 2026
@LFRezende LFRezende moved this from In Review (WIP) to In Progress (WIP) in Hathor Network Feb 19, 2026

@override
async def discover_and_connect(self, connect_to_endpoint: Callable[[PeerEndpoint], None]) -> None:
async def discover_and_connect(self, connect_to_endpoint: Callable[..., None]) -> None:
Member

Why this change?

Author

typing.Callable cannot express optional or keyword arguments. For example, we cannot write Callable[[PeerEndpoint, bool], None] and still pass the parameter as discovery_call=True. We can either tell the linter to ignore the typing problem or relax the signature to accept generic arguments (...).

I'll talk to Glevco to see what's the best approach.
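A third option worth noting: a callback Protocol keeps static checking while allowing the extra keyword argument. This is a sketch; the `ConnectCallback` name and the `discovery_call` keyword are assumptions drawn from the discussion above, and `PeerEndpoint` is a stand-in for hathor's class:

```python
from typing import Protocol

class PeerEndpoint:  # stand-in for hathor.p2p's PeerEndpoint
    ...

class ConnectCallback(Protocol):
    # typing.Callable cannot express keyword or optional parameters,
    # but a Protocol with an explicit __call__ signature can:
    def __call__(self, endpoint: PeerEndpoint, *, discovery_call: bool = False) -> None: ...

def connect(endpoint: PeerEndpoint, *, discovery_call: bool = False) -> None:
    pass

cb: ConnectCallback = connect  # type-checks, unlike Callable[[PeerEndpoint], None]
```

This avoids both the lint suppression and the fully untyped `Callable[..., None]` escape hatch.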


@override
async def discover_and_connect(self, connect_to_endpoint: Callable[[PeerEndpoint], None]) -> None:
async def discover_and_connect(self, connect_to_endpoint: Callable[..., None]) -> None:
Member

Why this change?

Author

See comment above.

Comment on lines -26 to +24
async def discover_and_connect(self, connect_to_endpoint: Callable[[PeerEndpoint], None]) -> None:
async def discover_and_connect(self, connect_to_endpoint: Callable[..., None]) -> None:
Member

Why this change?

Author

See comment above.

Comment on lines +63 to +66
max_outgoing: int = settings.P2P_PEER_MAX_OUTGOING_CONNECTIONS
max_incoming: int = settings.P2P_PEER_MAX_INCOMING_CONNECTIONS
max_discovered: int = settings.P2P_PEER_MAX_DISCOVERED_PEERS_CONNECTIONS
max_check_ep: int = settings.P2P_PEER_MAX_CHECK_PEER_CONNECTIONS
Member

Get these from __init__(). The builder should inject these values from settings (or elsewhere).
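The suggested injection could look like the following sketch; the class and parameter names are illustrative, not the PR's actual `ConnectionsManager` signature:

```python
class ConnectionsManager:
    # Limits are injected by the builder instead of being read from
    # settings at class scope, which keeps the class testable with
    # custom values and free of a hard settings dependency.
    def __init__(self, *, max_outgoing: int, max_incoming: int,
                 max_discovered: int, max_check_ep: int) -> None:
        self.max_outgoing = max_outgoing
        self.max_incoming = max_incoming
        self.max_discovered = max_discovered
        self.max_check_ep = max_check_ep

# The builder reads settings once and passes them in:
manager = ConnectionsManager(max_outgoing=40, max_incoming=40,
                             max_discovered=10, max_check_ep=5)
```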

connection_status: ConnectionResult

if protocol in self.connection_slot:
return ConnectionRejected("Protocol already in Slot.")
Member

Maybe the error message should be simply "already connected".

Comment on lines +131 to +132
connection_status = ConnectionAllowed(f"Type {self.type} added, slot length: {len(self.connection_slot)}")
return connection_status
Member

Why a success message?

return ConnectionChanged("Outgoing -> Check Entrypoints")

# Check_EP is disconnected too, as we only queue endpoints of ready/valid peers.
protocol.disconnect(reason="Connection Slot is full. Try again later.")
Member

You should call disconnect from the p2p manager. So this method can be easily tested.

Comment on lines +565 to +571
while dequeued_ep:
for peer in self.verified_peer_storage.values():
if dequeued_ep in peer.info.entrypoints:
dequeued_ep = self.check_entrypoints_slot.remove_connection(protocol, True, dequeued_ep)
break
if dequeued_ep and dequeued_ep not in peer.info.entrypoints:
self.connect_to_endpoint(entrypoint=dequeued_ep.with_id(None))
Member

I'm really cautious with while loops. Why is it needed?
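One way to address this concern is to make the loop explicitly bounded. The sketch below is an assumed restructuring, not the PR's code; `drain_queue`, its parameters, and the string-typed entrypoints are all illustrative:

```python
from collections import deque
from typing import Callable

def drain_queue(queue: deque[str], known: set[str],
                connect: Callable[[str], None], max_attempts: int = 100) -> None:
    # Bounded drain: pop at most max_attempts entrypoints, skip those
    # belonging to already-verified peers, and connect to the rest.
    for _ in range(max_attempts):
        if not queue:
            break
        ep = queue.popleft()
        if ep not in known:
            connect(ep)
```

A `for` over a fixed range cannot spin forever even if the skip condition is buggy, which is usually the worry with open-ended `while` loops.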

PEER_ID = PeerIdState
READY = ReadyState

class ConnectionType(Enum):
Member

I see no value in keeping them "inside HathorProtocol". I guess it just makes imports more complicated because of cycles. I'd say a hathor/p2p/types.py is the best place for them to live. The same would apply to the other classes in this file, but that's legacy code.


Comment on lines 553 to 565
if protocol.connection_type == HathorProtocol.ConnectionType.OUTGOING:
# Here, it can happen that the protocol changes to Check Entrypoints.
protocol_connected = self.outgoing_slot.add_connection(protocol)
# The check is done so discovered connections are not added doubly.

if protocol.connection_type == HathorProtocol.ConnectionType.INCOMING:
protocol_connected = self.incoming_slot.add_connection(protocol)

if protocol.connection_type == HathorProtocol.ConnectionType.DISCOVERED:
protocol_connected = self.discovered_slot.add_connection(protocol)

if protocol.connection_type == HathorProtocol.ConnectionType.CHECK_ENTRYPOINTS:
protocol_connected = self.check_entrypoints_slot.add_connection(protocol)
Member

hathor-core already works only for python 3.11+. No worries!

@github-project-automation github-project-automation bot moved this from In Progress (WIP) to In Review (WIP) in Hathor Network Feb 23, 2026
@LFRezende LFRezende moved this from In Review (WIP) to Todo in Hathor Network Feb 23, 2026
@LFRezende LFRezende moved this from Todo to In Progress (WIP) in Hathor Network Feb 23, 2026
LFRezende and others added 8 commits February 24, 2026 14:20
Subdivide the p2p connection pool into four slot types:
- OUTGOING: connections our node initiates
- INCOMING: connections other nodes initiate to us
- DISCOVERED: connections from peer discovery (DNS/bootstrap)
- CHECK_ENTRYPOINTS: temporary connections to verify peer entrypoints

Add Slot class with connection management, queue support, and limits.
Add HathorDiscoveredFactory for discovered connections.
Add ConnectionType and ConnectionState enums to HathorProtocol.
Add per-slot settings (max connections, queue size).
Add tests for slot updates, limits, check_ep, and overflow.
- Replace fragile `not protocol.connection_type` with explicit enum
  check in get_connection_to_drop, including all outbound types
  (OUTGOING, DISCOVERED, CHECK_ENTRYPOINTS)
- Separate PEER_MAX_INCOMING_CONNECTIONS (slot limit) from
  PEER_MAX_ENTRYPOINTS (peer broadcast limit) to avoid repurposing
- Restore original PEER_MAX_ENTRYPOINTS (30) for entrypoint validation
- Use keyword args for HathorSettings in tests for resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@LFRezende LFRezende force-pushed the feat/poc-connectionPoolSlots-discoveredFactory branch from bd6e54a to f4058b7 on February 24, 2026 17:23
…n_peer_connect and changing connection type mechanism for check_ep

Labels

None yet

Projects

Status: In Progress (WIP)


2 participants