Conversation
| Branch | feat/poc-connectionPoolSlots-discoveredFactory |
|---|---|
| Testbed | ubuntu-22.04 |

Benchmark results:

| Benchmark | Result (minutes) | Lower Boundary (minutes, limit %) | Upper Boundary (minutes, limit %) |
|---|---|---|---|
| sync-v2 (up to 20000 blocks) | 1.57 m (-8.14%; baseline: 1.71 m) | 1.54 m (97.97%) | 2.05 m (76.55%) |
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #1266      +/-   ##
==========================================
- Coverage   85.67%   85.60%   -0.08%
==========================================
  Files         439      440       +1
  Lines       33618    33780     +162
  Branches     5282     5322      +40
==========================================
+ Hits        28802    28916     +114
- Misses       3801     3838      +37
- Partials     1015     1026      +11
```
Force-pushed from eb416fa to 21d7909.
Force-pushed from e3e0f0e to 5d17004.
hathor/conf/settings.py (Outdated)

```python
PEER_MAX_CHECK_PEER_CONNECTIONS: int = 5

# Queue size for each connection slot:
QUEUE_SIZE: int = 100
```
Maybe we should add P2P_ prefix to all these new settings.
Name changed for these settings:

```python
# Max Number of each connection slot (int):
P2P_PEER_MAX_INCOMING_CONNECTIONS: int = 40
P2P_PEER_MAX_OUTGOING_CONNECTIONS: int = 40
P2P_PEER_MAX_DISCOVERED_PEERS_CONNECTIONS: int = 10
P2P_PEER_MAX_CHECK_PEER_CONNECTIONS: int = 5
# Queue size for each connection slot:
P2P_QUEUE_SIZE: int = 100
```
One new question, though. The following setting:

```python
# Maximum number of entrypoints that we accept that a peer broadcasts
PEER_MAX_ENTRYPOINTS: int = 30
```

Wouldn't it cap the maximum number of incoming connections at thirty, making

```python
# Max Number of each connection slot (int):
P2P_PEER_MAX_INCOMING_CONNECTIONS: int = 40
```

irrelevant?
These are (very) different settings. PEER_MAX_ENTRYPOINTS allows each peer id to have up to 30 entrypoints; while P2P_PEER_MAX_INCOMING_CONNECTIONS allows a maximum of 40 connections from different peers.
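To make the distinction concrete, here is a minimal sketch. The constants are the values quoted above; the helper functions are hypothetical, not the PR's actual code:

```python
# PEER_MAX_ENTRYPOINTS caps how many entrypoints a single peer may broadcast;
# P2P_PEER_MAX_INCOMING_CONNECTIONS caps how many distinct peers may connect.
PEER_MAX_ENTRYPOINTS = 30
P2P_PEER_MAX_INCOMING_CONNECTIONS = 40


def validate_peer_entrypoints(entrypoints: list[str]) -> bool:
    # Per-peer limit: one peer id advertising 31 entrypoints is rejected.
    return len(entrypoints) <= PEER_MAX_ENTRYPOINTS


def can_accept_incoming(current_incoming: set[str]) -> bool:
    # Per-node limit: the 41st distinct peer is rejected, regardless of
    # how many entrypoints each connected peer advertised.
    return len(current_incoming) < P2P_PEER_MAX_INCOMING_CONNECTIONS
```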
hathor/p2p/manager.py (Outdated)

```python
type: HathorProtocol.ConnectionType
max_slot_connections: int
queue_size_entrypoints: int
entrypoint_set: set[PeerAddress | None]
```
Why this? Is it to manage the fact that a peer might have more than one entrypoint? If so, why not keep it separate in the Peer class? The p2p manager has access to both and can handle reconnections easily.
You mean, why have both the queue and the entrypoint set?
I meant, at the time, to have both so we can tell whether a protocol is revisiting the function or not.
The animation of https://github.com/HathorNetwork/sec-rfcs/blob/feat/connection-pool-slots/text/000x-connection-pool-slots.md explains it well:
When we receive a connection from "add_connection", we grab the entrypoints and add them both to the queue, to be analyzed later by check_entrypoints if needed, and the entrypoints set.
The reason for having both, as said in the sec-rfc, is:
Note on Check Entrypoints Type
Since this is the only type to store entrypoints in queues, the Slot class works such that, if this is the type, all arriving connections to this slot are stored in the connections set.
When the connection matures to READY, a check is rolled to see whether this connection has its peer entrypoint in the entrypoints_set of the slot. If it does, then this connection comes from a previously dequeued entrypoint from the entrypoint_queue of the slot, so we simply disconnect it. However, if this is not the case, then we grab the entrypoints provided by the connection protocol, add them to the queue and the entrypoint set, and then disconnect the protocol.
The disconnection of the protocol mentioned above, regardless of its nature, always leads to a free spot in the set. This triggers the queue to pop one of the entrypoints, which will then be used to connect to a peer.
This is the best opportunity to explain a question that the reader may have: why does the slot have two data structures to store the same entrypoints?
In the process of popping an entrypoint from the queue and establishing a connection to its peer - a dequeued connection - a new and different connection may arrive while the dequeued connection is still coming. If this occurs in a scenario of almost full slot (only one vacant spot in the slot), this new connection will take the place "reserved" for the dequeued connection. When the slot is full, we simply discard the connection, so this leads to the dequeued entrypoint being lost and never analyzed.
To prevent this, when the connection arrives and the slot is nearly full, a check is made to see if the length of the entrypoint set is the same as the entrypoint queue. If it is, then no dequeued connection is being processed. If they don't match, then at least one spot must be reserved for a dequeued connection that is coming.
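A minimal sketch of the queue-plus-set bookkeeping described above (class and method names are assumptions, not the PR's actual code):

```python
from collections import deque


class CheckEntrypointsSlot:
    """Sketch of a check-entrypoints slot holding both a queue and a set.

    The set remembers every entrypoint ever enqueued, so a READY connection
    can be recognized as dequeued (its entrypoint is in the set) or new.
    The size gap between set and queue counts dequeued connections that are
    still in flight and therefore need a reserved spot.
    """

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.connections: set[str] = set()
        self.entrypoint_queue: deque[str] = deque()
        self.entrypoint_set: set[str] = set()

    def enqueue(self, entrypoint: str) -> None:
        # New entrypoints go to both structures.
        self.entrypoint_queue.append(entrypoint)
        self.entrypoint_set.add(entrypoint)

    def pop_to_dial(self) -> str:
        # The entrypoint leaves the queue but stays in the set, marking a
        # dequeued connection as in flight until it arrives (or is dropped).
        return self.entrypoint_queue.popleft()

    def in_flight(self) -> int:
        # Equal lengths mean no dequeued connection is being processed;
        # otherwise the difference is the number of reserved spots.
        return len(self.entrypoint_set) - len(self.entrypoint_queue)

    def can_accept_new(self) -> bool:
        # A brand-new connection must leave room for in-flight dequeued ones.
        return len(self.connections) + self.in_flight() < self.max_connections
```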
If too convoluted or unnecessary, I can refactor it. What do you think?
hathor/p2p/manager.py
Outdated
| if protocol in self.connection_slot: | ||
| return False |
Would this only happen in case of a bug? If yes, it should be an assert.
Not necessarily a bug; an assert would abort the function. It was intended as a way to avoid reconnecting or adding the same protocol to the same slot twice.
If not necessary, I can delete it.
hathor/p2p/manager.py (Outdated)

```python
# The connection must be turned into CHECK_ENTRYPOINTS.
# Will return to on_peer_connect and slot it into check_entrypoints.
protocol.connection_type = HathorProtocol.ConnectionType.CHECK_ENTRYPOINTS
```
Why are you changing the connection type?
In previous meetings, we talked about when we would fire a "check entrypoints" type of connection to check other entrypoints.
The main criterion we established at the time was: if the outgoing slot is full, we need to check why this is happening. Therefore, when the outgoing slot becomes full, we turn the connection into a check-entrypoints one.
A useful resource for understanding the ideas at the time of the connection-pool-slots is the sec-rfc linked above. It has not been revised yet and is partly outdated, but the check-entrypoints portion is up to date, and there are animations to better illustrate the inner workings.
```diff
 @override
-async def discover_and_connect(self, connect_to_endpoint: Callable[[PeerEndpoint], None]) -> None:
+async def discover_and_connect(self, connect_to_endpoint: Callable[..., None]) -> None:
```
`Callable` flags optional arguments even when their types are stated. For example, we cannot write `Callable[[PeerEndpoint, bool], None]` in the argument types to include the parameter `discovery_call=True`. We can either tell the linter to ignore the typing problem or relax the typing to allow generic arguments (`...`).
I'll talk to Glevco to see what's the best approach.
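One alternative to falling back to `Callable[..., None]` is a `typing.Protocol` with an explicit `__call__` signature, which can express the optional `discovery_call` argument. A sketch only: the function is shown synchronous for brevity, and `PeerEndpoint` is a stand-in for the real class:

```python
from typing import Protocol


class PeerEndpoint:
    """Stand-in for hathor's PeerEndpoint."""


class ConnectCallback(Protocol):
    # A Protocol can express the optional keyword argument that
    # Callable[[PeerEndpoint], None] cannot.
    def __call__(self, entrypoint: PeerEndpoint, discovery_call: bool = True) -> None: ...


def discover_and_connect(connect_to_endpoint: ConnectCallback) -> None:
    # Type checkers accept both positional and keyword forms here.
    connect_to_endpoint(PeerEndpoint(), discovery_call=True)
```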
```python
max_outgoing: int = settings.P2P_PEER_MAX_OUTGOING_CONNECTIONS
max_incoming: int = settings.P2P_PEER_MAX_INCOMING_CONNECTIONS
max_discovered: int = settings.P2P_PEER_MAX_DISCOVERED_PEERS_CONNECTIONS
max_check_ep: int = settings.P2P_PEER_MAX_CHECK_PEER_CONNECTIONS
```
Get these from __init__(). The builder should inject these values from settings (or elsewhere).
```python
connection_status: ConnectionResult

if protocol in self.connection_slot:
    return ConnectionRejected("Protocol already in Slot.")
```
Maybe the error message should be simply "already connected".
```python
connection_status = ConnectionAllowed(f"Type {self.type} added, slot length: {len(self.connection_slot)}")
return connection_status
```

```python
return ConnectionChanged("Outgoing -> Check Entrypoints")

# Check_EP is disconnected too, as we only queue endpoints of ready/valid peers.
protocol.disconnect(reason="Connection Slot is full. Try again later.")
```
You should call disconnect from the p2p manager, so this method can be easily tested.
```python
while dequeued_ep:
    for peer in self.verified_peer_storage.values():
        if dequeued_ep in peer.info.entrypoints:
            dequeued_ep = self.check_entrypoints_slot.remove_connection(protocol, True, dequeued_ep)
            break
    if dequeued_ep and dequeued_ep not in peer.info.entrypoints:
        self.connect_to_endpoint(entrypoint=dequeued_ep.with_id(None))
```
I'm really cautious with while loops. Why is it needed?
```python
PEER_ID = PeerIdState
READY = ReadyState


class ConnectionType(Enum):
```
I see no value in keeping them "inside HathorProtocol". I guess it just makes imports more complicated because of cycles. I'd say a hathor/p2p/types.py is the best place for it to live. The same would apply to the other classes in this file, but that's legacy code.
hathor/p2p/manager.py (Outdated)

```python
if protocol.connection_type == HathorProtocol.ConnectionType.OUTGOING:
    # Here, it can happen that the protocol changes to Check Entrypoints.
    protocol_connected = self.outgoing_slot.add_connection(protocol)
    # The check is done so discovered connections are not added doubly.

if protocol.connection_type == HathorProtocol.ConnectionType.INCOMING:
    protocol_connected = self.incoming_slot.add_connection(protocol)

if protocol.connection_type == HathorProtocol.ConnectionType.DISCOVERED:
    protocol_connected = self.discovered_slot.add_connection(protocol)

if protocol.connection_type == HathorProtocol.ConnectionType.CHECK_ENTRYPOINTS:
    protocol_connected = self.check_entrypoints_slot.add_connection(protocol)
```
hathor-core already works only for python 3.11+. No worries!
Subdivide the p2p connection pool into four slot types:
- OUTGOING: connections our node initiates
- INCOMING: connections other nodes initiate to us
- DISCOVERED: connections from peer discovery (DNS/bootstrap)
- CHECK_ENTRYPOINTS: temporary connections to verify peer entrypoints

Add Slot class with connection management, queue support, and limits. Add HathorDiscoveredFactory for discovered connections. Add ConnectionType and ConnectionState enums to HathorProtocol. Add per-slot settings (max connections, queue size). Add tests for slot updates, limits, check_ep, and overflow.
- Replace fragile `not protocol.connection_type` with explicit enum check in get_connection_to_drop, including all outbound types (OUTGOING, DISCOVERED, CHECK_ENTRYPOINTS)
- Separate PEER_MAX_INCOMING_CONNECTIONS (slot limit) from PEER_MAX_ENTRYPOINTS (peer broadcast limit) to avoid repurposing
- Restore original PEER_MAX_ENTRYPOINTS (30) for entrypoint validation
- Use keyword args for HathorSettings in tests for resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from bd6e54a to f4058b7.
…n_peer_connect and changing connection type mechanism for check_ep
Motivation
The current state of Hathor's P2P network doesn't distinguish the different types of connections established between two peers; the code base only limits the total number of connections established with a given full node, regardless of type. A malicious attacker could jeopardize the system by creating multiple connections to a given node, jamming its capability to respond and/or create new connections.
This PR implements Connection Pool Slots, subdividing the existing pool into four main groups: OUTGOING, INCOMING, DISCOVERED, and CHECK_ENTRYPOINTS.
Acceptance Criteria
Checklist
Once this review is accepted, the code should be production-ready.