@@ -334,10 +334,99 @@ the distance to retrieve more nodes from adjacent k-buckets on `B`:
334334Node `A` now sorts all received nodes by distance to the lookup target and proceeds by
335335repeating the lookup procedure on another, closer node.
336336
337+ ## Hole-punching Asymmetric NATs
338+
339+ This section explains the hole punching mechanism built into the protocol, which is
340+ enabled by the [RELAYINIT] and [RELAYMSG] message types. Compared to other protocol
341+ messages, these require deeper interaction with the session layer in order to ensure the
342+ hole punching mechanism operates safely.
343+
344+ In the examples below, we assume that node `A` (Alice) wants to send a request message
345+ (e.g. FINDNODE) to node `B` (Bob).
346+
347+ Bob operates behind a network address translation (NAT) layer and is initially unable to
348+ receive UDP packets from Alice. However, Bob has previously communicated with a third node
349+ `R` (Relay) and is able to receive incoming packets from the Relay node. We further
350+ assume Bob's NAT is 'asymmetric', i.e. the external IP/port of Bob's packets is the same
351+ regardless of the host they are sent to. The hole-punching mechanism does not work for
352+ 'symmetric' NAT, where every destination host gets a unique mapping.
353+
354+ Alice may or may not be behind a symmetric NAT.
355+
356+ Finally, it is assumed that a common lower bound on the lifetime of NAT mappings is 20
357+ seconds, and that a mapping is refreshed whenever a packet is sent through it. For
358+ more background information about common NAT setups, please consult [RFC4787], [RFC6146]
359+ and [this paper][natpaper].
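
The difference between the two NAT kinds, and the refresh-on-send assumption, can be sketched with a toy model. This is only an illustration: `ToyNat`, its `send` method, and the starting port are hypothetical names and values, and real NATs differ in many details.

```python
from dataclasses import dataclass, field

MAPPING_LIFETIME = 20.0  # seconds; the lower bound assumed in the text


@dataclass
class ToyNat:
    """Hypothetical NAT model, for illustration only."""

    symmetric: bool
    next_port: int = 40000  # arbitrary first external port
    mappings: dict = field(default_factory=dict)  # key -> (external_port, expires_at)

    def send(self, lan_endpoint, destination, now):
        """Return the external port used for a packet sent at time `now`."""
        # 'Asymmetric' NAT: one mapping per LAN endpoint, shared by all destinations.
        # 'Symmetric' NAT: a separate mapping per (LAN endpoint, destination) pair.
        key = (lan_endpoint, destination) if self.symmetric else lan_endpoint
        entry = self.mappings.get(key)
        if entry is not None and entry[1] > now:
            port = entry[0]  # mapping still alive, reuse it
        else:
            port = self.next_port  # missing or expired, allocate a new port
            self.next_port += 1
        # Any packet sent through the mapping refreshes its lifetime.
        self.mappings[key] = (port, now + MAPPING_LIFETIME)
        return port
```

With `symmetric=False`, the external port that Relay observes for Bob is the same one Alice has to target, which is why the mechanism depends on this kind of NAT.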
360+
361+ ### Message flow
362+
363+ In the wire protocol, there are four packet types. Since the NAT-related messages require
364+ deeper integration with the packet/session layer, the packet type is explicitly shown for
365+ each message in the diagram below. We use these abbreviations:
366+
367+ - `whoareyou`: [WHOAREYOU packet]
368+ - `m(X)`: [message packet] containing request `X`
369+ - `H(X)`: [handshake message packet] containing request `X`
370+ - `s(M)`: [session message packet] containing message `M`
371+
372+ ![Diagram](./img/nat-hole-punching-flow.svg) <!-- source: ./img/nat-hole-punching-flow.mermaid -->
373+
374+ Preconditions: Bob is behind NAT. Bob is contained in Relay's node table, the two have an
375+ established session, and Bob has sent a packet to Relay within the last ~20 seconds, so
376+ Relay's packets can get through Bob's NAT.
377+
378+ As part of a recursive query for peers, Alice sends a [FINDNODE] request to Bob, whose ENR
379+ Alice just received from the Relay. If Alice is behind NAT, this outgoing request makes
380+ Alice's NAT add a mapping `(Alice's-LAN-ip, Alice's-LAN-port, Bob's-WAN-ip,
381+ Bob's-WAN-port, entry-lifetime)`. This means a hole is now punched for Bob in Alice's NAT
382+ for the duration of `entry-lifetime`. However, Alice's request is not delivered because
383+ Bob is behind NAT.
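
The mapping tuple above can be pictured as an entry in a small table that decides whether an inbound packet is let through. The sketch below is a simplification under stated assumptions: `NatTable`, `on_outbound` and `route_inbound` are hypothetical names, and the NAT is assumed to admit inbound packets only from the exact remote IP and port of a live mapping.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    lan_ip: str
    lan_port: int
    wan_ip: str       # remote peer's public IP
    wan_port: int     # remote peer's public port
    expires_at: float  # absolute time at which entry-lifetime runs out


class NatTable:
    """Hypothetical single-host NAT table, for illustration only."""

    def __init__(self, entry_lifetime=20.0):
        self.entry_lifetime = entry_lifetime
        self.entries = {}  # (wan_ip, wan_port) -> Mapping

    def on_outbound(self, lan_ip, lan_port, wan_ip, wan_port, now):
        # An outgoing packet creates (or refreshes) the hole for this remote peer.
        self.entries[(wan_ip, wan_port)] = Mapping(
            lan_ip, lan_port, wan_ip, wan_port, now + self.entry_lifetime
        )

    def route_inbound(self, src_ip, src_port, now):
        # Deliver only if a live mapping exists for the sender; otherwise drop.
        mapping = self.entries.get((src_ip, src_port))
        if mapping is None or mapping.expires_at <= now:
            return None
        return (mapping.lan_ip, mapping.lan_port)
```

In this model, Bob's later reply reaches Alice only if it arrives from the endpoint Alice sent to and before `entry-lifetime` has elapsed.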
384+
385+ Alice detects the timeout and initiates an attempt to punch a hole in Bob's NAT via the
386+ Relay. Alice resets the request timeout on the timed-out [FINDNODE] message, wraps the
387+ message's nonce in a [RELAYINIT] notification, and sends it to the Relay. The notification
388+ also contains Alice's ENR and Bob's node ID.
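
Alice's reaction to the timeout might look roughly like the following sketch. The class names, the field order, the `REQUEST_TIMEOUT` value, and the choice to give up after one relay attempt are all assumptions made for illustration; the wire specification defines the actual encoding.

```python
from dataclasses import dataclass

REQUEST_TIMEOUT = 1.0  # seconds; illustrative value, not taken from the spec


@dataclass(frozen=True)
class RelayInit:
    initiator_enr: bytes  # Alice's ENR (opaque encoded record here)
    target_id: bytes      # Bob's node ID
    nonce: bytes          # nonce of the timed-out request


@dataclass
class PendingRequest:
    nonce: bytes
    target_id: bytes
    deadline: float
    relay_attempted: bool = False


def on_request_timeout(request, local_enr, now):
    """Return the RELAYINIT to send to the relay, or None to give up."""
    if request.relay_attempted:
        return None  # assumption: only one hole-punching attempt per request
    request.relay_attempted = True
    request.deadline = now + REQUEST_TIMEOUT  # reset the request timeout
    return RelayInit(
        initiator_enr=local_enr,
        target_id=request.target_id,
        nonce=request.nonce,
    )
```

The original request is kept pending on purpose: Bob's eventual WHOAREYOU refers to its nonce, so Alice must still be able to match it.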
389+
390+ The Relay node validates the [RELAYINIT] notification and uses the `target-id` to look up
391+ Bob's ENR in its node table. Bob is very likely to be a member of the Relay's table
392+ because Bob's ENR was just sent to Alice in a [NODES] response. If Bob is not contained in
393+ the table, the hole-punching attempt ends here.
394+
395+ The Relay sends a [RELAYMSG] notification containing Alice's message nonce and ENR to Bob.
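
The Relay's part can be sketched as a lookup followed by a re-wrap. This is a minimal sketch: the node table is modelled as a plain dict from node ID to ENR, the class and function names are hypothetical, and the validation step is left out because its details are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayInit:
    initiator_enr: bytes  # Alice's ENR (opaque encoded record here)
    target_id: bytes      # Bob's node ID
    nonce: bytes          # nonce of Alice's timed-out request


@dataclass(frozen=True)
class RelayMsg:
    initiator_enr: bytes
    nonce: bytes


def handle_relayinit(msg, node_table):
    """Return (target ENR, RELAYMSG to send), or None if the target is unknown."""
    target_enr = node_table.get(msg.target_id)
    if target_enr is None:
        return None  # Bob is not in the table: the attempt ends here
    # Forward only what Bob needs: Alice's ENR and the request nonce.
    return target_enr, RelayMsg(initiator_enr=msg.initiator_enr, nonce=msg.nonce)
```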
396+
397+ Bob disassembles the [RELAYMSG] and uses the `nonce` to assemble a [WHOAREYOU packet],
398+ then sends it to Alice. Bob learns Alice's endpoint from the `initiator-enr` given in
399+ RELAYMSG.
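
Bob's handling of the notification could be sketched as follows. The `Enr` class stands in for an already-decoded node record, and the `Whoareyou` fields (including the random `id_nonce`) are assumptions for illustration; the authoritative packet layout is in the wire specification.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Enr:
    """Hypothetical, already-decoded node record."""

    node_id: bytes
    ip: str
    udp_port: int


@dataclass(frozen=True)
class RelayMsg:
    initiator_enr: Enr
    nonce: bytes


@dataclass(frozen=True)
class Whoareyou:
    request_nonce: bytes  # nonce of the request being challenged
    id_nonce: bytes       # fresh random challenge data (assumed field)


def handle_relaymsg(msg):
    """Return (destination endpoint, WHOAREYOU challenge) for a RELAYMSG."""
    challenge = Whoareyou(request_nonce=msg.nonce, id_nonce=os.urandom(16))
    # Alice's endpoint comes from her ENR, not from the Relay's address.
    destination = (msg.initiator_enr.ip, msg.initiator_enr.udp_port)
    return destination, challenge
```

Sending the challenge is what matters for the NAT: it is Bob's first outgoing packet towards Alice's endpoint.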
400+
401+ Bob's NAT adds the mapping `(Bob's-LAN-ip, Bob's-LAN-port, Alice's-WAN-ip,
402+ Alice's-WAN-port, entry-lifetime)`. A hole is punched in Bob's NAT for Alice for the
403+ duration of `entry-lifetime`.
404+
405+ From here on, the handshake proceeds as usual. See [Sessions].
406+
407+ ### Redundancy of ENRs in NODES responses and connectivity status assumptions about Relay and Bob
408+
409+ Often the same peers get passed around in NODES responses by different peers. The chance
410+ of seeing a peer from one NODES response again in another NODES response is high, as
411+ k-buckets favour long-lived connections over new ones. This makes the need for storing
412+ backup relays for peers small.
413+
414+ Storing only the last peer that sent us an ENR as that ENR's potential relay saves state.
415+ It is also the sensible choice: the more time has passed since a peer sent us an ENR,
416+ the less guarantee we have that the peer is still connected to the owner of that ENR, and
417+ hence that it is able to relay.
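
One way to keep only the most recent potential relay per peer is a single map that is overwritten on every NODES response. This is a sketch under assumptions: `RelayStore` and its method names are hypothetical, and the `max_age` cutoff is an illustrative policy, not something this document prescribes.

```python
class RelayStore:
    """Remembers, per peer, only the last node that sent us that peer's ENR."""

    def __init__(self, max_age=20.0):
        self.max_age = max_age  # seconds; illustrative cutoff (assumption)
        self._last = {}  # peer node ID -> (relay node ID, time the ENR was received)

    def on_enr_received(self, peer_id, sender_id, now):
        # Overwrite any earlier entry: no backup relays are kept.
        self._last[peer_id] = (sender_id, now)

    def relay_for(self, peer_id, now):
        entry = self._last.get(peer_id)
        if entry is None:
            return None
        relay_id, seen_at = entry
        # The older the entry, the less likely the relay still reaches the peer.
        if now - seen_at > self.max_age:
            return None
        return relay_id
```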
418+
337419[EIP-778]: ../enr.md
338420[identity scheme]: ../enr.md#record-structure
421+ [message packet]: ./discv5-wire.md#ordinary-message-packet-flag--0
422+ [session message packet]: ./discv5-wire.md#session-message-packet-flag--3
339423[handshake message packet]: ./discv5-wire.md#handshake-message-packet-flag--2
340424[WHOAREYOU packet]: ./discv5-wire.md#whoareyou-packet-flag--1
341425[PING]: ./discv5-wire.md#ping-request-0x01
342426[PONG]: ./discv5-wire.md#pong-response-0x02
343427[FINDNODE]: ./discv5-wire.md#findnode-request-0x03
428+ [NODES]: ./discv5-wire.md#nodes-response-0x04
429+ [RELAYINIT]: ./discv5-wire.md#relayinit-notification-0x07
430+ [RELAYMSG]: ./discv5-wire.md#relaymsg-notification-0x08
431+
432+ [Sessions]: ./discv5-theory.md#sessions