BUIP006: Blocktorrent – a torrent-style block data transport
Proposer: freetrader
Submitted: 2016-01-07
Status: closed
Revision: 0
Background
On the bitcoin-dev mailing list, Jonathan Toomim proposed the "blocktorrent" concept to speed up new block propagation:
http://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-September/011176.html
A quick survey [1] on this forum showed strong interest for implementing such a mechanism in Bitcoin Unlimited due to potential improvement in block propagation speeds.
Especially when faced with the future scenario of larger block sizes and under certain adverse network conditions existing today (packet/connection loss), the current protocol may be significantly out-performed by a more fine-grained p2p algorithm.
The main problem with the current algorithm is that it requires a peer to have a full block before it can upload it to other peers. Breaking up blocks into smaller chunks and distributing these out of order in a Bittorrent-like fashion can potentially yield a great speed improvement.
Proposal
A torrent-style distribution of new block data (named 'blocktorrent) shall be implemented as an optional feature in addition to the existing p2p block distribution based on 'inv' and 'getdata' (henceforth referred to in this document as the 'standard p2p network protocol' - SPNP).
The proposed blocktorrent protocol shall be referred to in this document as 'blocktorrent p2p network protocol - BPNP.
BPNP shall be designed to augment the existing SPNP-based block propagation in cases where this is deemed beneficial from a propagation performance viewpoint (through configurable parameters describing block size and composition) or enforced by the node operator in lieu of SPNP.
It shall be possible to completely disable BPNP so that the client remains 100% compatible with implementations not supporting BPNP.
A peer shall be able to advertise its BPNP capabilities to other peers so that these can decide their optimal methods of exchanging data with it.
High level Design
As this document is a draft, what follows are provisional high-level design ideas for further discussion and elaboration.
The existing p2p network code shall be altered such that it if BPNP is enabled, a set of BPNP-capable peers is established and used based on settings to be decided.
[Note: @theZerg advised to create a transport plugin layer which would be able to accommodate various kinds of transports, including presumably the current SBNP, future BPNP, perhaps some more like SCTP-based transport etc. Such a plugin layer seems a wise architectural decision and I support it, but would prefer if it was address in a separate BUIP to cleanly distinguish the layering (and separate the implementation/testing). It would probably be good to serialize the BUIPs then, starting with the plugin transport layer one.]
There shall be a module (e.g. blocktorrent.cpp) which implements functionality needed for communicating with the BPNP-capable peers:
-
management of peer states and network connections (TCP/UDP sockets)
-
block assembly/disassembly routines for incoming/outgoing block data
-
interfaces to existing block validation and generation routines
-
a separate cache of mempool transactions with additional data fields
used in the BPNP protocol (e.g. hashes)
-
gathering of metrics on peer connection and block propagation, which
may be used to optimize the protocol's performance (e.g. dynamic chunk sizes) or fallback to SBNP where this would be better (e.g. very small blocks)
Control/data connections
Each logical BPNP peer connection shall consist of two separate network connections:
- a TCP connection for handshake/control information
- a UDP socket for actual block (chunk) data transfer
The control information and UDP message data formats remain to be specified in detail.
[Note: If advisable Google protobuf shall be used to define BPNP message formats and make the protocol more robust for future versioning.]
BPNP-capability advertisement
This remains to be specified in detail.
The current intention is to use version bits to advertise BPNP support.
Simultaneously, until version bits become part of the official reference client,
other forms of advertisement may be needed to bootstrap valid peer discovery.
This could include the user agent information or a query/response protocol after
successfully opening a control connection.Peer sets, discovery and connection/disconnection
[NOTE: the description below contains a number of possible parameters and pools.
This may be vastly constrained an initial implementation to reduce complexity,
and subsequently refined. Please treat it as a set of ideas for now which are
subject to discussion!]The set of BPNP peers is defined as the set of peers (identified by IP addresses) that the client knows or assumes are BPNP-capable. This set may explicitly be larger than the set of regular (SPNP) peers.
Optional: Depending on a user-configurable parameter, this set may be restricted to
a subset of the set of regular peers known to SPNP. Such a parameter would be useful to explicitly limit the BPNP to be maximally the same as the SBNP, for certain comparative performance tests between SBNP/BPNP).Those members of the set of SBNP peers which advertise themselves as BPNP-capable shall form an initial set of eligible BPNP peers.
There shall be a parameter limiting the maximum number of eligible BPNP peers (as obtained through discovery) which are retained in memory.
If the configured maximum number of eligible BPNP peers is larger than the maximum number of SBNP peers, an yet-to-be-specified discovery protocol shall be run to find additional BPNP-capable peers. For now, in the remainder of this document it shall be assumed that such a discovery
protocol is available.Additionally, there shall be a parameter limiting the maximum number of active BPNP peers, i.e. those with established connections and indicating that they are able to actively provide block data via BPNP.
When an active peer control connection is closed (either due to a controlled disconnect of the remote peer or due to a network timeout), the BPNP module shall consider the peer as temporarily disconnected and seek to re-open the connection.
[TODO: decide if a reconnection attempt should be made in case of orderly disconnect of the remote. Perhaps rather not...]
Optional: If re-connection fails after a user-configurable number of retries (which may be 0), a new peer shall be chosen from the set of inactive eligible peers.
If the connection to the new peer can be opened successful within a configurable number of retries, the failed peer is returned to the inactive eligible pool ("swapped out" for the newly activated peer) and its failure metrics are updated.
Optional: If a connection to an replacement peer cannot be established after a configurable number of retries, or if connection metrics for a peer indicate an excessive number or rate of unexpected disconnects, such a peer may be declared unreliable and deprioritised or removed completely from the eligible pool.
Optional: The BPNP protocol shall attempt to 'top up' the eligible pool when peers are removed from it. This could be controlled by some 'low water mark' parameter.
Data Provisioning Algorithm
*Details to be worked out. *
Intention is to take the algorithm outlined by @jtoomim in his mail to bitcoin-dev as a rough starting guideline.
The smallest sensible chunk granularity would presumably be single transactions, but in practice larger chunks would probably be more useful. Probably under parameter control or dynamic in response to performance metrics.
Transaction Cache
Details to be worked out.
Other options under consideration
This is mostly still a wish list based on input gathered so far.
In addition to advertising BPNP-capability in a simple way, peers could advertise preference flags which could be used to:
-
prioritize sending of data to high-bandwidth nodes and to avoid
saturating low-bandwidth peers
-
enable opportunistic transmission (do not wait for a request but
send a defined amount of data straightaway in the assumption that the receiver can handle it)
Forward error correction could be employed in environments where packet loss is known to be a significant problem - i.e. introduce redundancy in to the data so that as long as enough packets get through the block can be reassembled.
--
[1] https://bitco.in/forum/threads/towards-faster-block-propagation-jtoomims-blocktorrent-proposal.742