HTTP retrieval proposal #747

hsanjuan · 2024-12-09T19:44:09Z

This is a proposal to add HTTP retrieval to Boxo. The current state is highly WIP, but I successfully retrieved something over HTTP, so I am posting this to start a discussion about the approach and whether we want to pursue it to completion.

Approach

The high-level idea is that most of what lives in bitswap/client is actually an "exchange" implementation, with the only real "Bitswap" thing being that bitswap/network sends HAS/GET requests over bitswap-protocol streams. As such, we should be able to complement bitswap/network with an HTTP-retrieval implementation which, instead of fetching things over the bitswap protocol, calls HTTP endpoints as indicated by the provider's /http address entries.

Note that, conceptually at least, this is not adding HTTP retrieval into bitswap, but promoting most of the bitswap code to be a reference "Exchange" implementation that is reusable for different retrieval protocols (bitswap, http...). That is, we would be talking about an "exchange network" component and not a "bitswap network" component. Renames to this effect are still missing.

Implementation

In order to introduce an http-retrieval "exchange network" we need to:

Know when something should be retrieved via HTTP - that is, an item has an /http provider.
Use HTTP network for that.

To this end:

We have a router which selects the http-network or the bitswap-network (or both) based on the existence of /http addresses in the peerstore of the given peer.
We have implemented an http-network as a PoC that performs GET requests to /http endpoints when handling a WANT.
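The selection rule described above can be sketched roughly as follows. This is not the PR's router code: the `Network` type, `selectNetworks`, and the use of plain multiaddr strings instead of peerstore entries are all assumptions made so the example stays self-contained. It also assumes that a peer with no known addresses falls back to bitswap.

```go
package main

import (
	"fmt"
	"strings"
)

// Network is a hypothetical label for this sketch, not a Boxo identifier.
type Network string

const (
	Bitswap Network = "bitswap"
	HTTP    Network = "http"
)

// hasHTTPComponent reports whether a multiaddr string contains an
// /http or /https component.
func hasHTTPComponent(addr string) bool {
	for _, part := range strings.Split(addr, "/") {
		if part == "http" || part == "https" {
			return true
		}
	}
	return false
}

// selectNetworks picks the networks to use for a peer from its known
// addresses: HTTP when any /http(s) address exists, bitswap when any
// other address exists or when nothing is known (assumed fallback).
func selectNetworks(addrs []string) []Network {
	var hasHTTP, hasOther bool
	for _, a := range addrs {
		if hasHTTPComponent(a) {
			hasHTTP = true
		} else {
			hasOther = true
		}
	}
	var nets []Network
	if hasOther || !hasHTTP {
		nets = append(nets, Bitswap)
	}
	if hasHTTP {
		nets = append(nets, HTTP)
	}
	return nets
}

func main() {
	fmt.Println(selectNetworks([]string{"/dns4/example.com/tcp/443/https"}))
	fmt.Println(selectNetworks([]string{"/ip4/192.0.2.1/tcp/4001"}))
	fmt.Println(selectNetworks([]string{
		"/ip4/192.0.2.1/tcp/4001",
		"/dns4/example.com/tcp/443/tls/http",
	}))
}
```

A real router would read addresses from the libp2p peerstore and parse them as multiaddrs rather than splitting strings.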

In my tests plugging it into Kubo, the http-network can be used to retrieve content from a gateway over HTTP. 🥳

The main advantage of this approach is that it is relatively clean to incorporate into the codebase, and it keeps most of the code untouched, without having to duplicate any of the complex areas.

Challenges

Connectivity tracking is not implemented yet and we will have to see to what extent it can be implemented (I'm guessing we can plug into the TCP dialer directly).
Options like timeouts etc. are not implemented.
We use a single HTTP client rather than a pool.
Of course testing is fully lacking.

Bitswap places a lot of importance on managing connectivity events to peers. We avoid requesting things from peers that have not signaled connectivity, we clean up peers that have disconnected, and we re-queue things for peers that disconnect. Thus it seems we must support HTTP connectivity events. When a libp2p peer connects for bitswap, we know that the connection is set up, the handshake has been performed, and protocol negotiation has happened. For HTTP these things may not exist, so we need to define what "Connected" means (i.e. in the case of https it would mean we have completed the TLS handshake).

Apart from that, the question is which elements in the current bitswap/client stack do not apply to HTTP (peerqueues, messagequeues, broadcast, wantsending, prioritization etc.)... and why not? What if a peer disconnects from bitswap but not from HTTP, or vice versa? What if latency is much worse for bitswap than for HTTP? Perhaps this is all logic for the network-router, which would need to know how to choose which network to use to send messages.

Otherwise, perhaps it is not possible to have a satisfactory implementation this way and we need to start thinking about what to copy-paste into a separate "http-exchange" (at least the client part).

Related: #608

hsanjuan added 4 commits December 9, 2024 20:10

IWIP WIP WIP

54441c6

wip

949b1ba

wip

f530462

wip

d57839b

hsanjuan self-assigned this Dec 9, 2024

hsanjuan requested a review from a team as a code owner December 9, 2024 19:44
