
Spike: Self-Hosted Web3.Storage #1

Closed

hannahhoward opened this issue Mar 12, 2024 · 2 comments

@hannahhoward (Member)

Goals

Time box to one week -- attempt to produce an upload node that can persist data and a retrieval node, both running on a local system and able to talk to each other.

For this round, don't worry about WebAssembly -- feel free to run Docker or whatever you like.

  1. Try to make an upload API node that can be run locally (accepting store/* & upload/*; a minimal sketch follows below)
    1. Should persist CAR files to any kind of store
    2. Should write the minimum content claims required to look up content
  2. Try to make a freeway node that can be run locally
    1. Try to connect to and read from the local upload node

The goal here is to identify hidden dependencies on cloud services and/or assumptions that make portability difficult.

If this is trivially easy, that's wonderful news. The next move will be to separate the upload node from the storage device -- and start building code to register storage nodes with the uploader and have it round-robin content uploads between them.
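
As a rough illustration of what item 1 boils down to at its simplest, here is a hedged TypeScript sketch: a bare HTTP endpoint that accepts a CAR upload, hashes it, and persists it to a pluggable blob store. In the real web3.storage stack, store/* & upload/* are UCAN invocations handled by a ucanto server rather than a plain PUT, and the port, route, and ./carpark directory below are placeholders, not anything this spike prescribes.

```ts
// Minimal sketch only -- a stand-in for the store/* handler, not the real service.
import { createServer } from 'node:http'
import { mkdir, writeFile } from 'node:fs/promises'
import { join } from 'node:path'
import { sha256 } from 'multiformats/hashes/sha2'
import { CID } from 'multiformats/cid'

const CAR_CODEC = 0x0202 // multicodec code for CAR

// "Any kind of store" from goal 1.1 -- here, just the local filesystem.
interface BlobStore {
  put (key: string, bytes: Uint8Array): Promise<void>
}

const fsStore = (dir: string): BlobStore => ({
  async put (key, bytes) {
    await mkdir(dir, { recursive: true })
    await writeFile(join(dir, key), bytes)
  }
})

const store = fsStore('./carpark') // placeholder directory

createServer(async (req, res) => {
  if (req.method !== 'PUT') {
    res.writeHead(405).end()
    return
  }
  // Buffer the uploaded CAR bytes.
  const chunks: Buffer[] = []
  for await (const chunk of req) chunks.push(chunk as Buffer)
  const bytes = new Uint8Array(Buffer.concat(chunks))
  // Address the CAR by the CID of its bytes, then persist it.
  const digest = await sha256.digest(bytes)
  const cid = CID.createV1(CAR_CODEC, digest)
  await store.put(cid.toString(), bytes)
  res.writeHead(200, { 'content-type': 'application/json' })
  res.end(JSON.stringify({ cid: cid.toString() }))
}).listen(3000)
```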

@reidlw (Contributor) commented Mar 27, 2024

Alan is going to add some additional notes, but this is effectively done.

@alanshaw (Member) commented Apr 8, 2024

DEMO: https://youtu.be/eJVA97t-jaw?si=CL9BQCaRKZ15UHVb&t=2748

local.storage

  • https://github.com/w3s-project/local.storage
  • The implementation IS sufficiently de-coupled from centralized services.
    • I was able to pick the parts of the web3.storage stack that I wanted to implement.
    • Known limitations/issues/workarounds:
      • Not surprisingly, we have dependencies on resources that are in the process of being phased out, i.e. the carpark and dudewhere buckets. Carpark will soon be replaced by content-claims-backed HTTP reads from anywhere. Dudewhere is essentially an inclusion claim.
  • Content claims
  • Blob read interface
    • You can read data by base58btc-encoded multihash at /blob/:multihash (a client-side sketch follows after this list)
    • You can get hold of CARs and CARv2 indexes
    • It supports HTTP range requests
    • We could probably serve content claims from this interface instead of the separate content claims read API if we wanted (since they are also hash addressable)
  • Data storage
    • Instead of DynamoDB and S3 buckets, local.storage stores all data in a DAG (including uploaded blobs)
    • Data is persisted on disk in an IPFS-compatible filesystem blockstore
    • Data is managed by Pail - a library that implements key/value style storage similar to LevelDB
    • For the most part web3.storage requires simple get/put, but sometimes it needs to list data by prefix/range. This is a perfect fit for Pail, which is optimized for exactly that.
    • I chose to use a single "pail", so that the entire state of the system can be captured by a single CID at any given time.
    • I created a simple software partitioning system which allows each "store" to operate as if it were its own pail (see the sketch after this list).
    • To ensure consistency, I implemented a simple transaction system which guarantees that only one transaction runs at any given time.
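
To make the blob read interface concrete, here is a small client-side sketch. Only the route shape, the base58btc multihash key, and range support come from the notes above; the base URL and function name are placeholders assumed for illustration.

```ts
// Hypothetical client for the /blob/:multihash route described above.
import { base58btc } from 'multiformats/bases/base58'

const BASE_URL = 'http://localhost:3000' // placeholder for wherever local.storage listens

// Read a byte range of a stored blob (CAR or CARv2 index) by its multihash bytes.
async function readBlobRange (multihash: Uint8Array, offset: number, length: number): Promise<Uint8Array> {
  const key = base58btc.encode(multihash) // e.g. "zQm..."
  const res = await fetch(`${BASE_URL}/blob/${key}`, {
    headers: { Range: `bytes=${offset}-${offset + length - 1}` }
  })
  if (!res.ok) throw new Error(`blob fetch failed: ${res.status}`)
  return new Uint8Array(await res.arrayBuffer())
}
```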
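
And a sketch of the partitioning and transaction ideas. The KV interface below is a generic stand-in, not Pail's actual API; the point is just prefix-based partitions over one shared pail, plus a queue that lets only one transaction run at a time.

```ts
// Generic key/value shape standing in for the single shared pail.
interface KV {
  get (key: string): Promise<Uint8Array | undefined>
  put (key: string, value: Uint8Array): Promise<void>
}

// Each "store" (e.g. store, upload, claims) gets its own key prefix, so it can
// behave as if it were its own pail while one root CID still captures all state.
const partition = (kv: KV, prefix: string): KV => ({
  get: (key) => kv.get(`${prefix}/${key}`),
  put: (key, value) => kv.put(`${prefix}/${key}`, value)
})

// Chains transactions one after another so only one runs at any given time.
class Transactor {
  #tail: Promise<unknown> = Promise.resolve()
  transact<T> (fn: () => Promise<T>): Promise<T> {
    const result = this.#tail.then(fn, fn) // run after the previous settles
    this.#tail = result.catch(() => {})    // a failed transaction must not block the next
    return result
  }
}
```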

local.freeway

  • https://github.com/w3s-project/local.freeway
  • This was extremely simple to set up; we use our gateway-lib to do the heavy lifting here.
  • The only new/interesting bit is the content claims index, which is a content-claims-backed index that maps a CID to a URL and byte offset. Obviously for our implementation the URL always points to local.storage, but in theory it could point to any node serving the CAR/blob.
  • For a given CID, resolution looks like this (sketched in code after this list):
    1. Call /claims/:cid with walk parameters parts and includes
    2. In the response, we expect to receive a partition claim and...
      • For each part (shard) we expect a location claim, and an inclusion claim
      • For each include (index) in the inclusion claims, we expect a location claim
    3. We can then read the includes (indexes), by location URL
    4. We can then read individual blocks at byte offsets, by shard location URL
  • Note: this resolution method does not allow random access!
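
Here is how that resolution walk might look in code. The control flow mirrors steps 1-4 above, but the helper functions and claim field names are hypothetical stand-ins -- the actual claim encoding isn't spelled out in this comment.

```ts
// Hypothetical claim shapes, keyed by the CIDs they describe.
type PartitionClaim = { content: string, parts: string[] }    // root -> shards
type InclusionClaim = { content: string, includes: string }   // shard -> index
type LocationClaim  = { content: string, location: string[] } // content -> URLs

// Step 1: /claims/:cid?walk=parts,includes returns the whole claim set (stub).
declare function fetchClaims (cid: string): Promise<{
  partition: PartitionClaim
  locations: Map<string, LocationClaim>   // keyed by content CID
  inclusions: Map<string, InclusionClaim> // keyed by shard CID
}>

// Step 3: parse a CARv2 index into block CID -> byte range entries (stub).
declare function readIndex (url: string): Promise<Map<string, { offset: number, length: number }>>

async function resolveAndRead (root: string, block: string): Promise<Uint8Array | undefined> {
  const { partition, locations, inclusions } = await fetchClaims(root)
  // Step 2: for each part (shard), expect a location claim and an inclusion claim,
  // plus a location claim for the include (index).
  for (const shard of partition.parts) {
    const shardLocation = locations.get(shard)
    const inclusion = inclusions.get(shard)
    if (!shardLocation || !inclusion) continue
    const indexLocation = locations.get(inclusion.includes)
    if (!indexLocation) continue
    // Step 3: read the index to find the block's byte offset within the shard.
    const index = await readIndex(indexLocation.location[0])
    const entry = index.get(block)
    if (!entry) continue
    // Step 4: range-read the block bytes from the shard's location URL.
    const res = await fetch(shardLocation.location[0], {
      headers: { Range: `bytes=${entry.offset}-${entry.offset + entry.length - 1}` }
    })
    return new Uint8Array(await res.arrayBuffer())
  }
}
```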
