feat: write up ipni+datalog sketch #22

136 changes: 136 additions & 0 deletions ipni-datalog.md
@@ -0,0 +1,136 @@
# Query IPNI with Datalog

## Authors

- [Irakli Gozalishvili]

## Background

### InterPlanetary Network Indexer (IPNI)

[InterPlanetary Network Indexer (IPNI)][IPNI] is not exactly a key-value store, but perhaps it could be (ab)used as one to unlock open-ended, extensible innovation in user space.

IPNI takes advantage of the `1:n` relation that e.g. an IPFS DAG root has with the blocks it consists of to provide reverse lookups, that is, to resolve a DAG root by any of its blocks. The exploit here is that many relations can be dropped or synced simultaneously. Trying to do the reverse, on the other hand, would be hard to scale.

### W3 DAG Index

[W3 Index] describes a DAG in terms of the blocks it consists of and the byte ranges of those blocks within blobs addressed by [multihash]. The DAG index is encoded as a [Content Archive (CAR)][CAR], and an [IPNI] advertisement is created such that a lookup by any blob or block [multihash] resolves to the link of that CAR.

This allows clients to:

1. Perform [IPNI] lookup by DAG root block to get a DAG Index address.
2. Fetch [DAG Index] to get [multihash](es) of blobs containing DAG blocks.
3. Perform [IPNI] lookup to resolve [location commitment]s per blob.
4. Fetch the [blob]s and slice them up per the [DAG Index] ranges to derive the set of DAG blocks.
5. Build the DAG from the root node using the blocks.
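
The steps above can be sketched roughly as follows. This is a minimal illustration only: `ipni`, `indexes`, `blobsByLocation`, `lookup` and `resolveBlocks` are hypothetical in-memory stand-ins, not a real IPNI or w3up API, and slice ranges are assumed to be `[start, end)` offsets.

```js
// Hypothetical in-memory stand-ins for IPNI, fetched DAG Indexes and blob
// storage. None of these names come from a real client library.
const ipni = new Map([
  ["block..1", ["bag...idx"]], // DAG root block -> DAG Index address
  ["blb...left", ["loc..1"]],  // blob -> location commitment
]);
const indexes = new Map([
  ["bag...idx", {
    content: "bafy..dag",
    // Assumption: each slice is [block, start, end) within the blob.
    shards: [["blb...left", [["block..1", 0, 4], ["block..2", 4, 8]]]],
  }],
]);
const blobsByLocation = new Map([["loc..1", "rootleaf"]]);

const lookup = (multihash) => ipni.get(multihash) ?? [];

function resolveBlocks(rootBlock) {
  // 1. IPNI lookup by DAG root block to get a DAG Index address.
  const [indexAddress] = lookup(rootBlock);
  if (!indexAddress) throw new Error(`no DAG Index found for ${rootBlock}`);
  // 2. Fetch the DAG Index to learn which blobs contain the blocks.
  const index = indexes.get(indexAddress);
  const blocks = new Map();
  for (const [blob, slices] of index.shards) {
    // 3. IPNI lookup to resolve a location commitment for the blob.
    const [location] = lookup(blob);
    // 4. Fetch the blob and slice it up per the index ranges.
    const bytes = blobsByLocation.get(location);
    for (const [block, start, end] of slices) {
      blocks.set(block, bytes.slice(start, end));
    }
  }
  // 5. The caller can now build the DAG from the root using these blocks.
  return blocks;
}

console.log(resolveBlocks("block..1"));
```

Note that step 3 cannot start until step 2 has completed, which is the sequential dependency the rest of this document tries to remove.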

#### W3 DAG Index Example

A user can create a DAG Index locally and submit it for publishing to [IPNI] through [w3up] via the [W3 Index] protocol.

```js
{ // CAR CID of this is "bag...idx"
  "index/sharded/[email protected]": {
    "content": { "/": "bafy..dag" }, // Removing CID prefix leads to "block..1"
    "shards": [
      [
        // blob multihash
        { "/": { "bytes": "blb...left" } },
        // slices within the blob
        [
          [{ "/": { "bytes": "block..1" } }, 0, 128],
          [{ "/": { "bytes": "block..2" } }, 129, 256],
          [{ "/": { "bytes": "block..3" } }, 257, 384],
          [{ "/": { "bytes": "block..4" } }, 385, 512]
        ]
      ],
      [
        // blob multihash
        { "/": { "bytes": "blb...right" } },
        // slices within the blob
        [
          [{ "/": { "bytes": "block..5" } }, 0, 128],
          [{ "/": { "bytes": "block..6" } }, 129, 256],
          [{ "/": { "bytes": "block..7" } }, 257, 384],
          [{ "/": { "bytes": "block..8" } }, 385, 512]
        ]
      ]
    ]
  }
}
```

The [W3 Index] protocol implementation will derive and publish an [IPNI] advertisement such that every multihash (key) resolves to the [DAG Index] link (entity).

> ℹ️ You can look up an `entity` by a `key`

| entity | key |
| ----------- | ------------- |
| `bag...idx` | `blb...left` |
| `bag...idx` | `block..1` |
| `bag...idx` | `block..2` |
| `bag...idx` | `block..3` |
| `bag...idx` | `block..4` |
| `bag...idx` | `blb...right` |
| `bag...idx` | `block..5` |
| `bag...idx` | `block..6` |
| `bag...idx` | `block..7` |
| `bag...idx` | `block..8` |
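
A rough sketch of how such rows could be derived from a DAG Index. The function name `toAdvertisementRows` is hypothetical, and the index is simplified so that multihashes are plain strings rather than `{ "/": { "bytes": ... } }` objects.

```js
// Flatten a (simplified) DAG Index into [entity, key] rows, where the entity
// is always the DAG Index link and the key is a blob or block multihash.
function toAdvertisementRows(indexLink, index) {
  const rows = [];
  for (const [blob, slices] of index.shards) {
    rows.push([indexLink, blob]);
    for (const [block] of slices) {
      rows.push([indexLink, block]);
    }
  }
  return rows;
}

// Simplified index: strings stand in for multihashes (an assumption).
const exampleIndex = {
  content: "bafy..dag",
  shards: [
    ["blb...left", [["block..1", 0, 128], ["block..2", 129, 256]]],
    ["blb...right", [["block..5", 0, 128]]],
  ],
};

console.table(toAdvertisementRows("bag...idx", exampleIndex));
```

Every row shares the same entity, which is why a lookup alone cannot tell a blob from a block.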

### Design Tradeoffs

1. The client is required to fetch `bag...idx` and parse it before it can start looking for locations of the blobs.

2. Anyone could publish an IPNI advertisement that associates `block..1` with some `who..knows` multihash which may not be a [DAG Index], yet the client will only discover that after fetching it and trying to parse it as one.

Contributor

This can be controlled if we use our own indexer and only allow known publishers to publish ads. We can even modify it to only allow certain types of index data.

The client can filter out results to remove unknown publishers and unwanted types of metadata. This can be done without having to fully read the results.

Contributor Author

Yeah, that is a good call; the client could leverage existing trust where possible. I have been mostly interested in an open-ended scenario in which the publisher of the advertisement can be different from the author and bears no accountability for its accuracy. But if the publisher is accountable for advertisements, trust in the publisher could be leveraged to address this concern.

Contributor

I think IPNI should be able to support using UCAN to allow the content provider to authorize a publisher to publish on its behalf. I am going to propose that as a change to the IPNI spec. What is the appropriate type of UCAN to do this?


3. The client cannot query specific relations: the fact that `block..1` is a block of the DAG and the fact that `blb...left` is a blob containing DAG blocks are not captured in any way.

4. We also have not captured the relation between `blb...left` and `block..1` in any way, other than that they both relate to `bag...idx`.

Contributor

What relationships should be captured and how would they be used?

Contributor Author

I go into more detail below, but put simply, if you look at the example index you can see that:

  1. block..1 is slice of blb...left blob.
  2. block..1 is block of bafy..dag dag.

When I look up block..1, which in datalog could be something like ["?e", "?relation", "block..1"], I would expect to get back something like this:

[
  ["bafy..dag", "[email protected]/block", "block..1"],
  ["blb...left", "[email protected]/slice", "block..1"],
]

But I would also like to be able to query a specific relation like ["?dag", "[email protected]/block", "block..1"] to get something like:

[
  ["bafy..dag", "[email protected]/block", "block..1"],
]


## Proposal

Leverage [IPNI] such that we can resolve the `entity` that a [multihash] relates to
without having to resolve and fetch the [DAG Index] first.

The general idea is to capture relation names in a way that can be utilized during lookups. This would also make the indexing protocol open-ended and extensible by anyone, without intermediaries like W3Up having to adopt those extensions.

To illustrate this, let's derive another lookup table from the same [DAG Index][index example]. In this case we replace the `key` column with a pair of `attribute` (describing the relation to an `entity`) and `value`, which is a [multihash] related to the `entity`.

| entity        | attribute                     | value       |
| ------------- | ----------------------------- | ----------- |
| `blb...left`  | `[email protected]/shard`       | `bafy..dag` |
| `blb...left`  | `[email protected]/shard/slice` | `block..1`  |
| `blb...left`  | `[email protected]/shard/slice` | `block..2`  |
| `blb...left`  | `[email protected]/shard/slice` | `block..3`  |
| `blb...left`  | `[email protected]/shard/slice` | `block..4`  |
| `blb...right` | `[email protected]/shard`       | `bafy..dag` |
| `blb...right` | `[email protected]/shard/slice` | `block..5`  |
| `blb...right` | `[email protected]/shard/slice` | `block..6`  |
| `blb...right` | `[email protected]/shard/slice` | `block..7`  |
| `blb...right` | `[email protected]/shard/slice` | `block..8`  |
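
A minimal sketch of deriving such `[entity, attribute, value]` rows from a DAG Index. The names `toFacts`, `SHARD` and `SLICE` are hypothetical, the attribute strings are shortened placeholders for the versioned attribute names in the table, and multihashes are simplified to plain strings.

```js
// Placeholder attribute names (assumption): the real ones would carry the
// versioned prefix shown in the table above.
const SHARD = "shard";
const SLICE = "shard/slice";

// Turn a (simplified) DAG Index into [entity, attribute, value] facts where
// each blob is the entity of the facts about it.
function toFacts(index) {
  const facts = [];
  for (const [blob, slices] of index.shards) {
    // The blob is a shard of the DAG.
    facts.push([blob, SHARD, index.content]);
    // Each block is a slice of the blob.
    for (const [block] of slices) {
      facts.push([blob, SLICE, block]);
    }
  }
  return facts;
}

const shardedIndex = {
  content: "bafy..dag",
  shards: [
    ["blb...left", [["block..1", 0, 128], ["block..2", 129, 256]]],
    ["blb...right", [["block..5", 0, 128]]],
  ],
};

console.table(toFacts(shardedIndex));
```

Unlike the earlier table, the entity now varies per blob and the attribute records which relation a value has to it.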

In this variant we have a virtual `key` that is derived deterministically from the `attribute` and `value` components, e.g. by concatenating their bytes and computing a [multihash].
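
One possible way to derive such a key, sketched with Node.js built-ins. The choice of sha2-256, UTF-8 string components and the `[value, attribute]` ordering are all assumptions made for illustration; the proposal does not fix any of them.

```js
const { createHash } = require("node:crypto");

// Derive a virtual key by concatenating the bytes of the components and
// wrapping their sha2-256 digest as a multihash.
function toKey(components) {
  // Assumption: components are strings, encoded as UTF-8 and concatenated
  // in the order given.
  const bytes = Buffer.concat(components.map((part) => Buffer.from(part, "utf8")));
  const digest = createHash("sha256").update(bytes).digest();
  // multihash layout: <hash code><digest length><digest>
  // 0x12 is the sha2-256 code and 0x20 is the 32-byte digest length.
  return Buffer.concat([Buffer.from([0x12, 0x20]), digest]);
}

const shardKey = toKey(["bafy..dag", "shard"]);
console.log(shardKey.toString("hex"));
```

Plain concatenation is ambiguous (`["ab", "c"]` and `["a", "bc"]` produce the same bytes), so a real encoding would likely need to delimit or length-prefix the components.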

This means that we could find the shards of `bafy..dag` via an IPNI query that resolves the `toKey(['bafy..dag', '[email protected]/shard'])` key.

If the system drops `blb...left`, it will drop all the records associated with that entity, since they would all be part of a single IPNI advertisement.
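
The lookup and drop behavior can be illustrated with a toy in-memory indexer. `ToyIndexer` is a hypothetical stand-in, not how IPNI is implemented; it only assumes one advertisement per entity and uses a hex sha2-256 digest as a simplified key.

```js
const { createHash } = require("node:crypto");

// Simplified key derivation (assumption): hash of value bytes followed by
// attribute bytes, hex encoded.
const toKey = ([value, attribute]) =>
  createHash("sha256").update(value).update(attribute).digest("hex");

class ToyIndexer {
  constructor() {
    this.advertisements = new Map(); // entity -> Set of derived keys
  }
  // Publish one advertisement holding every [attribute, value] pair of an entity.
  publish(entity, pairs) {
    const keys = pairs.map(([attribute, value]) => toKey([value, attribute]));
    this.advertisements.set(entity, new Set(keys));
  }
  // Resolve every entity whose advertisement contains the key.
  lookup(key) {
    return [...this.advertisements]
      .filter(([, keys]) => keys.has(key))
      .map(([entity]) => entity);
  }
  // Dropping the entity removes all of its records at once.
  drop(entity) {
    this.advertisements.delete(entity);
  }
}

const indexer = new ToyIndexer();
indexer.publish("blb...left", [["shard", "bafy..dag"], ["shard/slice", "block..1"]]);
indexer.publish("blb...right", [["shard", "bafy..dag"], ["shard/slice", "block..5"]]);

const shardsKey = toKey(["bafy..dag", "shard"]);
console.log(indexer.lookup(shardsKey)); // both blobs
indexer.drop("blb...left");
console.log(indexer.lookup(shardsKey)); // only the right blob remains
```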

Because we get the shards from a single query, we do not need to fetch the [DAG Index] before resolving location commitments. We could start resolving them right away, and if we gate IPNI with something like GraphQL, we could even resolve them in a single roundtrip and cache the result for subsequent queries.

Contributor

> Because we get shards from single query we do not need to fetch [DAG Index] to start resolving location commitments.

A single query to IPNI or to a result cache?

I am not sure I understand. Let's be specific about what key is used to look up what data. What I thought was that if IPNI is queried by a block multihash, the location of the DAG index is returned. The DAG index can be read to get location commitments. This can be cached in a separate result cache. A subsequent query for that same block multihash can first query the cache and then get the index and location commitments back in a single response.

Location commitments can also be indexed and advertised on [IPNI] in a very similar way.

| entity | attribute | value |
| ---------|-----------------------|--------------|
| `loc..1` | `assert/[email protected]` | `bafy..left` |
| `loc..2` | `assert/[email protected]` | `bafy..left` |
| `loc..3` | `assert/[email protected]` | `bafy..right`|

[Irakli Gozalishvili]:https://github.com/gozala
[IPNI]:https://github.com/ipni/specs/blob/main/IPNI.md
[W3 Index]:https://github.com/web3-storage/specs/blob/feat/w3-index/w3-index.md
[CAR]:https://ipld.io/specs/transport/car/
[multihash]:https://github.com/multiformats/multihash
[DAG Index]:#w3-dag-index
[location commitment]:https://github.com/web3-storage/content-claims#location-claim
[datomic]:https://datomic.com
[index example]:#w3-dag-index-example