-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: store equals for both content and equals multihash #23
Conversation
We want to return equals claims when quired with either the content cid or the equals cid, so store 2 records in dynamo for equals claims. `assert/equals` claims provide 2 CIDs. The `content` CID and the `equals` CID that we claim the `content` CID is equivelant to. (e.g. different hash fn for same bytes) The dynamo records are `{ content: "base58btc-multihash", claim: CID }`. The actual bytes of the claim are stored in s3, only once per claim. For equals claims we store an additional record that sets the `content` to the multihash of the provided `equals` CID. **Alternatively** we could have stored an `equals` property on dynamo records for equals claims, and added a global secondary index for that field, and updated the query to check both the content field and the equals field (and the secondary index) for every query, but this would incur additonal read costs for *every* query. It is assumed that storing a extra tuple of [string,cid] per equals claim will be cheap, and avoids a larger refactor of the claims db. License: MIT Signed-off-by: Oli Evans <[email protected]>
I'm looking at the walk api now to see how that needs to change to deal with bi-directional equals claims now. |
Walk api is not critical to this work. We will be querying by content cid only. There is a more general problem of "the walk api needs to guard against loops" that should be tackled before we go live with that api. Issue raised #24 |
async put ({ claim, bytes, content, expiration }) { | ||
const hasExpiration = expiration && isFinite(expiration) | ||
const cidstr = claim.toString(base32) | ||
async put (claim) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
factored out the s3 put and the db upsert so i can call upsertClaims twice for equals claims.
|
||
export interface Claim { | ||
claim: Link | ||
bytes: Uint8Array | ||
content: MultihashDigest | ||
expiration?: number | ||
expiration?: number, | ||
value: AnyAssertCap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing the full capability object through to the claims store allows us to inspect the capability name, e.g. assert/equals
and special case it, and we also get the nb
property on the capability, so we can extract the equals
cid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also it has a pleasing symmetry with the mutliformats block api which has the bytes
and the decoded value
of a block.
a CIDv0 can be equivalent to other CIDs, so fix the types License: MIT Signed-off-by: Oli Evans <[email protected]>
@@ -69,9 +69,9 @@ export interface RelationPartInclusion { | |||
/** A claim that the same data is referred to by another CID and/or multihash */ | |||
export interface EqualsClaim extends ContentClaim<typeof Assert.equals.can> { | |||
/** Any CID e.g a CAR CID */ | |||
readonly content: Link | |||
readonly content: UnknownLink |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type fix... a v0 CID can be equivalent to another CID, so the types should allow UnknownLink here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes look good to me! Thanks for putting it together
Defer the decision if this is desirable, or if to be desirable client should perform 2 calls instead fo @alanshaw as I am not familiar with the protocol details.
Out of scope, but it would be great to consider for this project to improve error handling in the same lines that we do in w3filecoin / ucanto, where we type and move around the potential errors, instead of relying on errors to be thrown
return dynamoClient.send(cmd) | ||
}, { | ||
minTimeout: 100, | ||
onFailedAttempt: err => console.warn(`failed DynamoDB update for: ${mh}`, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep the error as it was, or perhaps extend it with mh
as well?
Perhaps would make easier to see where the error comes if it logs the cidstr
like the other error
} | ||
// also add the claim with the `equals` cid as the `content` cid, | ||
// so we can provide the equivalent claim for look ups for either. | ||
await upsertClaim({ ...claim, content: equivalent }, this.#table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should add any metadata to this? Otherwise, might get complex if we in the future want to change something and not know what was added on the other way. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favour of adding the capability name to the dynamo record, so we could in future we could let user ask for "just assert/equals
or whichever flavour they want, and it has the side benefit of letting us reduce the number of records we'd have to explore to back out this choice, without adding a explicit "this is one of those reverse claims"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
View stack outputs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I added a few non-blocking feedbacks.
Co-authored-by: Alan Shaw <[email protected]>
License: MIT Signed-off-by: Oli Evans <[email protected]>
🤖 I have created a release *beep* *boop* --- ## [3.1.0](content-claims-v3.0.1...content-claims-v3.1.0) (2023-09-19) ### Features * add `assert/equals` ([#22](#22)) ([bddd948](bddd948)) * ensure claim API is CID version agnostic ([#18](#18)) ([1690c1f](1690c1f)) * store equals for both content and equals multihash ([#23](#23)) ([715fcd5](715fcd5)) ### Bug Fixes * key content on multihash not CID ([#21](#21)) ([7e737a7](7e737a7)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
We want to return equals claims when quired with either the content cid or the equals cid, so store 2 records in dynamo for equals claims.
assert/equals
claims provide 2 CIDs. Thecontent
CID and theequals
CID that we claim thecontent
CID is equivelant to. (e.g. different hash fn for same bytes)The dynamo records are
{ content: "base58btc-multihash", claim: CID }
. The actual bytes of the claim are stored in s3, only once per claim.For equals claims we store an additional record that sets the
content
to the multihash of the providedequals
CID.Alternatively we could have stored an
equals
property on dynamo records for equals claims, and added a global secondary index for that field, and updated the query to check both the content field and the equals field (and the secondary index) for every query, but this would incur additonal read costs for every query. It is assumed that storing a extra tuple of [string,cid] per equals claim will be cheap, and avoids a larger refactor of the claims db.License: MIT