
feat: roundabout with content claims #313

Closed
wants to merge 2 commits

Conversation

Contributor

vasco-santos commented Dec 22, 2023

Makes roundabout use content claims by default. It looks for location claims for the requested CAR and attempts to return a presigned URL for an object in an R2 bucket. If there are multiple location claims, they are filtered down to R2 URLs. If there are no R2 URLs for the configured buckets, it falls back to looking for configured S3 URLs (e.g. carpark in S3, which also has the CARs in R2) and attempts the sibling bucket in R2.

In the future we can also easily make these bucket names come from ENV vars. Given they need to be tied to the CF access key (and other ENVs), we should have a list of buckets we are willing to have roundabout look into.

Notes:

  • when no location claims are found, we currently fall back to attempting carpark-prod-0
  • kept the key route with a bucket parameter for the time being. It is heavily used by dagcargo-created aggregates and we should keep it until we stop dagcargo deals and GC the dagcargo R2 bucket
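For illustration, the lookup order described above could look roughly like the sketch below. The getLocationClaims helper and the domain constants are placeholders for the sketch, not the actual implementation.

// Domain constants and helper names below are assumptions for this sketch.
const CF_R2_DOMAIN = 'r2.cloudflarestorage.com'
const AWS_S3_DOMAIN = 's3.amazonaws.com'

/**
 * Resolve the location claim URL to use for a requested CAR.
 * @param {string} carCid
 * @param {object} ctx
 * @param {(cid: string) => Promise<string[]>} ctx.getLocationClaims - returns claimed location URLs
 * @param {string[]} ctx.validR2Buckets
 * @param {string[]} ctx.validS3Buckets
 */
export async function resolveCarLocation (carCid, { getLocationClaims, validR2Buckets, validS3Buckets }) {
  const locations = await getLocationClaims(carCid)

  // 1. Prefer location claims pointing at R2 buckets we hold credentials for.
  const r2Location = locations.find(l =>
    l.includes(CF_R2_DOMAIN) && validR2Buckets.some(b => l.includes(b)))
  if (r2Location) return r2Location

  // 2. Otherwise use an S3 location to pick the sibling bucket to try in R2
  //    (e.g. carpark in S3 that also has the CARs in R2).
  const s3Location = locations.find(l =>
    l.includes(AWS_S3_DOMAIN) && validS3Buckets.some(b => l.includes(b)))
  if (s3Location) return s3Location

  // 3. No usable location claims: the caller falls back to carpark-prod-0.
  return undefined
}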


seed-deploy bot commented Dec 29, 2023

View stack outputs

export const VALID_BUCKETS = ['dagcargo']
export const VALID_BUCKETS_BY_KEY = ['dagcargo']
export const VALID_R2_BUCKETS_DEFAULT = ['carpark-prod-0', 'carpark-prod-1', 'dagcargo']
export const VALID_S3_BUCKETS_DEFAULT = ['carpark-prod-0']
Contributor

can you remind me what the criteria are for a bucket to be valid for redirecting to, please? why wouldn't carpark-prod-1 or the pickup bucket be in this list?

Contributor Author

there is not really a criterion, but the access keys and secrets we set need to work with the buckets listed there. Before, this was tied to carpark-prod-0 by default, so the goal is to make it rely on content claims, but it also needs to be protected to only look at buckets in the web3.storage account. So in theory the pickup bucket would be valid, but should we migrate that data instead? I think the end goal should be to just use carpark to make things simpler

Comment on lines +25 to +31
const r2Urls = Array.from(locations)
  .filter(
    // CF Domain
    l => l.includes(CF_R2_DOMAIN) &&
      // Bucket name valid for CF
      validR2Buckets.filter(b => l.includes(b)).length
  )
Contributor

let's tighten this up to map the locations to URLs and match on the hostname and path explicitly rather than using string.includes. If the plan is to let anyone post a claim, then we have to be defensive when processing them.
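A minimal sketch of matching on hostname and path with the URL API, assuming locations of the form https://account-id.r2.cloudflarestorage.com/bucket-name/key (function and parameter names here are illustrative):

// Parse a claimed location defensively: reject anything that is not an https
// URL on the expected R2 hostname with an allow-listed bucket as the first path segment.
function parseR2Location (location, cfR2Domain, validR2Buckets) {
  let url
  try {
    url = new URL(location)
  } catch {
    return undefined // not a URL at all
  }
  if (url.protocol !== 'https:' || url.hostname !== cfR2Domain) return undefined

  // pathname looks like '/bucket-name/key'
  const [, bucketName, ...keyParts] = url.pathname.split('/')
  const key = keyParts.join('/')
  if (!validR2Buckets.includes(bucketName) || !key) return undefined

  return { bucketName, key }
}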

Contributor Author

added the account ID for R2. However, it would actually make sense to spec how we plan to send these URLs for R2, as they vary quite a lot across the R2 docs...

Contributor Author

I am not sure we can actually do this nicely with S3, given the bucket name is in the hostname and we can have several. I think we will need to sync on this

Contributor Author

perhaps we could consider using the dedicated domain URLs instead and have a mapper here to know which buckets they correspond to in R2 land?
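For illustration, the mapper could look something like the sketch below (the domains and the mapping are made up):

// Illustrative only: dedicated domains and their bucket mapping are made up.
const DEDICATED_DOMAIN_TO_R2_BUCKET = new Map([
  ['carpark.example.com', 'carpark-prod-0'],
  ['dagcargo.example.com', 'dagcargo']
])

// Map a claimed dedicated-domain URL back to the R2 bucket we can presign for.
function bucketForDedicatedDomain (location) {
  let url
  try {
    url = new URL(location)
  } catch {
    return undefined
  }
  return DEDICATED_DOMAIN_TO_R2_BUCKET.get(url.hostname)
}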

Contributor Author

see #316

Comment on lines +34 to +63
if (r2Urls.length) {
  return r2Urls.map(url => {
    // Format https://account-id.r2.cloudflarestorage.com/bucket-name/key
    const domainSplit = url.split(CF_R2_DOMAIN)[1]
    const bucketName = domainSplit.split('/')[1]
    const key = domainSplit.split(`${bucketName}/`)[1]

    return {
      bucketName,
      key
    }
  })
}

// Attempt S3 URL to pick bucket to try in R2
const s3Urls = Array.from(locations)
  .filter(
    // S3 Domain
    l => l.includes(AWS_S3_DOMAIN) &&
      // Bucket name valid for R2 attempt
      validS3Buckets.filter(b => l.includes(b)).length
  )

// Transform S3 URLs if existent
if (s3Urls.length) {
  return s3Urls.map(url => {
    // Format 'https://bucket-name.s3.amazonaws.com/key'
    const domainParts = url.split(`.${AWS_S3_DOMAIN}`)
    const bucketName = domainParts[0].replace('https://', '')
    const key = domainParts[1].slice(1)
Contributor

same here, let's use URLs and be explicit... 3.amazonaws.com would be a valid s3 key name I think. Even if not, a user could put anything in a location claim URL
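For the virtual-hosted-style URLs (https://bucket-name.s3.amazonaws.com/key), a defensive version could look like this sketch (constant and function names are illustrative):

const AWS_S3_SUFFIX = '.s3.amazonaws.com'

// Parse a claimed S3 location defensively: require an https URL whose hostname
// ends with the S3 suffix and whose bucket (the hostname prefix) is allow-listed.
function parseS3Location (location, validS3Buckets) {
  let url
  try {
    url = new URL(location)
  } catch {
    return undefined
  }
  if (url.protocol !== 'https:' || !url.hostname.endsWith(AWS_S3_SUFFIX)) return undefined

  const bucketName = url.hostname.slice(0, -AWS_S3_SUFFIX.length)
  const key = url.pathname.slice(1)
  if (!validS3Buckets.includes(bucketName) || !key) return undefined

  return { bucketName, key }
}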

Contributor Author

vasco-santos commented Jan 15, 2024

I mean, it would work as a key, but not as a CID later on? So for this to work it would never be a valid key. Roundabout will always need to be something that uses our bucket structure and not something completely random, given the hard requirement on bucket keys
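A small sketch of what that check could look like, assuming a carpark-style key layout of carCid/carCid.car (the layout is an assumption here, and the function name is illustrative):

import { CID } from 'multiformats/cid'

// Reject keys that do not encode a CAR CID in the assumed 'cid/cid.car' layout.
function isValidCarKey (key) {
  const match = key.match(/^([^/]+)\/([^/]+)\.car$/)
  if (!match || match[1] !== match[2]) return false
  try {
    CID.parse(match[1]) // throws if the string is not a valid CID
    return true
  } catch {
    return false
  }
}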

Contributor

travis commented Aug 19, 2024

@hannahhoward @alanshaw closing for now to clean up PR environments but happy to re-open
