Count Downloads Using CDN Logs #372
Comments
jdno added a commit to jdno/rust-simpleinfra that referenced this issue on Dec 19, 2023
We are planning[^1] to count crate downloads using CDN logs. This requires new infrastructure, namely a SQS queue into which S3 can publish events and that crates.io can monitor. [^1]: rust-lang#372
jdno added a commit to jdno/rust-simpleinfra that referenced this issue on Jan 17, 2024
We are working on using the logs from our CDNs to count crate downloads on crates.io. Whenever a log archive is uploaded to the bucket, a notification is sent to an SQS queue. crates.io then downloads the log, parses it, and updates the download counts. For this to work, crates.io needs access to the S3 bucket with the logs. This change grants read-only access to individual log archives. See rust-lang#372 for details.
jdno added a commit to jdno/rust-simpleinfra that referenced this issue on Jan 18, 2024
The crates-io-prod account was recently created as part of the project to count crate downloads using CDN logs (see rust-lang#372). Similar to all our other AWS accounts, Datadog and Wiz have been installed in the account.
jdno added a commit to jdno/rust-simpleinfra that referenced this issue on Jan 29, 2024
We are working on using the logs from our CDNs to count crate downloads on crates.io. Whenever a log archive is uploaded to the bucket, a notification is sent to an SQS queue. crates.io then downloads the log, parses it, and updates the download counts. For this to work, crates.io needs access to the S3 bucket with the logs. This change grants read-only access to individual log archives. See rust-lang#372 for details.
jdno added a commit to jdno/rust-simpleinfra that referenced this issue on Jan 31, 2024
The infrastructure to count crate downloads using the CDN logs (see issue rust-lang#372) has been deployed to production.
The infrastructure has been created and is ready for testing. I'll leave the issue open until we've confirmed that crates.io can access and process the logs.
For cross-linking purposes: more discussion on this on the crates.io side is at https://rust-lang.zulipchat.com/#narrow/stream/318791-t-crates-io/topic/download.20counting.20via.20CDN.20logs
Problem
crates.io counts downloads by crate and version. This is currently done as part of the /download endpoint, which counts the download and then redirects the caller to the Content Delivery Networks (CDNs) for static.crates.io, from where the actual file is downloaded.
Due to the volume of requests to the /download endpoint, counting the crate and its version in the application has a significant performance cost. Especially when traffic spikes, the application can struggle to keep up with requests, which in the worst case can cause a service outage.
Goal
Key Objectives
Desired Outcome
In the ideal scenario, we avoid hitting the web app for download requests altogether and go straight to the CDNs. We can achieve this by changing the dl field in the index's config.json to point to the CDN instead of the application. Full compatibility with existing behavior requires rewriting some URLs, which has already been implemented.
The CDNs could attempt to count downloads themselves, but this is difficult because the CDNs are globally distributed. There is no single point that receives all the traffic, so download counts would need to be processed and merged somewhere else. That system would quickly face the same performance issues that crates.io currently faces.
We can use the request logs from the CDNs to count downloads asynchronously. The CDNs produce a single log line per request. These logs are collected and uploaded periodically to a dedicated S3 bucket as a compressed archive.
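As a rough illustration of counting from logs, the sketch below tallies downloads per crate and version from one compressed archive. It is not the crates.io implementation. It assumes the archive is gzip-compressed and that crate files are requested under a path of the form /crates/<name>/<name>-<version>.crate; the function name count_downloads and the sample log lines are made up for this example, and real processing would likely also filter on fields such as the response status.

```python
import gzip
import re
from collections import Counter

# Assumption: crate files are served under /crates/<name>/<name>-<version>.crate.
# The rest of the log line layout is deliberately ignored in this sketch.
CRATE_PATH = re.compile(
    r"/crates/(?P<name>[^/\s]+)/(?P=name)-(?P<version>[^/\s]+)\.crate"
)


def count_downloads(archive: bytes) -> Counter:
    """Count downloads per (crate, version) in one gzip-compressed log archive."""
    counts = Counter()
    text = gzip.decompress(archive).decode("utf-8", errors="replace")
    for line in text.splitlines():
        match = CRATE_PATH.search(line)
        if match:
            counts[(match["name"], match["version"])] += 1
    return counts


# Hypothetical log lines, one per request; not the real CDN log format.
sample = gzip.compress(
    b"GET /crates/serde/serde-1.0.195.crate 200\n"
    b"GET /crates/serde/serde-1.0.195.crate 200\n"
    b"GET /crates/tokio-util/tokio-util-0.7.10.crate 200\n"
    b"GET /index.html 200\n"
)
print(count_downloads(sample))
```

Aggregating per archive means the database would receive one update per crate and version in the archive instead of one write per request.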
Whenever a new archive is uploaded to the bucket, S3 can push an event into an SQS queue. crates.io can monitor the queue and pull incoming events. From each event, it can determine which files to fetch from S3, download and parse them, and update the download counts in the database.
Benefits
Notes
Tasks
Infra-Team
crates.io
(Tracked by the crates.io team)
dl field to point to the CDN
Resources