Stream request logs for static.crates.io from Fastly to Datadog #406
Datadog has a few reserved attributes[^1] that have a special meaning on its platform. The logs that the Fastly service generates now set two of them, namely the source and the service. Both will make it easier to process the logs in Datadog's log pipeline. [^1]: https://docs.datadoghq.com/logs/log_configuration/attributes_naming_convention/#reserved-attributes
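As a rough illustration of the two reserved attributes, the sketch below renders a minimal log envelope by hand. `ddsource` is the key Datadog's JSON log intake reads the source from; the value `fastly`, the `bytes` field, and the `envelope` helper are assumptions made for this example and are not taken from the PR.

```rust
// Assumed value for illustration; the PR does not show the source name here.
const DATADOG_SOURCE: &str = "fastly";
// Matches the service constant quoted later in this conversation.
const DATADOG_SERVICE: &str = "static.crates.io";

/// Renders a minimal JSON envelope carrying the two reserved attributes.
/// Hand-formatting is only safe because both strings are fixed constants
/// that contain no characters needing JSON escaping.
fn envelope(bytes: u64) -> String {
    format!(
        r#"{{"ddsource":"{}","service":"{}","bytes":{}}}"#,
        DATADOG_SOURCE, DATADOG_SERVICE, bytes
    )
}
```

Calling `envelope(1024)` yields one JSON object per log line; a real implementation would more likely serialize a struct than format the string manually.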
The logging implementation for the Compute@Edge service on Fastly has been refactored so that logs can be sent to both S3 for long-term storage and Datadog for real-time analysis.
The environment is now passed to the Terraform module as a variable. This also makes it possible to dynamically derive the SSM parameter for the customer ID, which was previously hard-coded.
Datadog has a concept called unified service tagging[^1] that connects data across different parts of the platform. We have added more tags to each log to make use of this feature. [^1]: https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging
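A minimal sketch of how such tags could be assembled, assuming they are emitted as Datadog's comma-separated `key:value` list. The `datadog_tags` helper and the environment name `staging` are hypothetical; where the service actually places the tags in each log is not shown in this conversation.

```rust
// Matches the app constant quoted later in this conversation.
const DATADOG_APP: &str = "crates.io";

/// Builds a comma-separated `key:value` tag list.
/// `env` is the deployment environment passed in from configuration,
/// e.g. "staging" (an assumed name).
fn datadog_tags(env: &str) -> String {
    format!("app:{},env:{}", DATADOG_APP, env)
}
```

For example, `datadog_tags("staging")` produces `app:crates.io,env:staging`. Only the environment varies per deployment, which is why it is the sole parameter.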
The prior version of the log was very barebones, which was fine for capturing the requested URLs, but insufficient to debug the service. The extended log format includes more information about the client, protocols, and Fastly service.
In an effort to simplify the configuration of both the Terraform and Rust modules, some hard-coded constants have been moved into the Compute@Edge module. This makes it possible to remove the glue code to pass them from the Terraform configuration to the final WASM function. While this does introduce some duplication, it removes a lot of complexity and potential configuration issues.
const DATADOG_APP: &str = "crates.io";
const DATADOG_SERVICE: &str = "static.crates.io";
Originally, these two static constants were defined in Terragrunt. But they don't change depending on the environment, and hardcoding them removes a potential panic if they were accidentally deleted.
.referer(
    request
        .get_header("Referer")
        .and_then(|s| s.to_str().ok())
.ok() is used here to prevent issues with the logs causing a request to fail.
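The pattern can be sketched with the standard library alone. Here raw header bytes stand in for the header value type used by the service, and the `referer` helper is hypothetical: an absent header and a header that is not valid UTF-8 both collapse to `None` instead of producing an error.

```rust
/// Stand-in for a header lookup: `raw` is the header's bytes if present.
/// `.ok()` discards the decoding error, so a malformed header results in
/// a missing log field rather than a failed request.
fn referer<'a>(raw: Option<&'a [u8]>) -> Option<&'a str> {
    raw.and_then(|bytes| std::str::from_utf8(bytes).ok())
}
```

The trade-off is that the log does not distinguish a missing header from an undecodable one, which is acceptable when logging must never affect request handling.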
@Turbo87 Does the log format still comply with the parsing that crates.io does?
.date_time(OffsetDateTime::now_utc())
.url(request.get_url_str().into())
.edge_location(var("FASTLY_POP").ok())
.host(request.get_url().host().map(|s| s.to_string()))
Is there a reason for including host if it's essentially derived from url?
I did a really quick experiment with the Remapper processor on Datadog, but that doesn't seem to like using a nested attribute (http.url_details.host) as the source. So I'd say we just leave it in for now for simplicity.
AFAICT there were no changes to the existing fields other than reordering them, so at first glance this looks fine.
I guess this is a direct stream, right? So this doesn't address #405?
The same log format is used for the stream and the files that we write to S3. So after this change, those will include the
Does it make sense to bump the version of the log format to
That would require code updates and us deploying those updates to crates.io before the log format can be updated. Since the changes are additive and non-breaking, I don't think we need a new version.
crates.io recently implemented a few changes that make cargo download crates directly from static.crates.io without hitting the API of crates.io first. This increases the performance and scalability of crates.io, but means that requests are no longer logged by the application. We want to restore the previous behavior by aggregating request logs from our CDNs in Datadog.

The Fastly service has been extended to include more information in its request logs and to stream them to Datadog. We're also tagging the logs with Datadog's Unified Service Tagging to provide visibility across the app (crates.io), service (static.crates.io), and different environments.