internal/telemetry: revamp #2996
Merged
73 commits
7d9896d
setup internal/newtelemetry
eliottness f398276
setup newtelemetry package
eliottness 34d2cf8
internal/newtelemetry: create all structs from schemas
eliottness f921f60
internal/newtelemetry/internal: created and implemented the Writer
eliottness 160e1fb
end to end support for integration, product, client configuration cha…
eliottness f43bbdb
refactor config and implement NewClient()
eliottness a0f0d6b
wip
eliottness b8de6af
support for payload transformers
eliottness a9fc7bd
support for more configuration options
eliottness 5159533
fix hostname resolution logic to match the previous telemetry client
eliottness 5e90cb8
test the writer
eliottness 9d9c3b8
support for disabling telemetry
eliottness 72359f5
renamed transformers to mappers and added a heartbeat mapper, removed…
eliottness 9eafa97
start testing the basic features of the client
eliottness 1baad87
add more tests
eliottness f7e7ff7
add end 2 end tests and body unmarshaling for testing purposes
eliottness 2b2e13b
add actionQueue for calls because NewClient() is done
eliottness 70adebc
fix app-extended-heartbeat payload way of being sent each 24 hours
eliottness f52047b
support for app-dependencies-loaded payload
eliottness 5f5b7ba
setup logging and the recorder for metrics. Change the api to remove …
eliottness 712971b
tested logs API and added benchmarks for it
eliottness 214c2a7
use internal.RecordWriter instead of the testWriter
eliottness f847631
refactor the writer and flush method to record more data that will th…
eliottness 3ebfb63
added replay feature to the metrics function of the global client
eliottness f9dd56c
add internal package for known metrics from the backend
eliottness 4cd8b71
rename metrics package to knownmetrics + added a new TypedSyncMap wra…
eliottness 7487c90
moved the Origin type and Namespace type to the transport package, re…
eliottness 651e8f6
support for distributions metrics
eliottness 44eb960
add tests and fix issues with distributions
eliottness 05b010c
reworked knownmetrics to include namespace and metric kind and added …
eliottness 2cfb429
remove new dependencies that were introduced inadvertently
eliottness 5380ba3
renamed several things added send failure tests and reworked the ring…
eliottness f654610
added mock telemetry client
eliottness 21d0f01
add GlobalClient() function to mirror old package variable GlobalClient
eliottness 29fc678
rename AddAppConfig to RegisterAppConfig and rename AddBulkConfig to …
eliottness c7e4928
support config sanitizing and rename some stuff to facilitate the tra…
eliottness 6ba3e84
switch tags from map[string]string to []string
eliottness 9cf1102
fix misc
eliottness cfe990a
rework telemetrytest to divide the mock client and the recorder client
eliottness 091c522
rework logger to work without sync.Map.Clear() from go 1.23
eliottness 8bc28bc
remove data race from telemetry metrics hot pointer
eliottness 681eae1
flush app-started on StartApp()
eliottness cd86428
add new benchmarks to benchmarking platform
eliottness 373e2fd
apply @RomainMuller suggestions
eliottness 09cab87
fix bug found with the system-tests
eliottness 352d7a5
increase retro compat with all sanitization config value code
eliottness e849e5e
starting to apply @RomainMuller suggestions
eliottness 4798807
rework metrics
eliottness ecf313b
Merge branch 'main' into eliott.bouhana/newtelemetry
eliottness 4fd7f2c
new metric implementation using atomic.Pointer and sync.Pool
eliottness a6a0bdb
removed mapper.wrapper type
eliottness c4080ec
simplify knownmetrics generator
eliottness 0759137
added a new TypedSyncPool for use in the RingQueue to aggregate usage…
eliottness a5b0d0c
apply last suggestions from @RomainMuller
eliottness a62953e
Merge branch 'main' into eliott.bouhana/newtelemetry
eliottness 90c9985
Merge branch 'main' into eliott.bouhana/newtelemetry
eliottness 3dd727b
fix issue where reducing the value of DD_TELEMETRY_HEARTBEAT_INTERVAL…
eliottness ba29163
add debug logs and don't make env and version mandatory parameters
eliottness e27abda
fix lint error that showed up from nothing
eliottness c4440c3
log when heartbeat is at custom interval
eliottness 300a2b9
stop using for range in ticker.C
eliottness 45c8ba4
fix heartbeat flakyness
eliottness 146966b
fix last issues mentioned by @RomainMuller
eliottness 04e09de
Merge branch 'main' into eliott.bouhana/newtelemetry
eliottness 5dc48d5
Create README.md
eliottness cafcff7
add more documentation
eliottness 8f35090
don't put back the buffer into the pool if we did not even use 1/8 of…
eliottness 623efa3
newtelemetry/internal: fix deadlock when (*RingQueue).Enqueue is call…
eliottness 78bdf27
newtelemetry/internal: fix doc comments
eliottness 97424bd
newtelemetry/internal: fix cpu leak in the ticker + stop logging issu…
eliottness 69c6107
newtelemetry/internal: reduce flakyness in case the cpu is slow enoug…
eliottness 13603c0
Merge branch 'main' into eliott.bouhana/newtelemetry
eliottness acb5b49
Merge branch 'main' into eliott.bouhana/newtelemetry
eliottness
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Instrumentation Telemetry Client Architecture | ||
|
||
This documentation details the current architecture of the Instrumentation Telemetry Client of dd-trace-go and what its capabilities are. | ||
For API documentation, please refer to the contents of the [api.go](https://github.com/DataDog/dd-trace-go/blob/main/internal/telemetry/api.go) file. | ||
|
||
Please make sure to read the [Specification Documentation](https://github.com/DataDog/instrumentation-telemetry-api-docs/tree/main) before reading this document. | ||
|
||
### Data Flow | ||
|
||
```mermaid | ||
flowchart TD | ||
linkStyle default interpolate basis | ||
globalclient@{ shape: circle } -->|client == nil| recorder | ||
globalclient -->|client != nil| client | ||
recorder@{ shape: cyl } --> client@{ shape: circle } | ||
|
||
subgraph datasources | ||
integrations@{ shape: cyl } | ||
configuration@{ shape: cyl } | ||
dependencies@{ shape: cyl } | ||
products@{ shape: cyl } | ||
logs@{ shape: cyl } | ||
metrics@{ shape: cyl } | ||
end | ||
|
||
client --> datasources | ||
|
||
subgraph mapper | ||
direction LR | ||
app-started --> | ||
default[message-batch<div>heartbeat<div>extended-heartbeat] --> app-closing | ||
end | ||
|
||
flush@{ shape:rounded } | ||
|
||
queue@{ shape: cyl } --> flush | ||
|
||
datasources -..->|at flush| mapper --> flush | ||
flush -->|if writer fails| queue | ||
|
||
flush --> writer | ||
|
||
writer --> agent@{ shape: das } | ||
writer --> backend@{ shape: stadium } | ||
agent --> backend | ||
``` | ||
|
||
### Low Level Components | ||
|
||
- **`RingQueue[T]`**: The ring queue is a generic data structure that supports growing buffers, a buffer pool, and overflow. It is used as a backend data structure for the payload queue, the recorder and distribution metrics. | ||
- **`Recorder[T]`**: The recorder is a `RingQueue[func(T)]` that stores functions until the actual value `T` has been created. Calling `Replay(T)` dequeues all functions from the recorder and applies them to the value `T`. By default, it can store at most 512 functions. | ||
- **`Range[T]`**: Simple data structure that stores a start and end value and a minimum and maximum interval, and has utility functions to help manage ranges. | ||
- **`SyncMap[K, V]`**: Typed version of `sync.Map` | ||
- **`SyncPool[T]`**: Typed version of `sync.Pool` | ||
|
||
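The record-then-replay behaviour described for `Recorder[T]` can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes a mutex-guarded slice in place of `RingQueue`, it assumes that calls beyond the capacity are dropped, and the names `recorder`, `newRecorder`, `Record` and `counter` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a simplified stand-in for Recorder[T]: it buffers calls as
// closures until the target value exists. It uses a mutex-guarded slice
// instead of the PR's RingQueue, and rejects new calls once full (assumption).
type recorder[T any] struct {
	mu    sync.Mutex
	calls []func(T)
	max   int
}

func newRecorder[T any](max int) *recorder[T] {
	return &recorder[T]{max: max}
}

// Record stores f for later and reports whether it was accepted.
func (r *recorder[T]) Record(f func(T)) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.calls) >= r.max {
		return false
	}
	r.calls = append(r.calls, f)
	return true
}

// Replay applies every recorded call to t in order, then empties the buffer.
func (r *recorder[T]) Replay(t T) {
	r.mu.Lock()
	calls := r.calls
	r.calls = nil
	r.mu.Unlock()
	for _, f := range calls {
		f(t)
	}
}

// counter plays the role of the value that does not exist yet.
type counter struct{ total int }

func main() {
	rec := newRecorder[*counter](512)
	rec.Record(func(c *counter) { c.total += 1 })
	rec.Record(func(c *counter) { c.total += 2 })

	c := &counter{} // the target value is only created now
	rec.Replay(c)
	fmt.Println(c.total) // prints 3
}
```

Releasing the lock before running the closures keeps a recorded call from deadlocking if it records again during replay.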
### High Level Components | ||
|
||
- **GlobalClient**: The global client is a singleton that is used to access the client instance. It is used to create a new client instance if it does not exist yet, and to access the client instance if it already exists. The global client recorder records calls to the client until the `StartApp` function is called. | ||
- **Client**: The actual `Client` interface implementation. Its main job is to steer data to its corresponding data source. Other than that, it manages the config of the client and gathers data from the data sources to call `Flush` with it. | ||
- **Data Sources**: Each data source implements the `dataSource` interface, whose method `Payload() transport.Payload` is supposed to flush all data from the data source and turn it into a payload ready to be serialized and sent to the backend. | ||
  - **Integrations**: The integrations data source is responsible for creating the [`app-integrations-change`](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/app_integrations_change.md) payload. A simple slice and a mutex are used as the backing store. | ||
  - **Configuration**: The configuration data source is responsible for creating the [`app-client-configuration-change`](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/app_client_configuration_change.md) payload. A map and a mutex are used as the backing store. | ||
  - **Dependencies**: The dependencies data source is responsible for gathering data for the [`app-dependencies-loaded`](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/app_dependencies_loaded.md) payload. No public API is available for this, as it is done in-house with the `ClientConfig.DependencyLoader` function output. | ||
  - **Product**: The product data source is responsible for gathering data for the [`app-product-change`](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/app_product_change.md) payload. A map and a mutex are used as the backing store. | ||
  - **Metrics**: The metrics data source is responsible for gathering data for the [`generate-metrics`](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/generate_metrics.md) payload. A `SyncMap[metrickey, metricHandle]` is used as the backing store. More on that in the metrics-specific section. | ||
  - **Distributions**: The distributions data source is responsible for gathering data for the [`distributions`](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/distributions.md) payload. A `SyncMap[distributionkey, distributionHandle]` is used as the backing store. More on that in the metrics-specific section. | ||
  - **Logs**: The logs data source is responsible for gathering data for the [`generate-logs`](https://github.com/DataDog/instrumentation-telemetry-api-docs/blob/main/GeneratedDocumentation/ApiDocs/v2/SchemaDocumentation/Schemas/logs.md) payload. A `SyncMap[logkey, logValue]` is used as the backing store. More on that in the logs-specific section. | ||
- **Mapper**: The mapper is responsible for creating the `app-started`, `app-closing`, `heartbeat`, `extended-heartbeat` and `message-batch` payloads from the data sources. These payloads need data from other payloads but not from the API user. The mapper returns another mapper that will be used in the next call to `Flush`. | ||
- **Writer**: The writer is responsible for sending the payload to the backend. It is a simple interface that has a `Write` method that receives a `transport.Payload` and returns statistics about the write operation. |
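The mapper hand-off described above (each mapper returns the mapper to use on the next `Flush`) can be sketched as a small state machine. This is an illustrative sketch only: the `mapper` interface, the `Transform` method name, the `payload` type and the `client.flush` helper are assumptions made for the example, and the `app-closing` stage is left out for brevity.

```go
package main

import "fmt"

// payload is a hypothetical stand-in for transport.Payload.
type payload string

// mapper transforms the payloads gathered at flush time and returns the
// mapper that should handle the next flush (interface shape is assumed).
type mapper interface {
	Transform(in []payload) (out []payload, next mapper)
}

// appStartedMapper runs once: it prepends app-started, then hands over
// to the default mapper.
type appStartedMapper struct{}

func (appStartedMapper) Transform(in []payload) ([]payload, mapper) {
	out := append([]payload{"app-started"}, in...)
	return out, defaultMapper{}
}

// defaultMapper appends a heartbeat and keeps itself as the next mapper.
type defaultMapper struct{}

func (defaultMapper) Transform(in []payload) ([]payload, mapper) {
	out := append(append([]payload{}, in...), "heartbeat")
	return out, defaultMapper{}
}

// client holds the mapper to use for the next flush.
type client struct{ m mapper }

func (c *client) flush(in []payload) []payload {
	out, next := c.m.Transform(in)
	c.m = next // the returned mapper is used on the next call
	return out
}

func main() {
	c := &client{m: appStartedMapper{}}
	fmt.Println(c.flush([]payload{"app-integrations-change"}))
	fmt.Println(c.flush(nil))
}
```

Returning the next mapper from each call keeps the lifecycle ordering (`app-started` first, then the default stage) out of the flush loop itself.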
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
// Unless explicitly stated otherwise all files in this repository are licensed | ||
// under the Apache License Version 2.0. | ||
// This product includes software developed at Datadog (https://www.datadoghq.com/). | ||
// Copyright 2024 Datadog, Inc. | ||
|
||
// Package newtelemetry provides a thread-safe, burden-less telemetry client following the instrumentation telemetry specification from Datadog. | ||
// Specification here: https://github.com/DataDog/instrumentation-telemetry-api-docs/tree/main | ||
// | ||
// The telemetry package has 6 main capabilities: | ||
// - Metrics: Support for [Count], [Rate], [Gauge], [Distribution] metrics. | ||
// - Logs: Support Debug, Warn, Error logs with tags and stack traces via the subpackage [log] or the [Log] function. | ||
// - Product: Start, Stop and Startup errors reporting to the backend | ||
// - App Config: Register and change the configuration of the application and declare its origin | ||
// - Integration: Loading and errors | ||
// - Dependencies: Sending all the dependencies of the application to the backend (for SCA purposes for example) | ||
// | ||
// Each of these capabilities is exposed through the [Client] interface but mainly through the package level functions | ||
// that mirror and call the global client that is started through the [StartApp] function. | ||
// | ||
// Before the [StartApp] function is called, all calls to the global client are recorded and then replayed | ||
// synchronously when the [StartApp] function is called. The telemetry client is allowed to record at most 512 calls. | ||
// | ||
// At the end of the app lifetime, if [tracer.Stop] is called, the client should be stopped with the [StopApp] function | ||
// so that all data is flushed to the backend appropriately. | ||
// | ||
// Note: No public API is available for the dependencies payloads, as this is done in-house with the `ClientConfig.DependencyLoader` function output. | ||
package newtelemetry | ||
|
||
import ( | ||
"io" | ||
|
||
"gopkg.in/DataDog/dd-trace-go.v1/internal/newtelemetry/internal/transport" | ||
) | ||
|
||
// Namespace describes a product to distinguish telemetry coming from | ||
// different products used by the same application | ||
type Namespace = transport.Namespace | ||
|
||
//goland:noinspection GoVarAndConstTypeMayBeOmitted Goland is having a hard time with the following const block, it keeps deleting the type | ||
const ( | ||
NamespaceGeneral Namespace = transport.NamespaceGeneral | ||
NamespaceTracers Namespace = transport.NamespaceTracers | ||
NamespaceProfilers Namespace = transport.NamespaceProfilers | ||
NamespaceAppSec Namespace = transport.NamespaceAppSec | ||
NamespaceIAST Namespace = transport.NamespaceIAST | ||
NamespaceCIVisibility Namespace = transport.NamespaceCIVisibility | ||
NamespaceMLOps Namespace = transport.NamespaceMLOps | ||
NamespaceRUM Namespace = transport.NamespaceRUM | ||
) | ||
|
||
// Origin describes the source of a configuration change | ||
type Origin = transport.Origin | ||
|
||
//goland:noinspection GoVarAndConstTypeMayBeOmitted Goland is having a hard time with the following const block, it keeps deleting the type | ||
const ( | ||
OriginDefault Origin = transport.OriginDefault | ||
OriginCode Origin = transport.OriginCode | ||
OriginDDConfig Origin = transport.OriginDDConfig | ||
OriginEnvVar Origin = transport.OriginEnvVar | ||
OriginRemoteConfig Origin = transport.OriginRemoteConfig | ||
) | ||
|
||
// LogLevel describes the level of a log message | ||
type LogLevel = transport.LogLevel | ||
|
||
//goland:noinspection GoVarAndConstTypeMayBeOmitted Goland is having a hard time with the following const block, it keeps deleting the type | ||
const ( | ||
LogDebug LogLevel = transport.LogLevelDebug | ||
LogWarn LogLevel = transport.LogLevelWarn | ||
LogError LogLevel = transport.LogLevelError | ||
) | ||
|
||
// MetricHandle can be used to submit different values for the same metric. | ||
// MetricHandle is used to reduce lock contention when submitting metrics. | ||
// This can also be used ephemerally to submit a single metric value like this: | ||
// | ||
// telemetry.Count(telemetry.NamespaceAppSec, "my-count", []string{"tag1:true", "tag2:1.0"}).Submit(1.0) | ||
type MetricHandle interface { | ||
// Submit submits a value to the metric handle. | ||
Submit(value float64) | ||
// Get returns the last value submitted to the metric handle. | ||
Get() float64 | ||
} | ||
|
||
// Integration is an integration that is configured to be traced. | ||
type Integration struct { | ||
// Name is an arbitrary string that must stay constant for the integration. | ||
Name string | ||
// Version is the version of the integration/dependency that is being loaded. | ||
Version string | ||
// Error is the error that occurred while loading the integration. If this field is specified, the integration is | ||
// considered to have been forcefully disabled because of the error. | ||
Error string | ||
} | ||
|
||
// Configuration is a key-value pair that is used to configure the application. | ||
type Configuration struct { | ||
// Name is the key of the configuration. | ||
Name string | ||
// Value is the value of the configuration. It needs to be JSON serializable. | ||
Value any | ||
// Origin is the source of the configuration change. | ||
Origin Origin | ||
} | ||
|
||
// LogOption is a function that modifies the log message that is sent to the telemetry. | ||
type LogOption func(key *loggerKey, value *loggerValue) | ||
|
||
// Client constitutes all the functions available concurrently for the telemetry users. All methods are thread-safe. | ||
// This is an interface for easier testing but all functions will be mirrored at the package level to call | ||
// the global client. | ||
type Client interface { | ||
io.Closer | ||
|
||
// Count obtains the metric handle for the given parameters, or creates a new one if none was created just yet. | ||
// Tags cannot contain commas. | ||
Count(namespace Namespace, name string, tags []string) MetricHandle | ||
|
||
// Rate obtains the metric handle for the given parameters, or creates a new one if none was created just yet. | ||
// Tags cannot contain commas. | ||
Rate(namespace Namespace, name string, tags []string) MetricHandle | ||
|
||
// Gauge obtains the metric handle for the given parameters, or creates a new one if none was created just yet. | ||
// Tags cannot contain commas. | ||
Gauge(namespace Namespace, name string, tags []string) MetricHandle | ||
|
||
// Distribution obtains the metric handle for the given parameters, or creates a new one if none was created just yet. | ||
// Tags cannot contain commas. | ||
Distribution(namespace Namespace, name string, tags []string) MetricHandle | ||
|
||
// Log sends a telemetry log at the desired level with the given text and options. | ||
// Options include sending key-value pairs as tags, and a stack trace frozen from inside the Log function. | ||
Log(level LogLevel, text string, options ...LogOption) | ||
|
||
|
||
// ProductStarted declares a product to have started at the customer’s request | ||
ProductStarted(product Namespace) | ||
|
||
// ProductStopped declares a product to have been stopped by the customer | ||
ProductStopped(product Namespace) | ||
|
||
// ProductStartError declares that a product could not start because of the following error | ||
ProductStartError(product Namespace, err error) | ||
|
||
|
||
// RegisterAppConfig adds a key value pair to the app configuration and sends the change to telemetry. | ||
// The value has to be JSON serializable and the origin is the source of the change. | ||
RegisterAppConfig(key string, value any, origin Origin) | ||
|
||
// RegisterAppConfigs adds a list of key value pairs to the app configuration and sends the change to telemetry. | ||
// Same as RegisterAppConfig but for multiple values. | ||
|
||
RegisterAppConfigs(kvs ...Configuration) | ||
|
||
// MarkIntegrationAsLoaded marks an integration as loaded in the telemetry | ||
MarkIntegrationAsLoaded(integration Integration) | ||
|
||
// Flush closes the client and flushes any remaining data. | ||
Flush() | ||
|
||
|
||
// AppStart sends the telemetry necessary to signal that the app is starting. | ||
// Preferred use via [StartApp] package level function | ||
AppStart() | ||
|
||
// AppStop sends the telemetry necessary to signal that the app is stopping. | ||
// Preferred use via [StopApp] package level function | ||
AppStop() | ||
} |
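The obtain-or-create contract of the metric methods and the `MetricHandle` methods (`Submit`, `Get`) can be illustrated with a small sketch. It is not the implementation from this PR: the `gauge` and `registry` types and the string key layout are hypothetical, and only a gauge (last value wins) is modelled. The key joins tags with commas, which is unambiguous only because tags cannot contain commas, as the interface comments state.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"sync"
	"sync/atomic"
)

// gauge is a hypothetical handle that keeps only the last submitted value.
// The float64 is stored as its bit pattern so it can be updated atomically.
type gauge struct{ bits atomic.Uint64 }

// Submit records v as the current value.
func (g *gauge) Submit(v float64) { g.bits.Store(math.Float64bits(v)) }

// Get returns the last value submitted.
func (g *gauge) Get() float64 { return math.Float64frombits(g.bits.Load()) }

// registry hands out one handle per (namespace, name, tags) combination.
type registry struct{ handles sync.Map }

// Gauge returns the existing handle for the key, or creates a new one.
// The key format is an assumption made for this sketch.
func (r *registry) Gauge(namespace, name string, tags []string) *gauge {
	key := namespace + "|" + name + "|" + strings.Join(tags, ",")
	h, _ := r.handles.LoadOrStore(key, &gauge{})
	return h.(*gauge)
}

func main() {
	r := &registry{}
	h := r.Gauge("appsec", "my-gauge", []string{"tag1:true"})
	h.Submit(1.5)
	h.Submit(2.5)

	// Asking again with the same parameters yields the same handle.
	same := r.Gauge("appsec", "my-gauge", []string{"tag1:true"})
	fmt.Println(same == h, same.Get()) // prints: true 2.5
}
```

Keeping the handle around and calling `Submit` on it directly skips the map lookup on hot paths, which is the usage the `MetricHandle` comment recommends to reduce lock contention.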
I never considered just putting a getter in globalconfig. From APM perspective, typically we process the env var once, in a method like newUnstartedTracer, and set the value on the globalconfig. But your idea is a good one, provided we are not constantly spamming these InstrumentationInstall functions (such that we are not making tons of calls to os.Getenv instead of a single call)
This was just an attempt to create something usable by everyone since I don't believe I have easy access to those values