diff --git a/specs/agents/README.md b/specs/agents/README.md
index ba628a76..23a397af 100644
--- a/specs/agents/README.md
+++ b/specs/agents/README.md
@@ -40,6 +40,11 @@ You can find details about each of these in the [APM Data Model](https://www.ela
 - [Transactions](tracing-transactions.md)
 - [Spans](tracing-spans.md)
 - [Span destination](tracing-spans-destination.md)
+  - [Handling huge traces](handling-huge-traces/README.md)
+    - [Hard limit on number of spans to collect](handling-huge-traces/tracing-spans-limit.md)
+    - [Collecting statistics about dropped spans](handling-huge-traces/tracing-spans-dropped-stats.md)
+    - [Dropping fast exit spans](handling-huge-traces/tracing-spans-drop-fast-exit.md)
+    - [Compressing spans](handling-huge-traces/tracing-spans-compress.md)
 - [Sampling](tracing-sampling.md)
 - [Distributed tracing](tracing-distributed-tracing.md)
 - [Tracer API](tracing-api.md)
diff --git a/specs/agents/handling-huge-traces/README.md b/specs/agents/handling-huge-traces/README.md
new file mode 100644
index 00000000..297e18bc
--- /dev/null
+++ b/specs/agents/handling-huge-traces/README.md
@@ -0,0 +1,41 @@
+# Handling huge traces
+
+Instrumenting applications that make lots of requests (such as 10k+) to backends like caches or databases can lead to several issues:
+- A significant performance impact in the target application.
+  For example, due to a high allocation rate, network traffic, garbage collection, additional CPU cycles for serializing, compressing, and sending spans, etc.
+- Dropping of events in agents or APM Server due to exhausted queues.
+- High load on the APM Server.
+- High storage costs.
+- Decreased performance of the Elastic APM UI due to slow searches and rendering of huge traces.
+- Loss of clarity and overview (and thus a decreased user experience) in the UI when analyzing the traces.
+
+Agents can implement several strategies to mitigate these issues.
+These strategies are designed to capture significant information about relevant spans while at the same time limiting the trace to a manageable size.
+Applying any of these strategies inevitably leads to a loss of information.
+However, they aim to provide a better tradeoff between cost and insight by not capturing or summarizing less relevant data.
+
+- [Hard limit on number of spans to collect](tracing-spans-limit.md) \
+  Even after applying the most advanced strategies, there must always be a hard limit on the number of spans we collect.
+  This is the last line of defense that comes with the highest amount of data loss.
+- [Collecting statistics about dropped spans](tracing-spans-dropped-stats.md) \
+  Ensures that even if we drop spans, we at least have statistics about them.
+- [Dropping fast exit spans](tracing-spans-drop-fast-exit.md) \
+  If a span was blazingly fast, it's probably not worth the cost to send and store it.
+- [Compressing spans](tracing-spans-compress.md) \
+  If there are a bunch of very similar spans, we can represent them in a single document - a composite span.
+
+In a nutshell, this is how the different settings work in combination:
+
+```java
+if (span.transaction.spanCount > transaction_max_spans) {
+    // drop span
+    // collect statistics for dropped spans
+} else if (compression possible) {
+    // apply compression
+} else if (span.duration < exit_span_min_duration) {
+    // drop span
+    // collect statistics for dropped spans
+} else {
+    // report span
+}
+```
diff --git a/specs/agents/handling-huge-traces/tracing-spans-compress.md b/specs/agents/handling-huge-traces/tracing-spans-compress.md
new file mode 100644
index 00000000..782edaab
--- /dev/null
+++ b/specs/agents/handling-huge-traces/tracing-spans-compress.md
@@ -0,0 +1,273 @@
+# Compressing spans
+
+To mitigate the potential flood of spans to a backend,
+agents SHOULD implement the strategies laid out in this section to avoid sending almost identical and very similar spans.
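To make the idea concrete before diving into the strategies, here is a hypothetical, non-normative Python sketch (the dictionary keys mirror the composite data model described later in this document) of how ten consecutive, identical SELECT spans could be collapsed into a single composite span:

```python
def compress_exact_match(spans):
    """Collapse consecutive identical spans into one composite span (illustrative only)."""
    first, last = spans[0], spans[-1]
    assert all(s["name"] == first["name"] for s in spans), "exact match requires equal names"
    return {
        "name": first["name"],
        # start timestamp of the first compressed span
        "timestamp": first["timestamp"],
        # gross duration: end of the last compressed span minus start of the first
        "duration_us": last["timestamp"] + last["duration_us"] - first["timestamp"],
        "composite": {
            "count": len(spans),
            # net duration: the summed durations of all compressed spans
            "sum_us": sum(s["duration_us"] for s in spans),
            "compression_strategy": "exact_match",
        },
    }

# ten identical SELECT spans, one starting every 1000µs, each taking 400µs
selects = [{"name": "SELECT FROM users", "timestamp": i * 1000, "duration_us": 400}
           for i in range(10)]
composite = compress_exact_match(selects)
```

Note how the gross `duration_us` (9400µs here) exceeds the net `sum_us` (4000µs) because it includes the gaps between the spans.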
+
+While compressing multiple similar spans into a single composite span can't fully eliminate the collection overhead,
+it can significantly reduce the impact on the following areas,
+with very little loss of information:
+- Agent reporter queue utilization
+- Capturing stack traces, serialization, compression, and sending events to APM Server
+- Potential to re-use span objects, significantly reducing allocations
+- Downstream effects like reducing impact on APM Server, ES storage, and UI performance
+
+### Configuration option `span_compression_enabled`
+
+Setting this option to true will enable the span compression feature.
+Span compression reduces the collection, processing, and storage overhead, and removes clutter from the UI.
+The tradeoff is that some information, such as the DB statements of all the compressed spans, will not be collected.
+
+|                |           |
+|----------------|-----------|
+| Type           | `boolean` |
+| Default        | `false`   |
+| Dynamic        | `true`    |
+
+
+## Consecutive-Exact-Match compression strategy
+
+One of the biggest sources of excessive data collection is n+1-type queries and repetitive requests to a cache server.
+This strategy detects consecutive spans that hold the same information (except for the duration)
+and creates a single [composite span](#composite-span).
+
+```
+[                              ]
+GET /users
+  [] [] [] [] [] [] [] [] [] []
+  10x SELECT FROM users
+```
+
+Two spans are considered to be an exact match if they are of the [same kind](#consecutive-same-kind-compression-strategy) and if their span names are equal, i.e. if the following properties match:
+- `type`
+- `subtype`
+- `destination.service.resource`
+- `name`
+
+### Configuration option `span_compression_exact_match_max_duration`
+
+Consecutive spans that are an exact match and that are under this threshold will be compressed into a single composite span.
+This option does not apply to [composite spans](#composite-span).
+This reduces the collection, processing, and storage overhead, and removes clutter from the UI.
+The tradeoff is that the DB statements of all the compressed spans will not be collected.
+
+|                |            |
+|----------------|------------|
+| Type           | `duration` |
+| Default        | `5ms`      |
+| Dynamic        | `true`     |
+
+## Consecutive-Same-Kind compression strategy
+
+Another pattern that often occurs is a high amount of alternating queries to the same backend.
+Especially if the individual spans are quite fast, recording every single query is likely not worth the overhead.
+
+```
+[                              ]
+GET /users
+  [] [] [] [] [] [] [] [] [] []
+  10x Calls to mysql
+```
+
+Two spans are considered to be of the same kind if the following properties are equal:
+- `type`
+- `subtype`
+- `destination.service.resource`
+
+```java
+boolean isSameKind(Span other) {
+    return type == other.type
+        && subtype == other.subtype
+        && destination.service.resource == other.destination.service.resource
+}
+```
+
+When applying this compression strategy, the `span.name` is set to `Calls to $span.destination.service.resource`.
+The rest of the context, such as the `db.statement`, will be determined by the first compressed span, which is turned into a composite span.
+
+### Configuration option `span_compression_same_kind_max_duration`
+
+Consecutive spans to the same destination that are under this threshold will be compressed into a single composite span.
+This option does not apply to [composite spans](#composite-span).
+This reduces the collection, processing, and storage overhead, and removes clutter from the UI.
+The tradeoff is that the DB statements of all the compressed spans will not be collected.
+
+|                |            |
+|----------------|------------|
+| Type           | `duration` |
+| Default        | `5ms`      |
+| Dynamic        | `true`     |
+
+## Composite span
+
+Compressed spans don't have a physical span document.
+Instead, multiple compressed spans are represented by a composite span.
+
+### Data model
+
+For composite spans, the `timestamp` and `duration` fields have slightly different semantics, and additional properties are defined under the `composite` context.
+
+- `timestamp`: The start timestamp of the first span.
+- `duration`: The gross duration (i.e., the end timestamp of the last compressed span minus the start timestamp of the first one).
+- `composite`
+  - `count`: The number of compressed spans this composite span represents.
+    The minimum count is 2 as a composite span represents at least two spans.
+  - `sum.us`: The sum of the durations of all compressed spans this composite span represents, in microseconds.
+    Thus, `sum.us` is the net duration of all the compressed spans, while `duration` is the gross duration (including "whitespace" between the spans).
+  - `compression_strategy`: A string value indicating which compression strategy was used. The valid values are:
+    - `exact_match` - [Consecutive-Exact-Match compression strategy](tracing-spans-compress.md#consecutive-exact-match-compression-strategy)
+    - `same_kind` - [Consecutive-Same-Kind compression strategy](tracing-spans-compress.md#consecutive-same-kind-compression-strategy)
+
+### Effects on metric processing
+
+As laid out in the [span destination spec](../tracing-spans-destination.md#contextdestinationserviceresource),
+APM Server tracks span destination metrics.
+To prevent compressed spans from skewing latency metrics and causing throughput metrics to be under-counted,
+APM Server will take `composite.count` into account when tracking span destination metrics.
+
+## Compression algorithm
+
+### Eligibility for compression
+
+A span is eligible for compression if all of the following conditions are met:
+1. It's an [exit span](../tracing-spans.md#exit-spans)
+2. The trace context of this span has not been propagated to a downstream service
+3. If the span has an `outcome` (i.e., `outcome` is present and it's not `null`), then it is `success`.
+   This means that spans whose outcome indicates an issue of potential interest are not compressed.
+
+The second condition is important so that we don't remove (compress) a span that may be the parent of a downstream service.
+This would orphan the sub-graph started by the downstream service and cause it to not appear in the waterfall view.
+
+```java
+boolean isCompressionEligible() {
+    return exit && !context.hasPropagated && (outcome == null || outcome == "success")
+}
+```
+
+### Span buffering
+
+Non-compression-eligible spans may be reported immediately after they have ended.
+When a compression-eligible span ends, it does not immediately get reported.
+Instead, the span is buffered within its parent.
+A span/transaction can buffer at most one child span.
+
+Span buffering allows the agent to "look back" one span when determining whether a given span should be compressed.
+
+A buffered span gets reported when
+1. its parent ends, or
+2. a non-compression-eligible sibling ends.
+
+```java
+void onEnd() {
+    if (buffered != null) {
+        report(buffered)
+    }
+}
+
+void onChildEnd(Span child) {
+    if (!child.isCompressionEligible()) {
+        if (buffered != null) {
+            report(buffered)
+            buffered = null
+        }
+        report(child)
+        return
+    }
+
+    if (buffered == null) {
+        buffered = child
+        return
+    }
+
+    if (!buffered.tryToCompress(child)) {
+        report(buffered)
+        buffered = child
+    }
+}
+```
+
+### Turning compressed spans into a composite span
+
+Spans have a `tryToCompress` method that is called on a span buffered by its parent.
+On the first call, the buffered span checks whether it can be compressed with the given sibling, and it selects the best compression strategy.
+Note that the compression strategy is selected only once, based on the first two spans of the sequence.
+The compression strategy cannot be changed by the rest of the spans in the sequence.
+So when the current sibling span cannot be added to the ongoing sequence under the selected compression strategy,
+the ongoing sequence is terminated: it is sent out as a composite span, and the current sibling span is buffered.
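To illustrate the one-time strategy selection just described, here is a hedged, non-normative Python sketch (span fields are modeled as plain dictionaries, and the two threshold constants stand in for `span_compression_exact_match_max_duration` and `span_compression_same_kind_max_duration`, which both default to 5ms):

```python
EXACT_MATCH_MAX_MS = 5   # stands in for span_compression_exact_match_max_duration
SAME_KIND_MAX_MS = 5     # stands in for span_compression_same_kind_max_duration

def is_same_kind(a, b):
    """Same kind: type, subtype, and destination.service.resource are equal."""
    return (a["type"] == b["type"]
            and a["subtype"] == b["subtype"]
            and a["resource"] == b["resource"])

def select_strategy(a, b):
    """Select the compression strategy based on the first two spans of a sequence.

    Returns "exact_match", "same_kind", or None if the two spans cannot start
    a compressed sequence.
    """
    if not is_same_kind(a, b):
        return None
    if a["name"] == b["name"]:
        # Exact match: both durations must be under the exact-match threshold.
        # Deliberately no fallback to "same_kind" when the threshold is exceeded.
        if a["duration_ms"] <= EXACT_MATCH_MAX_MS and b["duration_ms"] <= EXACT_MATCH_MAX_MS:
            return "exact_match"
        return None
    if a["duration_ms"] <= SAME_KIND_MAX_MS and b["duration_ms"] <= SAME_KIND_MAX_MS:
        return "same_kind"
    return None
```

The deliberate absence of a same-kind fallback for over-threshold exact matches corresponds to the rule detailed below.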
+
+If the spans are of the same kind and have the same name, and both spans' `duration` is <= `span_compression_exact_match_max_duration`,
+we apply the [Consecutive-Exact-Match compression strategy](tracing-spans-compress.md#consecutive-exact-match-compression-strategy).
+Note that if the spans are an _exact match_
+but the duration threshold requirement is not satisfied, we just stop the compression sequence.
+In particular, this means that the implementation should not proceed to try the _same kind_ strategy.
+Otherwise, the user would have to lower both `span_compression_exact_match_max_duration` and `span_compression_same_kind_max_duration`
+to prevent longer _exact match_ spans from being compressed.
+
+If the spans are of the same kind but have different span names, and both spans' `duration` is <= `span_compression_same_kind_max_duration`,
+we compress them using the [Consecutive-Same-Kind compression strategy](tracing-spans-compress.md#consecutive-same-kind-compression-strategy).
+
+```java
+bool tryToCompress(Span sibling) {
+    isAlreadyComposite = composite != null
+    canBeCompressed = isAlreadyComposite ?
+        tryToCompressComposite(sibling) : tryToCompressRegular(sibling)
+    if (!canBeCompressed) {
+        return false
+    }
+
+    if (!isAlreadyComposite) {
+        composite.count = 1
+        composite.sumUs = duration
+    }
+
+    ++composite.count
+    composite.sumUs += sibling.duration
+    return true
+}
+
+bool tryToCompressRegular(Span sibling) {
+    if (!isSameKind(sibling)) {
+        return false
+    }
+
+    if (name == sibling.name) {
+        if (duration <= span_compression_exact_match_max_duration && sibling.duration <= span_compression_exact_match_max_duration) {
+            composite.compressionStrategy = "exact_match"
+            return true
+        }
+        return false
+    }
+
+    if (duration <= span_compression_same_kind_max_duration && sibling.duration <= span_compression_same_kind_max_duration) {
+        composite.compressionStrategy = "same_kind"
+        name = "Calls to " + destination.service.resource
+        return true
+    }
+
+    return false
+}
+
+bool tryToCompressComposite(Span sibling) {
+    switch (composite.compressionStrategy) {
+        case "exact_match":
+            return isSameKind(sibling) && name == sibling.name && sibling.duration <= span_compression_exact_match_max_duration
+        case "same_kind":
+            return isSameKind(sibling) && sibling.duration <= span_compression_same_kind_max_duration
+    }
+}
+```
+
+### Concurrency
+
+The pseudo-code in this spec is intentionally not written in a thread-safe manner to make it more concise.
+Also, thread safety is highly platform/runtime dependent, and some platforms don't support parallelism or concurrency.
+
+However, if there can be a situation where multiple spans may end concurrently, agents MUST guard against race conditions.
+To do that, agents should prefer [lock-free algorithms](https://en.wikipedia.org/wiki/Non-blocking_algorithm)
+paired with retry loops over blocking algorithms that use mutexes or locks.
+
+In particular, operations that work with the buffer require special attention:
+- Setting a span into the buffer must be handled atomically.
+- Retrieving a span from the buffer must be handled atomically.
+  Retrieving includes atomically getting and clearing the buffer.
+  This makes sure that only one thread at a time can compare span properties and call mutating methods, such as `compress`.
diff --git a/specs/agents/handling-huge-traces/tracing-spans-drop-fast-exit.md b/specs/agents/handling-huge-traces/tracing-spans-drop-fast-exit.md
new file mode 100644
index 00000000..c75d7b6f
--- /dev/null
+++ b/specs/agents/handling-huge-traces/tracing-spans-drop-fast-exit.md
@@ -0,0 +1,80 @@
+# Dropping fast exit spans
+
+If an exit span was really fast, chances are that it's not relevant for analyzing latency issues.
+Therefore, agents SHOULD implement the strategy laid out in this section to let users choose the level of detail/cost tradeoff that makes sense for them.
+If an agent implements this strategy, it MUST also implement [Collecting statistics about dropped spans](tracing-spans-dropped-stats.md).
+
+## `exit_span_min_duration` configuration
+
+Sets the minimum duration of exit spans.
+The agent attempts to discard exit spans that execute faster than this threshold.
+
+In some cases, exit spans cannot be discarded.
+For example, spans that propagate the trace context to downstream services,
+such as outgoing HTTP requests,
+can't be discarded.
+However, external calls that don't propagate context,
+such as calls to a database, can be discarded using this threshold.
+
+Additionally, spans that lead to an error can't be discarded.
+
+|                |            |
+|----------------|------------|
+| Type           | `duration` |
+| Default        | `1ms`      |
+| Central config | `true`     |
+
+TODO: should we introduce µs granularity for this config option?
+Adding `us` to all `duration`-typed options would create compatibility issues.
+So we probably want to support `us` for this option only.
+
+## Interplay with span compression
+
+If an agent implements [span compression](tracing-spans-compress.md),
+the limit applies to the [composite span](tracing-spans-compress.md#composite-span).
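As a rough, non-normative Python sketch of this rule (assuming the span dictionary carries a `discardable` flag and, for composites, the gross duration in `duration_us`):

```python
EXIT_SPAN_MIN_DURATION_US = 1000  # stands in for exit_span_min_duration, default 1ms

def should_discard(span):
    """Decide whether an ended (possibly composite) span can be discarded.

    For a composite span, `duration_us` is the gross duration of the whole
    compressed sequence, so the threshold applies to the sequence as a whole.
    """
    return span.get("discardable", False) and span["duration_us"] < EXIT_SPAN_MIN_DURATION_US
```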
+
+For example, if 10 Redis calls are compressed into a single composite span whose total duration is lower than `exit_span_min_duration`,
+it will be dropped.
+If, on the other hand, the individual Redis calls are below the threshold,
+but the sum of their durations is above it, the composite span will not be dropped.
+
+## Limitations
+
+The limitations are based on the premise that the `parent_id` of each span and transaction that's stored in Elasticsearch
+should point to another valid transaction or span that's present in the Elasticsearch index.
+
+A span that refers to a missing span via its `parent_id` is also known as an "orphaned span".
+
+### Spans that propagate context to downstream services can't be discarded
+
+We only know whether to discard a span after the call has ended.
+At that point,
+the trace has already continued on the downstream service.
+Discarding the span for the external request would orphan the transaction of the downstream call.
+
+Propagating the trace context to downstream services is also known as out-of-process context propagation.
+
+## Implementation
+
+### `discardable` flag
+
+Spans store an additional `discardable` flag in order to determine whether a span can be discarded.
+The default value is `true` for [exit spans](../tracing-spans.md#exit-spans) and `false` for any other span.
+
+According to the [limitations](#limitations),
+there are certain situations where the `discardable` flag of a span is set to `false`:
+- when the span's `outcome` field is set to anything other than `success`,
+  so that spans whose outcome indicates an issue of potential interest are not discardable
+- on out-of-process context propagation
+
+### Determining whether to report a span
+
+If the span's duration is less than `exit_span_min_duration` and the span is discardable (`discardable=true`),
+We're deliberately using the same dropped counter we also use when dropping spans due to [`transaction_max_spans`](tracing-spans-limit.md#configuration-option-transaction_max_spans). +This ensures that a dropped fast span doesn't consume from the max spans limit. + +### Metric collection + +To reduce the data loss, agents [collect statistics about dropped spans](tracing-spans-dropped-stats.md). +Dropped spans contribute to [breakdown metrics](https://docs.google.com/document/d/1-_LuC9zhmva0VvLgtI0KcHuLzNztPHbcM0ZdlcPUl64#heading=h.ondan294nbpt) the same way as non-discarded spans. diff --git a/specs/agents/handling-huge-traces/tracing-spans-dropped-stats.md b/specs/agents/handling-huge-traces/tracing-spans-dropped-stats.md new file mode 100644 index 00000000..1fa2cf80 --- /dev/null +++ b/specs/agents/handling-huge-traces/tracing-spans-dropped-stats.md @@ -0,0 +1,55 @@ +# Collecting statistics about dropped spans + +To still retain some information about dropped spans (for example due to [`transaction_max_spans`](tracing-spans-limit.md) or [`exit_span_min_duration`](tracing-spans-drop-fast-exit.md)), +agents SHOULD collect statistics on the corresponding transaction about dropped spans. +These statistics MUST only be sent for sampled transactions. + +## Use cases + +This allows APM Server to consider these metrics for the service destination metrics. +In practice, +this means that the service map, the dependencies table, +and the backend details view can show accurate throughput statistics for backends like Redis, +even if most of the spans are dropped. + +This also allows the transaction details view (aka. waterfall) to show a summary of the dropped spans. + +## Data model + +This is an example of the statistics that are added to the `transaction` events sent via the intake v2 protocol. 
+ +```json +{ + "dropped_spans_stats": [ + { + "type": "external", + "subtype": "http", + "destination_service_resource": "example.com:443", + "outcome": "failure", + "count": 28, + "duration.sum.us": 123456 + }, + { + "type": "db", + "subtype": "mysql", + "destination_service_resource": "mysql", + "outcome": "success", + "count": 81, + "duration.sum.us": 9876543 + } + ] +} +``` + +## Limits + +To avoid the structures from growing without bounds (which is only expected in pathological cases), +agents MUST limit the size of the `dropped_spans_stats` to 128 entries per transaction. +Any entries that would exceed the limit are silently dropped. + +## Effects on destination service metrics + +As laid out in the [span destination spec](tracing-spans-destination.md#contextdestinationserviceresource), +APM Server tracks span destination metrics. +To avoid dropped spans to skew latency metrics and cause throughput metrics to be under-counted, +APM Server will take `dropped_spans_stats` into account when tracking span destination metrics. diff --git a/specs/agents/handling-huge-traces/tracing-spans-limit.md b/specs/agents/handling-huge-traces/tracing-spans-limit.md new file mode 100644 index 00000000..42bccffb --- /dev/null +++ b/specs/agents/handling-huge-traces/tracing-spans-limit.md @@ -0,0 +1,89 @@ +# Hard limit on number of spans to collect + +This is the last line of defense that comes with the highest amount of data loss. +This strategy MUST be implemented by all agents. +Ideally, the other mechanisms limit the amount of spans enough so that the hard limit does not kick in. + +Agents SHOULD also [collect statistics about dropped spans](tracing-spans-dropped-stats.md) when implementing this spec. + +## Configuration option `transaction_max_spans` + +Limits the amount of spans that are recorded per transaction. + +This is helpful in cases where a transaction creates a very high amount of spans (e.g. thousands of SQL queries). 
+
+Setting an upper limit will prevent overloading the agent and the APM Server with too much work for such edge cases.
+
+|                |           |
+|----------------|-----------|
+| Type           | `integer` |
+| Default        | `500`     |
+| Dynamic        | `true`    |
+
+## Implementation
+
+### Span count
+
+When a span is put in the agent's reporter queue, a counter should be incremented on its transaction, in order to later identify the _expected_ number of spans.
+In this way, we can identify data loss, e.g. because events have been dropped.
+
+This counter SHOULD internally be named `reported` and MUST be mapped to `span_count.started` in the intake API.
+The word `started` is a misnomer but needs to be used for backward compatibility.
+The rest of the spec will refer to this field as `span_count.reported`.
+
+When a span is dropped, it is not reported to the APM Server;
+instead, another counter is incremented to track the number of spans dropped.
+In this case, the above-mentioned counter for `reported` spans is not incremented.
+
+```json
+"span_count": {
+  "started": 500,
+  "dropped": 42
+}
+```
+
+The total number of spans that an agent created within a transaction is equal to `span_count.started + span_count.dropped`.
+
+### Checking the limit
+
+Before creating a span,
+agents must determine whether that span would exceed the span limit.
+The limit is reached when the number of spans eligible for reporting is greater than or equal to the maximum number of spans.
+In other words, the limit is reached if this condition is true:
+
+    atomic_get(transaction.span_count.eligible_for_reporting) >= transaction_max_spans
+
+On span end, agents that support the concurrent creation of spans need to check the condition again.
+That is because any number of spans may be started before any of them end.
+
+```java
+if (atomic_get(transaction.span_count.eligible_for_reporting) < transaction_max_spans // optional optimization
+    && atomic_get_and_increment(transaction.span_count.eligible_for_reporting) < transaction_max_spans) {
+    should_be_reported = true
+    atomic_increment(transaction.span_count.reported)
+} else {
+    should_be_reported = false
+    atomic_increment(transaction.span_count.dropped)
+    transaction.track_dropped_stats(this)
+}
+```
+
+`eligible_for_reporting` is another counter in the `span_count` object, but it's not reported to APM Server.
+It's similar to `reported`, but the value may be higher.
+
+### Configuration snapshot
+
+To ensure consistent behavior within one transaction,
+the `transaction_max_spans` option should be read once on transaction start.
+Even if the option is changed via remote config during the lifetime of a transaction,
+the value that has been read at the start of the transaction should be used.
+
+### Metric collection
+
+Even though we can determine whether to drop a span before starting it, it's not legal to return a `null` or noop span in that case.
+That's because we're [collecting statistics about dropped spans](tracing-spans-dropped-stats.md) as well as
+[breakdown metrics](https://docs.google.com/document/d/1-_LuC9zhmva0VvLgtI0KcHuLzNztPHbcM0ZdlcPUl64#heading=h.ondan294nbpt)
+even for spans that exceed `transaction_max_spans`.
+
+For spans that are known to be dropped upfront, agents SHOULD NOT collect information that is expensive to get and not needed for metrics collection.
diff --git a/specs/agents/tracing-spans-destination.md b/specs/agents/tracing-spans-destination.md index 6dbe05ed..1336b70a 100644 --- a/specs/agents/tracing-spans-destination.md +++ b/specs/agents/tracing-spans-destination.md @@ -83,21 +83,11 @@ providing a way to manually disable the automatic setting/inference of this fiel from a service map or an external service from the dependencies table). A user-supplied value MUST have the highest precedence, regardless if it was set before or after the automatic setting is invoked. -To allow for automatic inference, -without users having to specify any destination field, -agents SHOULD offer a dedicated API to start an exit span. -This API sets the `exit` flag to `true` and returns `null` or a noop span in case the parent already represents an `exit` span. - **Value** -For all exit spans, unless the `context.destination.service.resource` field was set by the user to `null` or an empty +For all [exit spans](handling-huge-traces/tracing-spans.md#exit-spans), unless the `context.destination.service.resource` field was set by the user to `null` or an empty string through API, agents MUST infer the value of this field based on properties that are set on the span. -This is how to determine whether a span is an exit span: -```groovy -exit = exit || context.destination || context.db || context.message || context.http -``` - If no value is set to the `context.destination.service.resource` field, the logic for automatically inferring it MUST be the following: diff --git a/specs/agents/tracing-spans.md b/specs/agents/tracing-spans.md index 9ed7a189..0b2fb31a 100644 --- a/specs/agents/tracing-spans.md +++ b/specs/agents/tracing-spans.md @@ -56,26 +56,16 @@ The documentation should clarify that spans with `unknown` outcomes are ignored Spans may have an associated stack trace, in order to locate the associated source code that caused the span to occur. 
If there are many spans being collected this can cause a significant amount of overhead in the application, due to the capture, rendering, and transmission of potentially large stack traces. It is possible to limit the recording of span stack traces to only spans that are slower than a specified duration, using the config variable `ELASTIC_APM_SPAN_FRAMES_MIN_DURATION`. -### Span count - -When a span is started a counter should be incremented on its transaction, in order to later identify the _expected_ number of spans. In this way we can identify data loss, e.g. because events have been dropped, or because of instrumentation errors. - -To handle edge cases where many spans are captured within a single transaction, the agent should enable the user to start dropping spans when the associated transaction exeeds a configurable number of spans. When a span is dropped, it is not reported to the APM Server, but instead another counter is incremented to track the number of spans dropped. In this case the above mentioned counter for started spans is not incremented. - -```json -"span_count": { - "started": 500, - "dropped": 42 -} -``` - -Here's how the limit can be configured for [Node.js](https://www.elastic.co/guide/en/apm/agent/nodejs/current/agent-api.html#transaction-max-spans) and [Python](https://www.elastic.co/guide/en/apm/agent/python/current/configuration.html#config-transaction-max-spans). - ### Exit spans Exit spans are spans that describe a call to an external service, such as an outgoing HTTP request or a call to a database. +A span is considered an exit span if it has explicitly been marked as such or if it has context fields that are indicative of it being an exit span: +```groovy +exit = exit || context.destination || context.db || context.message || context.http +``` + #### Child spans of exit spans Exit spans MUST not have child spans that have a different `type` or `subtype`. 
@@ -93,7 +83,7 @@ For example, an HTTP exit span may have child spans with the `action` `request`, These spans MUST NOT have any destination context, so that there's no effect on destination metrics. Most agents would want to treat exit spans as leaf spans, though. -This brings the benefit of being able to compress repetitive exit spans (TODO link to span compression spec once available), +This brings the benefit of being able to [compress](handling-huge-traces/tracing-spans-compress.md) repetitive exit spans, as span compression is only applicable to leaf spans. Agents MAY implement mechanisms to prevent the creation of child spans of exit spans. @@ -111,7 +101,7 @@ However, when tracing a regular outgoing HTTP request (one that's not initiated and it's unknown whether the downsteam service continues the trace, the trace headers should be added. -The reason is that spans cannot be compressed (TODO link to span compression spec once available) if the context has been propagated, as it may lead to orphaned transactions. +The reason is that spans cannot be [compressed](handling-huge-traces/tracing-spans-compress.md) if the context has been propagated, as it may lead to orphaned transactions. That means that the `parent.id` of a transaction may refer to a span that's not available because it has been compressed (merged with another span). There can, however, be exceptions to this rule whenever it makes sense. For example, if it's known that the backend system can continue the trace.