Add client semantic conventions for socket connections #756
Changes from all commits
d5cf2f0
266b691
8d9c320
7add745
5914b2a
a54e1ed
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
change_type: enhancement | ||
|
||
component: connection | ||
|
||
note: Add semantic conventions for client connections | ||
|
||
issues: [454, 756] | ||
|
||
subtext: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
<!--- Hugo front matter used to generate the website version of this page: | ||
linkTitle: Client | ||
---> | ||
|
||
# Connection | ||
|
||
These attributes may be used to describe the socket connection. | ||
|
||
<!-- semconv connection(omit_requirement_level) --> | ||
| Attribute | Type | Description | Examples | | ||
|---|---|---|---| | ||
| `connection.state` | string | State of the connection in the connection pool. | `active` | | ||
|
||
`connection.state` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. | ||
|
||
| Value | Description | | ||
|---|---| | ||
| `active` | Connection is being used. | | ||
| `idle` | Connection is idle. | | ||
<!-- endsemconv --> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
<!--- Hugo front matter used to generate the website version of this page: | ||
linkTitle: Socket connection | ||
path_base_for_github_subdir: | ||
from: tmp/semconv/docs/connection/_index.md | ||
to: connection/README.md | ||
---> | ||
|
||
# Semantic Conventions for Socket Connections | ||
|
||
**Status**: [Experimental][DocumentStatus] | ||
|
||
This document defines semantic conventions for socket connections. | ||
|
||
Semantic conventions for socket connections are defined for the following signals: | ||
|
||
- [Connection Spans](connection-spans.md): Semantic Conventions for modeling connections as _spans_. | ||
- [Connection Metrics](connection-metrics.md): Semantic Conventions for recording connection metrics. | ||
|
||
[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
<!--- Hugo front matter used to generate the website version of this page: | ||
linkTitle: Connection Metrics | ||
---> | ||
|
||
# Semantic Conventions for Connection Metrics | ||
|
||
This document defines semantic conventions to apply when instrumenting the client side of socket connections with metrics. | ||
|
||
**Status**: [Experimental][DocumentStatus] | ||
|
||
<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` --> | ||
|
||
<!-- toc --> | ||
|
||
- [Common attributes](#common-attributes) | ||
- [Metric: `connection.client.connect_duration`](#metric-connectionclientconnect_duration) | ||
- [Metric: `connection.client.duration`](#metric-connectionclientduration) | ||
- [Metric: `connection.client.open_connections`](#metric-connectionclientopen_connections) | ||
|
||
<!-- tocstop --> | ||
|
||
## Common attributes | ||
|
||
All connection metrics share the same set of attributes: | ||
|
||
<!-- semconv metric_attributes.connection.client(full) --> | ||
| Attribute | Type | Description | Examples | Requirement Level | | ||
|---|---|---|---|---| | ||
| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `econnreset`; `econnrefused`; `address_family_not_supported`; `java.net.SocketException` | Conditionally Required: [2] | | ||
| [`network.peer.address`](../attributes-registry/network.md) | string | Peer address of the network connection - IP address or Unix domain socket name. [3] | `10.1.2.80`; `/tmp/my.sock` | Recommended: see the note below | | ||
| [`network.peer.port`](../attributes-registry/network.md) | int | Peer port number of the network connection. | `65123` | Recommended: if `network.peer.address` is set. | | ||
| [`network.transport`](../attributes-registry/network.md) | string | [OSI transport layer](https://osi-model.com/transport-layer/) or [inter-process communication method](https://wikipedia.org/wiki/Inter-process_communication). [4] | `tcp`; `udp` | Recommended | | ||
| [`network.type`](../attributes-registry/network.md) | string | [OSI network layer](https://osi-model.com/network-layer/) or non-OSI equivalent. [5] | `ipv4`; `ipv6` | Recommended | | ||
| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [6] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Conditionally Required: if available without reverse DNS lookup | | ||
|
||
**[1]:** It's REQUIRED to document error types instrumentation produces. It's RECOMMENDED to use error codes provided by the socket library, runtime, or the OS (such as `connect` method error codes on [Linux or other POSIX systems](https://man7.org/linux/man-pages/man2/connect.2.html#ERRORS) or [Windows](https://docs.microsoft.com/windows/win32/api/winsock2/nf-winsock2-connect#return-value)). | ||
|
||
**[2]:** If and only if a connection (attempt) ended with an error. | ||
|
||
**[3]:** The `network.peer.address` attribute could have high cardinality. In practice, however, its cardinality is limited to the number of distinct IP addresses for the given domain name, which is small when the destination service is behind a load balancer or NAT. | ||
Connection instrumentations MAY set `network.peer.address` by default or let users opt into collecting it. If instrumentation collects `network.peer.address` by default, it MUST allow users to opt out of `network.peer.address` collection or disable collection of all connection metrics that set the attribute. | ||
|
||
**[4]:** The value SHOULD be normalized to lowercase. | ||
|
||
Consider always setting the transport when setting a port number, since | ||
a port number is ambiguous without knowing the transport. For example | ||
different processes could be listening on TCP port 12345 and UDP port 12345. | ||
|
||
**[5]:** The value SHOULD be normalized to lowercase. | ||
|
||
**[6]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. | ||
|
||
`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. | ||
|
||
| Value | Description | | ||
|---|---| | ||
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | | ||
|
||
`network.transport` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. | ||
|
||
| Value | Description | | ||
|---|---| | ||
| `tcp` | TCP | | ||
| `udp` | UDP | | ||
| `pipe` | Named or anonymous pipe. | | ||
| `unix` | Unix domain socket | | ||
|
||
`network.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. | ||
|
||
| Value | Description | | ||
|---|---| | ||
| `ipv4` | IPv4 | | ||
| `ipv6` | IPv6 | | ||
<!-- endsemconv --> | ||
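
The attribute rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the convention: the helper names `error_type` and `metric_attributes` and the `include_peer_address` flag are hypothetical, and mapping OS errors to lowercase `errno` names is an assumption based on the `econnreset` / `econnrefused` examples in the table.

```python
import errno
import ipaddress


def error_type(exc):
    """Map an exception to an `error.type` value.

    Assumption: OS errors are reported as lowercase errno names
    (for example `econnrefused`); anything else falls back to the
    fully qualified exception class name.
    """
    code = getattr(exc, "errno", None)
    if isinstance(exc, OSError) and code in errno.errorcode:
        return errno.errorcode[code].lower()
    cls = type(exc)
    return f"{cls.__module__}.{cls.__qualname__}"


def metric_attributes(peer_address, peer_port=None, transport=None,
                      server_address=None, error=None,
                      include_peer_address=True):
    """Build the shared attribute set for connection metrics.

    `include_peer_address` is a hypothetical opt-out switch for the
    potentially high-cardinality `network.peer.address` attribute.
    """
    attrs = {}
    if include_peer_address:
        attrs["network.peer.address"] = peer_address
        # The port is only recommended when the peer address is set.
        if peer_port is not None:
            attrs["network.peer.port"] = peer_port
    if transport:
        # Values are normalized to lowercase.
        attrs["network.transport"] = transport.lower()
    try:
        version = ipaddress.ip_address(peer_address).version
        attrs["network.type"] = f"ipv{version}"
    except ValueError:
        pass  # Not an IP address, for example a Unix domain socket path.
    if server_address:
        attrs["server.address"] = server_address
    if error is not None:
        # Set if and only if the connection (attempt) ended with an error.
        attrs["error.type"] = error_type(error)
    return attrs
```

Calling `metric_attributes("/tmp/my.sock", transport="unix", include_peer_address=False)` in this sketch keeps only `network.transport`, which is one way an instrumentation could honor a user's opt-out.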
|
||
## Metric: `connection.client.connect_duration` | ||
|
||
This metric is [recommended][MetricRequirementLevel]. | ||
|
||
<!-- semconv metric.connection.client.connect_duration(metric_table) --> | ||
| Name | Instrument Type | Unit (UCUM) | Description | | ||
| -------- | --------------- | ----------- | -------------- | | ||
| `connection.client.connect_duration` | Histogram | `s` | The duration of the attempt to establish connection. | | ||
<!-- endsemconv --> | ||
|
||
## Metric: `connection.client.duration` | ||
|
||
This metric is [recommended][MetricRequirementLevel]. | ||
|
||
<!-- semconv metric.connection.client.duration(metric_table) --> | ||
| Name | Instrument Type | Unit (UCUM) | Description | | ||
| -------- | --------------- | ----------- | -------------- | | ||
| `connection.client.duration` | Histogram | `s` | The duration of the successfully established outbound connection. | | ||
<!-- endsemconv --> | ||
|
||
## Metric: `connection.client.open_connections` | ||
|
||
This metric is [recommended][MetricRequirementLevel]. | ||
|
||
<!-- semconv metric.connection.client.open_connections(metric_table) --> | ||
| Name | Instrument Type | Unit (UCUM) | Description | | ||
| -------- | --------------- | ----------- | -------------- | | ||
| `connection.client.open_connections` | UpDownCounter | `{connection}` | Number of outbound connections that are currently open. | | ||
<!-- endsemconv --> | ||
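
To show how the three metrics relate to a connection's lifecycle, here is a minimal sketch in Python. It assumes a hypothetical in-memory recorder (`InMemoryMetrics`) in place of a real metrics SDK, and a hypothetical `TrackedConnection` wrapper that takes a zero-argument `connect` callable. Neither name comes from the convention or from any OpenTelemetry library.

```python
import errno
import time
from collections import defaultdict


class InMemoryMetrics:
    """Hypothetical stand-in for a metrics SDK; it only stores what it is given."""

    def __init__(self):
        self.histograms = defaultdict(list)  # metric name -> [(seconds, attributes)]
        self.up_down = defaultdict(int)      # metric name -> current value

    def record(self, name, value, attributes):
        self.histograms[name].append((value, dict(attributes)))

    def add(self, name, delta):
        # Attributes are omitted here to keep the sketch short.
        self.up_down[name] += delta


class TrackedConnection:
    """Wraps a connect callable and reports the three client metrics."""

    def __init__(self, metrics, connect, attributes, clock=time.monotonic):
        self._metrics = metrics
        self._attributes = dict(attributes)
        self._clock = clock
        started = clock()
        try:
            self.sock = connect()
        except OSError as exc:
            # Failed attempt: only connect_duration is recorded, with error.type set.
            name = errno.errorcode.get(exc.errno)
            failed = dict(self._attributes)
            failed["error.type"] = name.lower() if name else type(exc).__qualname__
            metrics.record("connection.client.connect_duration",
                           clock() - started, failed)
            raise
        self._opened = clock()
        metrics.record("connection.client.connect_duration",
                       self._opened - started, self._attributes)
        metrics.add("connection.client.open_connections", 1)

    def close(self, error_type=None):
        """Close the socket; pass error_type if the connection ended with an error."""
        self.sock.close()
        attributes = dict(self._attributes)
        if error_type is not None:
            attributes["error.type"] = error_type
        self._metrics.record("connection.client.duration",
                             self._clock() - self._opened, attributes)
        self._metrics.add("connection.client.open_connections", -1)
```

With the standard library, the `connect` argument could be something like `lambda: socket.create_connection(("example.com", 443))`. Injecting the callable and the clock keeps the sketch testable without network access.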
|
||
[MetricRequirementLevel]: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.26.0/specification/metrics/metric-requirement-level.md | ||
[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
<!--- Hugo front matter used to generate the website version of this page: | ||
linkTitle: Connection Spans | ||
---> | ||
|
||
# Semantic Conventions for Connection Spans | ||
|
||
This document defines semantic conventions to apply when instrumenting the client side of socket connections with spans. | ||
|
||
**Status**: [Experimental][DocumentStatus] | ||
|
||
<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` --> | ||
|
||
<!-- toc --> | ||
|
||
* [Span name](#span-name) | ||
* [Attributes](#attributes) | ||
* [Examples](#examples) | ||
* [Successful connection](#successful-connection) | ||
* [Successful connect, but connection terminates with an error](#successful-connect-but-connection-terminates-with-an-error) | ||
* [Attempt to establish connection ends with `econnrefused` error](#attempt-to-establish-connection-ends-with-econnrefused-error) | ||
* [Relationship with application protocols such as HTTP](#relationship-with-application-protocols-such-as-http) | ||
* [Connection retry example](#connection-retry-example) | ||
|
||
<!-- tocstop --> | ||
|
||
This convention defines two types of spans: | ||
|
||
- `connect` span: describes the process of establishing a connection. It corresponds to the `connect` function ([Linux or other POSIX systems](https://man7.org/linux/man-pages/man2/connect.2.html) / | ||
[Windows](https://docs.microsoft.com/windows/win32/api/winsock2/nf-winsock2-connect)). | ||
- `connection` span: describes the connection lifetime. It starts right after the connection is successfully established and ends when the connection terminates. | ||
Review comment: Not sure if we need both, connect and connection, or if all the data can be represented in the connection.

Review comment: it's important to know how long it takes to establish a connection and important to know if connection was ever established and then terminated. We can potentially have one span for connection and then indicate when the connection has happened with an event, but I'd still argue that we need two separate metrics.
|
||
If a `connect` span ends with an error (the connection cannot be established), a `connection` span SHOULD NOT be created. | ||
|
||
If a connection can be reused by multiple independent operations, instrumentation SHOULD create the `connection` span as a root span in a new trace. The `connection` span should link to the `connect` span. This avoids associating a long-lived `connection` span with the trace that happened to start it. | ||
|
||
Both spans SHOULD be of a `CLIENT` kind. | ||
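
The relationship between the two spans can be sketched with a toy span model in Python. This is an illustration under stated assumptions, not an OpenTelemetry API: the `Span` dataclass and the `new_connect_span` / `new_connection_span` helpers are hypothetical, and the sketch assumes the connection is reusable, so the `connection` span becomes a root span in a new trace.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_ids = itertools.count(1)  # Toy ID source; real tracers generate random IDs.


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None
    kind: str = "CLIENT"  # Both span types are CLIENT spans.
    links: list = field(default_factory=list)       # [(trace_id, span_id)]
    attributes: dict = field(default_factory=dict)


def new_connect_span(parent: Span, attributes: dict) -> Span:
    """The `connect` span is a child of the operation that triggered it."""
    return Span("connect", parent.trace_id, f"s{next(_ids)}",
                parent_id=parent.span_id, attributes=dict(attributes))


def new_connection_span(connect_span: Span) -> Optional[Span]:
    """Create the `connection` span for a reusable connection.

    Returns None when the connect attempt failed, because no connection
    span is created in that case. Otherwise the span is a root span in a
    new trace and links back to the `connect` span.
    """
    if "error.type" in connect_span.attributes:
        return None
    return Span("connection",
                trace_id=f"t{next(_ids)}",
                span_id=f"s{next(_ids)}",
                links=[(connect_span.trace_id, connect_span.span_id)],
                attributes=dict(connect_span.attributes))
```

Using a link instead of a parent keeps the long-lived `connection` span out of the request trace while still allowing navigation from one to the other.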
|
||
## Span name | ||
|
||
The **span names** SHOULD match `connect` or `connection` depending on the span type. | ||
|
||
## Attributes | ||
|
||
The `connect` and `connection` spans share the same list of attributes: | ||
|
||
<!-- semconv span_attributes.connection.client(full) --> | ||
| Attribute | Type | Description | Examples | Requirement Level | | ||
|---|---|---|---|---| | ||
| [`error.type`](../attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `econnreset`; `econnrefused`; `address_family_not_supported`; `java.net.SocketException` | Conditionally Required: [2] | | ||
| [`network.local.port`](../attributes-registry/network.md) | int | Local port number of the network connection. | `65123` | Recommended | | ||
| [`network.peer.address`](../attributes-registry/network.md) | string | Peer address of the network connection - IP address or Unix domain socket name. | `10.1.2.80`; `/tmp/my.sock` | Required | | ||
| [`network.peer.port`](../attributes-registry/network.md) | int | Peer port number of the network connection. | `65123` | Conditionally Required: when applicable | | ||
| [`network.transport`](../attributes-registry/network.md) | string | [OSI transport layer](https://osi-model.com/transport-layer/) or [inter-process communication method](https://wikipedia.org/wiki/Inter-process_communication). [3] | `tcp`; `udp` | Recommended | | ||
| [`network.type`](../attributes-registry/network.md) | string | [OSI network layer](https://osi-model.com/network-layer/) or non-OSI equivalent. [4] | `ipv4`; `ipv6` | Recommended | | ||
| [`server.address`](../attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [5] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | Conditionally Required: if available without reverse DNS lookup | | ||
Review comment: Is this where we would add

Review comment: Do you think they'd be useful on connection spans/metrics?
For TLS and DNS we'll need new spans not described in this PR
|
||
**[1]:** It's REQUIRED to document error types instrumentation produces. It's RECOMMENDED to use error codes provided by the socket library, runtime, or the OS (such as `connect` method error codes on [Linux or other POSIX systems](https://man7.org/linux/man-pages/man2/connect.2.html#ERRORS) or [Windows](https://docs.microsoft.com/windows/win32/api/winsock2/nf-winsock2-connect#return-value)). | ||
|
||
**[2]:** If and only if a connection (attempt) ended with an error. | ||
|
||
**[3]:** The value SHOULD be normalized to lowercase. | ||
|
||
Consider always setting the transport when setting a port number, since | ||
a port number is ambiguous without knowing the transport. For example | ||
different processes could be listening on TCP port 12345 and UDP port 12345. | ||
|
||
**[4]:** The value SHOULD be normalized to lowercase. | ||
|
||
**[5]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. | ||
|
||
`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. | ||
|
||
| Value | Description | | ||
|---|---| | ||
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | | ||
|
||
`network.transport` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. | ||
|
||
| Value | Description | | ||
|---|---| | ||
| `tcp` | TCP | | ||
| `udp` | UDP | | ||
| `pipe` | Named or anonymous pipe. | | ||
| `unix` | Unix domain socket | | ||
|
||
`network.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used. | ||
|
||
| Value | Description | | ||
|---|---| | ||
| `ipv4` | IPv4 | | ||
| `ipv6` | IPv6 | | ||
<!-- endsemconv --> | ||
|
||
## Examples | ||
|
||
### Successful connection | ||
|
||
A successful connection attempt to `"/tmp/my.sock"` results in the following span: | ||
|
||
| Attribute name | Value | | ||
| :--------------------- | :-------------------| | ||
| name | `"connect"` | | ||
| `network.peer.address` | `"/tmp/my.sock"` | | ||
| `network.transport` | `"unix"` | | ||
|
||
Once the corresponding connection is gracefully closed, another span is reported: | ||
|
||
| Attribute name | Value | | ||
| :--------------------- | :-------------------| | ||
| name | `"connection"` | | ||
| `network.peer.address` | `"/tmp/my.sock"` | | ||
| `network.transport` | `"unix"` | | ||
|
||
### Successful connect, but connection terminates with an error | ||
|
||
A successful connection attempt to `example.com` results in the following span: | ||
> Note: DNS lookup is outside of the scope of this semantic convention | ||

Review comment: While I agree that we shouldn't be trying to include all the dns info in the connection - having an event/marker for timing that indicates when it was complete would be helpful.
|
||
| Attribute name | Value | | ||
| :--------------------- | :-------------------| | ||
| name | `"connect"` | | ||
| `server.address` | `"example.com"` | | ||
| `network.peer.address` | `"93.184.216.34"` | | ||
| `network.peer.port` | `443` | | ||
| `network.transport` | `"tcp"` | | ||
| `network.type` | `"ipv4"` | | ||
|
||
After some packet exchange, the connection is reset, and the following span is reported: | ||
|
||
| Attribute name | Value | | ||
| :--------------------- | :-------------------| | ||
| name | `"connection"` | | ||
| `server.address` | `"example.com"` | | ||
| `network.peer.address` | `"93.184.216.34"` | | ||
| `network.peer.port` | `443` | | ||
| `network.transport` | `"tcp"` | | ||
| `network.type` | `"ipv4"` | | ||
| `error.type` | `econnreset` | | ||
|
||
### Attempt to establish connection ends with `econnrefused` error | ||
|
||
An attempt to establish a connection to `127.0.0.1:8080` without any application | ||
listening on this port results in the following span: | ||
|
||
| Attribute name | Value | | ||
| :--------------------- | :-------------------| | ||
| name | `"connect"` | | ||
| `network.peer.address` | `"127.0.0.1"` | | ||
| `network.peer.port` | `8080` | | ||
| `network.transport` | `"tcp"` | | ||
| `network.type` | `"ipv4"` | | ||
| `error.type` | `econnrefused` | | ||
|
||
### Relationship with application protocols such as HTTP | ||
Review comment: Http becomes interesting with connection pooling, and the ability to either do sequential (http 1.1) or parallel (http2,3) requests over the same connection.

Review comment: why events? we have links to correlate request to connection and we can put attributes on them if any is necessary. Do you want to capture moment in time when the request is associated with the connection? We don't capture it on links yet, but we can start. Record a link and an event is an overkill. I wonder if DNS and TLS should be spans or events. Since they involve network and have non-zero duration, spans would work better (but will be slightly less performant).

Review comment: I am thinking that this needs to be a dial that ops can turn based on how much data that they want to collect. Using the scenario of an HttpClient call (outgoing):
In most cases you probably don't want to collect all of these all the time. However I can see ops turning them on when needed to collect more specific diagnostics data. So can we make it adaptive, and be able to correlate the data when applicable?
I am wondering if an "event" + optional link approach would be best. When a request is put on the wire for a connection - you'd get an event - that way you kind of know what the delay was before your request was processed. If the connections are being tracked, then that "event" would have a link to the connection span, so you could correlate them together. I use "event" in quotes as I am told the future of events on spans is unclear - it could be done with a log message instead.
|
||
It may be impossible to record any relationships between HTTP spans and connection-level spans when connections are pooled and reused. | ||
|
||
The following picture demonstrates an ideal example when recording such relationships (via span links) is possible. | ||
|
||
![connection-spans-and-application-protocols.png](connection-spans-and-application-protocols.png) | ||
|
||
### Connection retry example | ||
|
||
The following example shows retries when attempting to connect: | ||
|
||
``` | ||
HTTP request attempt 1 (trace=t1, span=s1) | ||
| | ||
-- domain name resolution (not covered here) | ||
| | ||
-- connect(127.0.0.1:8080) - timeout (trace=t1, span=s2, error.type=timeout) | ||
| | ||
HTTP request attempt 2 (trace=t1, span=s3) | ||
| | ||
-- connect(127.0.0.1:8080) - (trace=t1, span=s4) | ||
|
||
connection(127.0.0.1:8080) - (trace=t2, span=s5, link=t1:s4) | ||
``` | ||
|
||
[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
groups: | ||
- id: common_attributes.connection.client | ||
type: attribute_group | ||
brief: > | ||
Describes common client connection attributes | ||
attributes: | ||
- ref: network.peer.address | ||
- ref: network.peer.port | ||
- ref: server.address | ||
requirement_level: | ||
conditionally_required: if available without reverse DNS lookup | ||
- ref: error.type | ||
requirement_level: | ||
conditionally_required: If and only if a connection (attempt) ended with an error. | ||
note: > | ||
It's REQUIRED to document error types instrumentation produces. | ||
It's RECOMMENDED to use error codes provided by the socket library, runtime, or the OS | ||
(such as `connect` method error codes on [Linux or other POSIX systems](https://man7.org/linux/man-pages/man2/connect.2.html#ERRORS) or | ||
[Windows](https://docs.microsoft.com/windows/win32/api/winsock2/nf-winsock2-connect#return-value)). | ||
examples: ["econnreset", "econnrefused", "address_family_not_supported", "java.net.SocketException"] | ||
- ref: network.transport | ||
- ref: network.type |
Review comment: With http/3 over QUIC, the connection is virtual and may span multiple UDP packets. The client IP/Network may even change during the duration of the connection, for example switching between wifi and cellular when a mobile client is moved out of range.
Rather than tying this directly to a socket, the type can be tracked by an additional `type` property. This same concept can then be used for database, http and a range of scenarios, but with optional attributes based on the scenario.

Review comment: http/3 still operates on top of UDP sockets.
I'm not an expert, but I believe from socket perspective we still have different connections established when QUIC connection migration happens, the only thing it saves is TLS handshake - it won't happen again during migration.
It's a good question how to represent QUIC logical connection, but given it's such a long lived thing, I don't see why we can't have a span for it and spans for all the underlying real socket connections it creates.