elastic · AlexanderWert · Apr 29, 2021 · Mar 3, 2021 · Apr 29, 2021 · Apr 29, 2021
diff --git a/specs/agents/tracing-instrumentation-aws.md b/specs/agents/tracing-instrumentation-aws.md
@@ -8,38 +8,12 @@ Some of the services can use existing specs. When there are differences or addit
 AWS Simple Storage Service offers object storage via a REST API. The objects are organized into buckets, which are 
 themselves organized into regions.
 
-- `span.name`: The span name should follow this pattern: `S3 <OperationName> <bucket-name>`. For example,
-`S3 GetObject my-bucket`. Note that the operation name is in CamelCase.
-- `span.type`: `storage`
-- `span.subtype`: `s3`
-- `span.action`: The operation name in CamelCase. For example ‘GetObject’.
-
-#### Span context fields
-
-- **`context.destination.address`**: optional. Not available in some cases. Only set if the actual connection is available.
-- **`context.destination.port`**: optional. Not available in some cases. Only set if the actual connection is available.
-- **`context.destination.cloud.region`**: mandatory. The AWS region where the bucket is.
-- **`context.destination.service.name`**: mandatory. Use `s3`
-- **`context.destination.service.resource`**: optional. The bucket name, if available. The s3 API allows either the
-bucket name or an Access Point to be provided when referring to a bucket. Access Points can use either slashes or colons.
-When an Access Point is provided, the access point name preceded by `accesspoint/` or `accesspoint:` should be extracted.
-For example, given an Access Point such as `arn:aws:s3:us-west-2:123456789012:accesspoint/myendpointslashes`, the agent
-extracts `accesspoint/myendpointslashes`. Given an Access Point such as
-`arn:aws:s3:us-west-2:123456789012:accesspoint:myendpointcolons`, the agent extracts `accesspoint:myendpointcolons`.
-- **`context.destination.service.type`**: mandatory. Use `storage`.
+Field semantics and values for S3 are defined in the [S3 table within the database spec](tracing-instrumentation-db.md#aws-s3).
 
 ### DynamoDB
 
 AWS DynamoDB is a document database so instrumenting it will follow the [db spec](tracing-instrumentation-db.md).
-The follow specifications supersede those of the db spec.
-
-- **`span.name`**: The span name should capture the operation name in CamelCase and the table name, if available.
-The format should be `DynamoDB <ActionName> <TableName>`. So for example, `DynamoDB UpdateItem my_table`.
-
-#### Span context fields
-- **`context.db.instance`**: mandatory. The AWS region where the table is.
-- **`context.db.statement`**: optional. For a DynamoDB `Query` operation, capture the `KeyConditionExpression` in this field.
-- **`context.destination.cloud.region`**: mandatory. The AWS region where the table is, if available.
+DynamoDB-specific specifications that supersede the generic db field semantics are defined in the [DynamoDB table within the database spec](tracing-instrumentation-db.md#aws-dynamodb).
 
 ### SQS (Simple Queue Service)
 

diff --git a/specs/agents/tracing-instrumentation-db.md b/specs/agents/tracing-instrumentation-db.md
@@ -1,34 +1,178 @@
-## Database spans
 
-We capture spans for various types of database/data-stores operations, such as SQL queries, Elasticsearch queries, Redis commands, etc. We follow some of the same conventions defined by OpenTracing for capturing database-specific span context, including:
+## Table of Contents
+* [Database and Datastore spans](#database-and-datastore-spans)
+* [Specific Databases](#specific-databases)
+  * [AWS DynamoDB](#aws-dynamodb)
+  * [AWS S3](#aws-s3)
+  * [Elasticsearch](#elasticsearch)
+  * [MongoDB](#mongodb)
+  * [Redis](#redis)
+  * [SQL Databases](#sql-databases)
 
- - `db.instance`: database instance name, e.g. "customers". For DynamoDB, this is the region.
- - `db.statement`: statement/query, e.g. "SELECT * FROM foo"
- - `db.user`: username used for database access, e.g. "readonly_user"
- - `db.type`: database type/category, which should be "sql" for SQL databases, and the lower-cased database name otherwise.
+## Database and Datastore spans
 
-The full database statement should be stored in `db.statement`, which may be useful for debugging performance issues. We store up to 10000 Unicode characters per database statement.
+We capture spans for various types of database and datastore operations, such as SQL queries, Elasticsearch queries, Redis commands, etc.
+Database and datastore spans **must not have child spans that have a different `type` or `subtype`** within the same transaction (see [span-spec](tracing-spans.md)).
 
-For SQL databases this will be the full SQL statement.
+The following fields are relevant for database and datastore spans. Agents should populate as many of these fields as possible. The semantics and concrete values of these fields may vary between technologies. See the sections below for details on specific technologies.
 
-For MongoDB, this can be set to the command encoded as MongoDB Extended JSON.
+| Field | Description | Mandatory |
+|-------|-------------|:---------:|
+|`name`| The name of the exit database span. **The span name must have a low cardinality as it is used as a dimension for derived metrics!** Therefore, for SQL operations we perform a limited parsing of the statement, and extract the operation name and outer-most table involved. Other databases and storages may have different strategies for the span name (see specific databases and stores in the sections below).| :white_check_mark: |
+|`type`|For database spans, the type should be `db`.| :white_check_mark:|
+|`subtype`|For database spans, the subtype should be the database vendor name. See details below for specific databases.| :x: |
+|`action`|The database action, e.g. `query`|
+| <hr/> |<hr/>|<hr/>|
+|`context.db.instance`| Database instance name, e.g. "customers". For DynamoDB, this is the region.| :x: |
+|`context.db.statement`| Statement/query, e.g. `SELECT * FROM foo WHERE ...`. The full database statement should be stored in `db.statement`, which may be useful for debugging performance issues. We store up to 10000 Unicode characters per database statement. For non-SQL data stores, see details below.| :x: |
+|`context.db.type`| Database type/category, which should be "sql" for SQL databases, and the lower-cased database name otherwise.| :x: |
+|`context.db.user`| Username used for database access, e.g. `readonly_user`| :x: |
+|`context.db.link`| Some SQL databases (e.g. Oracle) provide a feature for linking multiple databases to form a single logical database. The DB link differentiates single DBs of a logical database. See https://github.com/elastic/apm/issues/107 for more details. | :x: |
+|`context.db.rows_affected`| The number of rows / entities affected by the corresponding db statement / query.| :x: |
+| <hr/> |<hr/>|<hr/>|
+|`context.destination.address`|The hostname / address of the database.| :x: |
+|`context.destination.port`|The port under which the database is accessible.| :x: |
+|`context.destination.service.name`| The `destination.service.name` is used to denote "sameness" of the service. E.g. multiple instances of Oracle databases all have the same name `oracle`. For databases and storage services, the same value as for `span.subtype` should be used.| :white_check_mark:|
+|`context.destination.service.type`| Should be the same as the `span.type`. Value: `db`| :white_check_mark:|
+|`context.destination.service.resource`| Used to detect unique destinations from each service. This field should contain all information that is needed to distinguish between different database / storage instances (e.g. in the service map). See details below on how to set this field for specific technologies.| :white_check_mark:|
+|`context.destination.cloud.region`| The cloud region in case the datastore is hosted in a public cloud or is a managed datastore / database. E.g. AWS regions, such as `us-east-1` | :x: |
 
-For Elasticsearch search-type queries, the request body may be recorded. Alternatively, if a query is specified in HTTP query parameters, that may be used instead. If the body is gzip-encoded, the body should be decoded first.
 
-### Database span names
+## Specific Databases
 
-For SQL operations we perform a limited parsing the statement, and extract the operation name and outer-most table involved (if any). See more details here: https://docs.google.com/document/d/1sblkAP1NHqk4MtloUta7tXjDuI_l64sT2ZQ_UFHuytA.
+### AWS DynamoDB
 
-For Redis, the the span name can simply be set to the command name, e.g. `GET` or `LRANGE`.
+| Field | Value / Examples | Comments |
+|-------|:---------------:|----------|
+|`name`| e.g. `DynamoDB UpdateItem my_table`|  The span name should capture the operation name (as used by AWS for the action name) and the table name, if available. The format should be `DynamoDB <ActionName> <TableName>`. |
+|`type`|`db`|
+|`subtype`|`dynamodb`|
+|`action`| `query` | 
+| __**context.db._**__ |<hr/>|<hr/>|
+|`_.instance`| e.g. `us-east-1` | The AWS region where the table is. |
+|`_.statement`| e.g. `ForumName = :name and Subject = :sub` | For a DynamoDB Query operation, capture the KeyConditionExpression in this field. |
+|`_.type`|`dynamodb`|
+|`_.user`| :heavy_minus_sign: |
+|`_.link`| :heavy_minus_sign: |
+|`_.rows_affected`| :heavy_minus_sign: |
+| __**context.destination._**__ |<hr/>|<hr/>|
+|`_.address`|e.g. `dynamodb.us-west-2.amazonaws.com`|
+|`_.port`|e.g. `443`|
+|`_.service.name`| `dynamodb` |
+|`_.service.type`|`db`|
+|`_.service.resource`| `dynamodb` |
+|`_.cloud.region`| e.g. `us-east-1` | The AWS region where the table is, if available. |
+
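The DynamoDB table can be read as a small mapping from the AWS request to span fields. The following Python sketch shows one possible reading of it. The function name, its parameters, and the nested dictionary layout are assumptions made for illustration, not an agent API.

```python
from typing import Any, Dict, Optional


def dynamodb_span_fields(
    action_name: str,
    table_name: Optional[str] = None,
    region: Optional[str] = None,
    key_condition_expression: Optional[str] = None,
) -> Dict[str, Any]:
    """Build span fields for a DynamoDB call (hypothetical layout)."""
    # Span name: "DynamoDB <ActionName> <TableName>", table name only if available.
    name = f"DynamoDB {action_name}"
    if table_name:
        name = f"{name} {table_name}"

    db: Dict[str, Any] = {"type": "dynamodb"}
    destination: Dict[str, Any] = {
        "service": {"name": "dynamodb", "type": "db", "resource": "dynamodb"}
    }
    if region:
        # The region is used both as db.instance and as destination.cloud.region.
        db["instance"] = region
        destination["cloud"] = {"region": region}
    if action_name == "Query" and key_condition_expression:
        # Only Query operations capture the KeyConditionExpression.
        db["statement"] = key_condition_expression

    return {
        "name": name,
        "type": "db",
        "subtype": "dynamodb",
        "action": "query",
        "context": {"db": db, "destination": destination},
    }
```

Calling `dynamodb_span_fields("Query", "my_table", "us-east-1", "ForumName = :name and Subject = :sub")` produces the span name `DynamoDB Query my_table` and stores the expression in `context.db.statement`.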
+### AWS S3
 
-For MongoDB, the span name should be the command name in the context of its collection/database, e.g. `users.find`.
+| Field | Value / Examples | Comments |
+|-------|:---------------:|----------|
+|`name`| e.g. `S3 GetObject my-bucket`|  The span name should follow this pattern: `S3 <OperationName> <bucket-name>`. Note that the operation name is in PascalCase. |
+|`type`|`storage`|
+|`subtype`|`s3`|
+|`action`| e.g. `GetObject` | The operation name in PascalCase. |
+| __**context.db._**__  |<hr/>|<hr/>|
+|`_.instance`| e.g. `us-east-1` | The AWS region where the bucket is. |
+|`_.statement`| :heavy_minus_sign: |  |
+|`_.type`|`s3`|
+|`_.user`| :heavy_minus_sign: |
+|`_.link`| :heavy_minus_sign: |
+|`_.rows_affected`| :heavy_minus_sign: |
+| __**context.destination._**__ |<hr/>|<hr/>|
+|`_.address`|e.g. `s3.us-west-2.amazonaws.com`| Not available in some cases. Only set if the actual connection is available. |
+|`_.port`|e.g. `443`| Not available in some cases. Only set if the actual connection is available. |
+|`_.service.name`| `s3` |
+|`_.service.type`|`storage`|
+|`_.service.resource`| e.g. `accesspoint/myendpointslashes` or  `accesspoint:myendpointcolons`| The bucket name, if available. The s3 API allows either the bucket name or an Access Point to be provided when referring to a bucket. Access Points can use either slashes or colons. When an Access Point is provided, the access point name preceded by accesspoint/ or accesspoint: should be extracted. For example, given an Access Point such as `arn:aws:s3:us-west-2:123456789012:accesspoint/myendpointslashes`, the agent extracts `accesspoint/myendpointslashes`. Given an Access Point such as `arn:aws:s3:us-west-2:123456789012:accesspoint:myendpointcolons`, the agent extracts `accesspoint:myendpointcolons`. |
+|`_.cloud.region`| e.g. `us-east-1` | The AWS region where the bucket is. |
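
The access point rule for `_.service.resource` in the S3 table can be sketched in Python as follows. The helper names are hypothetical; the sketch assumes the value passed to the S3 API is either a plain bucket name or an access point ARN of one of the two forms shown in the table.

```python
def s3_destination_resource(bucket_or_arn: str) -> str:
    """Return the value for context.destination.service.resource."""
    if bucket_or_arn.startswith("arn:"):
        # Access point ARNs use either a slash or a colon after "accesspoint".
        for marker in ("accesspoint/", "accesspoint:"):
            index = bucket_or_arn.find(marker)
            if index != -1:
                # Keep the marker itself, as required by the spec.
                return bucket_or_arn[index:]
    # Anything else is treated as a plain bucket name.
    return bucket_or_arn


def s3_span_name(operation_name: str, bucket_name: str) -> str:
    """Span name pattern: S3 <OperationName> <bucket-name>."""
    return f"S3 {operation_name} {bucket_name}"
```

With the ARNs from the table, `s3_destination_resource` returns `accesspoint/myendpointslashes` and `accesspoint:myendpointcolons` respectively, and a plain bucket name is returned unchanged.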
 
-For Elasticsearch, the span name should be `Elasticsearch: <method> <path>`, e.g.
-`Elasticsearch: GET /index/_search`.
+### Elasticsearch
 
-### Database span type/subtype
+| Field | Value / Examples | Comments |
+|-------|:---------------:|----------|
+|`name`| e.g. `Elasticsearch: GET /index/_search` |  The span name should be `Elasticsearch: <method> <path>` |
+|`type`|`db`|
+|`subtype`|`elasticsearch`|
+|`action`| `request` |
+| __**context.db._**__  |<hr/>|<hr/>|
+|`_.instance`| :heavy_minus_sign: |
+|`_.statement`| e.g. <pre lang="json">{"query": {"match": {"user.id": "kimchy"}}}</pre> | For Elasticsearch search-type queries, the request body may be recorded. Alternatively, if a query is specified in HTTP query parameters, that may be used instead. If the body is gzip-encoded, the body should be decoded first.|
+|`_.type`|`elasticsearch`|
+|`_.user`| :heavy_minus_sign: |
+|`_.link`| :heavy_minus_sign: |
+|`_.rows_affected`| :heavy_minus_sign: |
+| __**context.destination._**__ |<hr/>|<hr/>|
+|`_.address`|e.g. `localhost`|
+|`_.port`|e.g. `9200`|
+|`_.service.name`| `elasticsearch` |
+|`_.service.type`|`db`|
+|`_.service.resource`| `elasticsearch` |
 
-For database spans, the type should be `db` and subtype should be the database name. Agents should standardise on the following span subtypes:
+### MongoDB
 
-- `postgresql` (PostgreSQL)
-- `mysql` (MySQL)
+| Field | Value / Examples | Comments |
+|-------|:---------------:|----------|
+|`name`| e.g. `users.find` |  The name for MongoDB spans should be the command name in the context of its collection/database. |
+|`type`|`db`|
+|`subtype`|`mongodb`|
+|`action`|e.g. `find`, `insert`, etc.| The MongoDB command executed with this action. |
+| __**context.db._**__  |<hr/>|<hr/>|
+|`_.instance`| :heavy_minus_sign: |
+|`_.statement`| e.g. <pre lang="json">find({status: {$in: ["A","D"]}})</pre> | The MongoDB command encoded as MongoDB Extended JSON.|
+|`_.type`|`mongodb`|
+|`_.user`| :heavy_minus_sign: |
+|`_.link`| :heavy_minus_sign: |
+|`_.rows_affected`| :heavy_minus_sign: |
+| __**context.destination._**__ |<hr/>|<hr/>|
+|`_.address`|e.g. `localhost`|
+|`_.port`|e.g. `27017`|
+|`_.service.name`| `mongodb` |
+|`_.service.type`|`db`|
+|`_.service.resource`| `mongodb` |
+
+### Redis
+
+| Field | Value / Examples | Comments |
+|-------|:---------------:|----------|
+|`name`| e.g. `GET` or `LRANGE` |  The name for Redis spans can simply be set to the command name. |
+|`type`|`db`|
+|`subtype`|`redis`|
+|`action`| `query` | 
+| __**context.db._**__  |<hr/>|<hr/>|
+|`_.instance`| :heavy_minus_sign: |
+|`_.statement`|  :heavy_minus_sign: | 
+|`_.type`|`redis`|
+|`_.user`| :heavy_minus_sign: |
+|`_.link`| :heavy_minus_sign: |
+|`_.rows_affected`| :heavy_minus_sign: |
+| __**context.destination._**__ |<hr/>|<hr/>|
+|`_.address`|e.g. `localhost`|
+|`_.port`|e.g. `6379`|
+|`_.service.name`| `redis` |
+|`_.service.type`|`db`|
+|`_.service.resource`| `redis` |
+
+### SQL Databases 
+
+| Field | Common values / patterns for all SQL DBs | Comments |
+|-------|:---------------:|---------------|
+|`name`| e.g. `SELECT FROM products` | For SQL operations we perform a limited parsing of the statement, and extract the operation name and outer-most table involved (if any). See more details [here](https://docs.google.com/document/d/1sblkAP1NHqk4MtloUta7tXjDuI_l64sT2ZQ_UFHuytA). |
+|`type`|`db`|
+|`action`|`query`|
+| __**context.db._**__  |<hr/>|<hr/>|
+|`_.instance`| e.g. `instance-name`| Use instance concept for [Oracle DB instances](https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT005) and [MS SQL instances](https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/database-engine-instances-sql-server?view=sql-server-ver15). For Oracle the instance name should be the same as retrieved through `SELECT sys_context('USERENV','INSTANCE_NAME') AS Instance`. |
+|`_.statement`| e.g. `SELECT * FROM products WHERE ...`| The full SQL statement. We store up to 10000 Unicode characters per database statement.  |
+|`_.type`|`sql`|
+|`_.user`| e.g. `readonly_user`|
+|`_.rows_affected`| e.g. `123`|
+| __**context.destination._**__ |<hr/>|<hr/>|
+|`_.address`|e.g. `localhost`|
+|`_.port`|e.g. `5432`|
+|`_.service.type`|`db`|
+
+| Field | MySQL | PostgreSQL | MS SQL | Oracle | MariaDB | IBM Db2 |
+|-------|:-----:|:----------:|:------:|:------:|:-------:|:-------:|
+|`subtype`|`mysql`| `postgresql` | `sqlserver` | `oracle` |  `mariadb` | `db2` |
+| __**context.destination._**__ |<hr/>|<hr/>|<hr/>|<hr/> |<hr/>|<hr/>|
+|`_.service.name`| `mysql` | `postgresql` | `sqlserver` | `oracle` |  `mariadb` | `db2` |
+|`_.service.resource` | `mysql` | `postgresql` | `sqlserver` | `oracle` |`mariadb` | `db2` |
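
To make the low-cardinality SQL span name concrete, here is a deliberately simplified Python sketch of "limited parsing". It is not the algorithm from the linked document: it only recognises four statement types via regular expressions, does not handle subqueries, comments, or quoted identifiers, and the `UNKNOWN` fallback value is an assumption made for this example.

```python
import re

# (pattern, span-name prefix) pairs; each pattern captures the first table name.
_PATTERNS = [
    (re.compile(r"^\s*SELECT\b.*?\bFROM\s+([\w.]+)", re.IGNORECASE | re.DOTALL), "SELECT FROM"),
    (re.compile(r"^\s*INSERT\s+INTO\s+([\w.]+)", re.IGNORECASE), "INSERT INTO"),
    (re.compile(r"^\s*UPDATE\s+([\w.]+)", re.IGNORECASE), "UPDATE"),
    (re.compile(r"^\s*DELETE\s+FROM\s+([\w.]+)", re.IGNORECASE), "DELETE FROM"),
]


def sql_span_name(statement: str) -> str:
    """Derive a low-cardinality span name from a SQL statement."""
    for pattern, prefix in _PATTERNS:
        match = pattern.match(statement)
        if match:
            return f"{prefix} {match.group(1)}"
    # Fall back to the first keyword only, so literals never leak into the name.
    words = statement.split()
    return words[0].upper() if words else "UNKNOWN"
```

Because only the operation and table name end up in the span name, statements that differ solely in their `WHERE` clause or bound values map to the same name, which keeps the derived metrics dimension small.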