Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring SPEC for DB spans #420

Merged
merged 4 commits into from
Apr 29, 2021
Merged

Conversation

AlexanderWert
Copy link
Member

@AlexanderWert AlexanderWert commented Mar 3, 2021

Preview

Goals of refactoring:

  • broader list of relevant fields and corresponding description (including destination fields)
  • make semantics of fields more explicit in general and for individual technologies
  • concrete examples for different technologies
  • align agents and reduce / eliminate inconsistencies

@apmmachine
Copy link

apmmachine commented Mar 3, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #420 updated

  • Start Time: 2021-04-29T10:40:11.321+0000

  • Duration: 3 min 43 sec

  • Commit: 35110c8

Trends 🧪

Image of Build Times

@basepi
Copy link
Contributor

basepi commented Mar 4, 2021

I think this refactor is a great idea and it's making it immediately obvious where there are unknowns or inconsistencies.

One question: what do the 🔴 icons mean?

@trentm
Copy link
Member

trentm commented Mar 5, 2021

Is there any value in linking to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/database.md for comparison or perhaps for inspiration?

Copy link
Member

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a huge improvement, generally. I'm curious if apm-server devs would think this adds a maint burden. As you said, we don't expect these specs to change much, so perhaps no real burden added here.

specs/agents/tracing-instrumentation-db.md Outdated Show resolved Hide resolved
specs/agents/tracing-instrumentation-db.md Outdated Show resolved Hide resolved
specs/agents/tracing-instrumentation-db.md Show resolved Hide resolved
specs/agents/tracing-instrumentation-db.md Outdated Show resolved Hide resolved
specs/agents/tracing-instrumentation-db.md Outdated Show resolved Hide resolved
specs/agents/tracing-instrumentation-db.md Show resolved Hide resolved
@axw
Copy link
Member

axw commented Mar 8, 2021

@trentm why did you think it might be a maintenance burden for the server devs? I can't see it being one, but I'm not sure how could be one either.

@trentm
Copy link
Member

trentm commented Mar 8, 2021

why did you think it might be a maintenance burden for the server devs? I can't see it being one, but I'm not sure how could be one either.

@axw In the sense that if we added or changed semantics of context.db.* fields, then there would be a maintenance task to update info in this document. However, that would not be a requirement on apm-server dev, so there is no concern. Thanks.

@cyrille-leclerc
Copy link

cyrille-leclerc commented Mar 10, 2021

As discussed with @AlexanderWert ,

  • Can we verify we are aligned with infrastructure monitoring to be successful at correlating a database span with the instrumentation of a database by "metricbeat/filebeat"?
  • Can we document the mapping with ECS and explain our path forward to be consistent with ECS? This mapping will be needed for the storage of the span documents in Elasticsearch

@AlexanderWert AlexanderWert marked this pull request as ready for review April 27, 2021 08:39
@AlexanderWert AlexanderWert requested review from a team as code owners April 27, 2021 08:39
@AlexanderWert AlexanderWert marked this pull request as draft April 27, 2021 08:52
Copy link
Member

@felixbarny felixbarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good to merge this as-is. It "just" documents the status quo. Any additions can and should be handled in follow-up PRs.

specs/agents/tracing-instrumentation-db.md Show resolved Hide resolved
@SylvainJuge
Copy link
Member

Should we take that opportunity to align span naming conventions ?

Some of them include the service/technology, when it duplicates the type / subtype combination. From the provided examples:

  • Elasticsearch: GET ... with the Elasticsearch: prefix with db / elasticsearch
  • DynamoDB ... for DynamoDB with db / dynamodb
  • S3 GetObject my-bucket for S3 with storage / s3
  • No prefix for MongoDB with db / mongodb
  • No prefix for Redis with db / redis
  • No prefix for relational databases with db / <db vendor name>

Unless we have a good reason, I think that removing the prefixes and align with what is done with relational databases is better.
That might be a breaking change, but we are not consistent thus going either way is breaking something.

On the UI side, we should have a way to present the spans by using only the type / subtype, which should remove the need to rely on such prefixes. If we had such visual indicators on spans (labels, icons, ...) keeping prefixes in the name would create duplication, that would be a nice way to leverage this spec & alignment across agents.

@felixbarny
Copy link
Member

+1 on the suggestion but I really think we should separate documenting the status quo from making actual changes.

@AlexanderWert AlexanderWert marked this pull request as ready for review April 29, 2021 10:31
@AlexanderWert AlexanderWert merged commit bcec2df into elastic:master Apr 29, 2021
Copy link
Member

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I realize I am late and this is after the fact.

|`action`| `request` |
| __**context.db._**__ |<hr/>|<hr/>|
|`_.instance`| :heavy_minus_sign: |
|`_.statement`| e.g. <pre lang="json">{"query": {"match": {"user.id": "kimchy"}}}</pre> | For Elasticsearch search-type queries, the request body may be recorded. Alternatively, if a query is specified in HTTP query parameters, that may be used instead. If the body is gzip-encoded, the body should be decoded first.|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Node.js (and I believe Python, because I copied from there) agents use a format that can include either or both of (a) query parms (in URL query encoded form) and (b) the request body (JSON) separated by two newlines:

https://github.com/elastic/apm-agent-nodejs/blob/v3.14.0/lib/instrumentation/elasticsearch-shared.js#L21-L28
https://github.com/elastic/apm-agent-python/blob/master/elasticapm/instrumentation/packages/elasticsearch.py#L65-L78

Is that a form worth codifying?

Dale, who is using this field from Kibana APM traces for Elasticsearch performance work, brought up this discussion ticket to reconsider this format.

In general, with Elasticsearch, you need to consider the URL path, the query params, and the request body to fully understand the query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are just documenting the status quo here, then perhaps saying "that may be used as well" rather than "that may be used instead" is more accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be something that's specific to the Node.js and Python agents and that it can and probably should change in the future. Personally, I think it'd make more sense to use context.http.* to store the URL and the query parameters than to add it to db.statement. Ralley could then have a condition on whether the db.statement starts with {.

Comment on lines +74 to +80
| __**context.db._**__ |<hr/>|<hr/>|
|`_.instance`| e.g. `us-east-1` | The AWS region where the table is. |
|`_.statement`| :heavy_minus_sign: | |
|`_.type`|`dynamodb`|
|`_.user`| :heavy_minus_sign: |
|`_.link`| :heavy_minus_sign: |
|`_.rows_affected`| :heavy_minus_sign: |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want "db" fields for S3? If so... these values look like copypasta from DynamoDB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.