Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename db.collection.name to db.target.name #1527

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

lmolkova
Copy link
Contributor

Fixes #1491, #521

Changes

db.collection.name does not work well for all possible database objects such as stored procedures, user defined functions, triggers, users (e.g. when performing GRANT queries).

This PR renames it to db.target.name to cover all these cases

Merge requirement checklist

@lmolkova lmolkova requested review from a team as code owners October 29, 2024 03:01
schema-next.yaml Outdated Show resolved Hide resolved
model/database/spans.yaml Outdated Show resolved Hide resolved
model/database/common.yaml Outdated Show resolved Hide resolved
model/database/common.yaml Outdated Show resolved Hide resolved
@bogdandrutu
Copy link
Member

Few comments here:

  1. There may be multiple tables/functions (only one store procedure). Because of that I think we should make it a list of strings this argument.
  2. Ideally I would separate "storage" targets from "executions" targets. The reason is that the "storage" targets like tables will not have traces, but "executions" targets will have.

If you comment that if traces exists for other executions then the spans will be available, that is not the case in some systems, since the function/procedure are user-code and traces may be in a different system than the platform traces.

@cbandy
Copy link

cbandy commented Oct 31, 2024

  1. There may be multiple tables/functions (only one store procedure). Because of that I think we should make it a list of strings this argument.

The same can be said of db.namespace, perhaps.

My opinion is that deeper insights belong in more specific fields. If a database (engine/system) can or wants to indicate that the operation involves multiple tables, indices, tablets, pages, locks, what-have-you; then it can go in a tech-specific field.

@lmolkova
Copy link
Contributor Author

lmolkova commented Nov 4, 2024

@bogdandrutu

There may be multiple tables/functions (only one store procedure). Because of that I think we should make it a list of strings this argument.

Array will be problematic on metrics and will be significantly harder to use for noSQL databases which primarily operate on one collection. It also has much higher cardinality.

Having array of operations and array of collections will be even worse for usability.

That's why we introduced db.query.summary - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#generating-a-summary-of-the-query-text which would contain something like INSERT shipping_details SELECT orders for query like

INSERT INTO shipping_details
            (order_id,
            address)
SELECT order_id,
       address
FROM   orders
WHERE  order_id = ?

and will be a better representation of a query than arrays of operations and tables.

Ideally I would separate "storage" targets from "executions" targets. The reason is that the "storage" targets like tables will not have traces, but "executions" targets will have.

I'm not sure I understand what the difference between storage and execution is. CREATE TABLE foo ... or CREATE USER user_name is traced in the same way as SELECT * from foo where .... Instrumentations are not even required to parse query text and will create spans in the same way for them.

If you comment that if traces exists for other executions then the spans will be available, that is not the case in some systems, since the function/procedure are user-code and traces may be in a different system than the platform traces.

We're only defining client DB conventions now.

@lmolkova
Copy link
Contributor Author

lmolkova commented Nov 4, 2024

@cbandy

The same can be said of db.namespace, perhaps.
My opinion is that deeper insights belong in more specific fields. If a database (engine/system) can or wants to indicate that the operation involves multiple tables, indices, tablets, pages, locks, what-have-you; then it can go in a tech-specific field.

Deeper insights are available on db.query.text. Query text is not available on metrics (by default) due to high-ish cardinality, so we introduced db.query.summary that contains perations and targets - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#generating-a-summary-of-the-query-text

db.namespace (for SQL database) is defined as The database associated with the connection - you should not have more than one at the same time.

Cross-DB queries in common case would have fully-qualified table names, i.e. multiple database names will be captured as a part of each target name and inside db.query.text|summary. Generic instrumentation cannot really say if something in the table name represents an database or a schema or a part of a table name - in case that database supports . in the table name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Needs More Approval
Development

Successfully merging this pull request may close these issues.

[database] reporting stored procedures/prepared statements
5 participants