
Registry manifest and Schema diff #400

Merged: 70 commits, merged on Feb 4, 2025


@lquerel (Contributor) commented on Oct 3, 2024

Note: The scope of this PR has been reduced to focus only on the schema diff feature. GitHub issues have been created to track the postponed features: #482, open-telemetry/semantic-conventions#1938.

This PR implements the registry diff command, as shown in the following example:

cargo run -- registry diff -r https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.27.0.zip[model] --baseline-registry https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.26.0.zip[model] --diff-format markdown

In this example, the diff is displayed in markdown format. The following formats are supported: json, markdown, ansi, ansi_stats. YAML format will be supported once PR #525 is finalized.
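As a rough illustration of how the --diff-format values above could map onto a Rust type, here is a minimal sketch. The `DiffFormat` enum and its parsing are hypothetical, not Weaver's actual implementation; only the accepted strings come from the list above.

```rust
use std::str::FromStr;

/// Hypothetical enum for the `--diff-format` values listed above.
/// Not Weaver's actual type; only the accepted strings come from the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiffFormat {
    Json,
    Markdown,
    Ansi,
    AnsiStats,
}

impl FromStr for DiffFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "json" => Ok(DiffFormat::Json),
            "markdown" => Ok(DiffFormat::Markdown),
            "ansi" => Ok(DiffFormat::Ansi),
            "ansi_stats" => Ok(DiffFormat::AnsiStats),
            // "yaml" falls through here until YAML output is supported (PR #525).
            other => Err(format!("unsupported diff format: {other}")),
        }
    }
}
```

With this shape, "markdown".parse::<DiffFormat>() yields the Markdown variant, while an unsupported value such as yaml is rejected up front with an error message.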

A detailed description of the schema diff data model and the diffing process is available here.

Notes:

  • The crate weaver_otel_schema is not essential for this PR; it was initially included as preparation for the registry schema-update command, which we have decided to implement in a future PR. For simplicity, however, I prefer to keep the preparation code in place rather than remove it. The same applies to all_changes in weaver_version.

List of modifications to apply to the semantic conventions repository after the release of the Weaver version that includes this PR:

  • Add a registry-manifest.yaml file with the version of the next release.
  • Update all deprecated fields.

Closes: #186

The following command, which compares versions 1.29 and 1.30,

/weaver registry diff -r 'https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.30.0.zip[model]' --baseline-registry 'https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.29.0.zip[model]' --diff-format markdown

produces the following markdown output:

Registry Attributes

New registry_attributes:

  • Add aws.extended_request_id
  • Add azure.client.id
  • Add azure.cosmosdb.connection.mode
  • Add azure.cosmosdb.consistency.level
  • Add azure.cosmosdb.operation.contacted_regions
  • Add azure.cosmosdb.operation.request_charge
  • Add azure.cosmosdb.request.body.size
  • Add azure.cosmosdb.response.sub_status_code
  • Add cassandra.consistency.level
  • Add cassandra.coordinator.dc
  • Add cassandra.coordinator.id
  • Add cassandra.page.size
  • Add cassandra.query.idempotent
  • Add cassandra.speculative_execution.count
  • Add cicd.pipeline.result
  • Add cicd.pipeline.run.state
  • Add cicd.system.component
  • Add cicd.worker.state
  • Add code.column.number
  • Add code.file.path
  • Add code.function.name
  • Add code.line.number
  • Add db.system.name
  • Add elasticsearch.node.name
  • Add gen_ai.request.seed
  • Add k8s.namespace.phase
  • Add network.connection.state
  • Add security_rule.category
  • Add security_rule.description
  • Add security_rule.license
  • Add security_rule.name
  • Add security_rule.reference
  • Add security_rule.ruleset.name
  • Add security_rule.uuid
  • Add security_rule.version
  • Add vcs.repository.name

Deprecated registry_attributes:

  • code.column (Note: Deprecated, use code.column.number)
  • code.function (Note: Deprecated, use code.function.name instead)
  • code.lineno (Note: Deprecated, use code.line.number instead)
  • db.cassandra.consistency_level (Note: Deprecated, use cassandra.consistency.level instead.)
  • db.cassandra.coordinator.dc (Note: Deprecated, use cassandra.coordinator.dc instead.)
  • db.cassandra.coordinator.id (Note: Deprecated, use cassandra.coordinator.id instead.)
  • db.cassandra.idempotence (Note: Deprecated, use cassandra.query.idempotent instead.)
  • db.cassandra.page_size (Note: Deprecated, use cassandra.page.size instead.)
  • db.cassandra.speculative_execution_count (Note: Deprecated, use cassandra.speculative_execution.count instead.)
  • db.cosmosdb.client_id (Note: Deprecated, use azure.client.id instead.)
  • db.cosmosdb.connection_mode (Note: Deprecated, use azure.cosmosdb.connection.mode instead.)
  • db.cosmosdb.consistency_level (Note: Deprecated, use cosmosdb.consistency.level instead.)
  • db.cosmosdb.regions_contacted (Note: Deprecated, use azure.cosmosdb.operation.contacted_regions instead.)
  • db.cosmosdb.request_charge (Note: Deprecated, use azure.cosmosdb.operation.request_charge instead.)
  • db.cosmosdb.request_content_length (Note: Deprecated, use azure.cosmosdb.request.body.size instead.)
  • db.cosmosdb.sub_status_code (Note: Deprecated, use azure.cosmosdb.response.sub_status_code instead.)
  • db.elasticsearch.node.name (Note: Deprecated, use elasticsearch.node.name instead.)
  • db.elasticsearch.path_parts (Note: Deprecated, use db.operation.parameter instead.)
  • db.system (Note: Deprecated, use db.system.name instead.)
  • event.name (Note: Identifies the class / type of event.)
  • exception.escaped (Note: Indicates that the exception is escaping the scope of the span.)
  • gen_ai.openai.request.seed (Note: Deprecated, use gen_ai.request.seed.)
  • system.network.state (Note: Deprecated, use network.connection.state instead.)

Metrics

New metrics:

  • Add metric.azure.cosmosdb.client.active_instance.count
  • Add metric.azure.cosmosdb.client.operation.request_charge
  • Add metric.cicd.pipeline.run.active
  • Add metric.cicd.pipeline.run.duration
  • Add metric.cicd.pipeline.run.errors
  • Add metric.cicd.system.errors
  • Add metric.cicd.worker.count
  • Add metric.k8s.cronjob.active_jobs
  • Add metric.k8s.daemonset.current_scheduled_nodes
  • Add metric.k8s.daemonset.desired_scheduled_nodes
  • Add metric.k8s.daemonset.misscheduled_nodes
  • Add metric.k8s.daemonset.ready_nodes
  • Add metric.k8s.deployment.available_pods
  • Add metric.k8s.deployment.desired_pods
  • Add metric.k8s.hpa.current_pods
  • Add metric.k8s.hpa.desired_pods
  • Add metric.k8s.hpa.max_pods
  • Add metric.k8s.hpa.min_pods
  • Add metric.k8s.job.active_pods
  • Add metric.k8s.job.desired_successful_pods
  • Add metric.k8s.job.failed_pods
  • Add metric.k8s.job.max_parallel_pods
  • Add metric.k8s.job.successful_pods
  • Add metric.k8s.namespace.phase
  • Add metric.k8s.replicaset.available_pods
  • Add metric.k8s.replicaset.desired_pods
  • Add metric.k8s.replication_controller.available_pods
  • Add metric.k8s.replication_controller.desired_pods
  • Add metric.k8s.statefulset.current_pods
  • Add metric.k8s.statefulset.desired_pods
  • Add metric.k8s.statefulset.ready_pods
  • Add metric.k8s.statefulset.updated_pods
  • Add metric.vcs.change.time_to_merge

Deprecated metrics:

  • metric.db.client.cosmosdb.active_instance.count (Note: Deprecated)
  • metric.db.client.cosmosdb.operation.request_charge (Note: Deprecated)

Spans

New spans:

  • Add span.azure.cosmosdb.client
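The output above groups registry items by kind of change and lists them alphabetically by name. A minimal sketch of that kind of comparison, assuming each registry is reduced to a map from attribute name to an optional deprecation note: the `Attr`, `DiffSummary`, and `diff_attributes` names are hypothetical, and Weaver's real diff also handles renames, field updates, and signals other than attributes.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified attribute: only its optional deprecation note.
#[derive(Debug, Clone, PartialEq)]
pub struct Attr {
    pub deprecated: Option<String>,
}

/// Hypothetical diff result; each list is sorted by name because
/// BTreeMap iterates in key order.
#[derive(Debug, Default, PartialEq)]
pub struct DiffSummary {
    pub added: Vec<String>,
    pub deprecated: Vec<String>,
    pub removed: Vec<String>,
}

/// Compares a baseline registry with a head registry (hypothetical helper).
pub fn diff_attributes(
    baseline: &BTreeMap<String, Attr>,
    head: &BTreeMap<String, Attr>,
) -> DiffSummary {
    let mut summary = DiffSummary::default();
    for (name, attr) in head {
        match baseline.get(name) {
            // Only in the head registry: a new attribute.
            None => summary.added.push(name.clone()),
            // In both registries, but deprecated only in the head registry.
            Some(old) if old.deprecated.is_none() && attr.deprecated.is_some() => {
                summary.deprecated.push(name.clone());
            }
            _ => {}
        }
    }
    // Only in the baseline registry: a hard removal.
    summary.removed = baseline
        .keys()
        .filter(|name| !head.contains_key(*name))
        .cloned()
        .collect();
    summary
}
```

Using an ordered map keeps the output deterministic, which matters when the diff is rendered as markdown and compared across runs.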

@lquerel self-assigned this on Oct 3, 2024
@lquerel added the enhancement (New feature or request) label on Oct 3, 2024
@lquerel changed the title [WIP] Registry manifest and OTEL schema update to [WIP] Registry manifest and Schema diff on Nov 27, 2024
},
/// A top-level telemetry object from the baseline registry was marked as deprecated in the head
/// registry.
Deprecated {
Contributor


Is this its own change or should it be attached to other changes?

I.e., is it just an implication of another change?

I think this was called out verbally, but it's the one I'm least sure belongs with the other "semantic" changes, especially given "uncategorized" as an option.

Contributor Author


See my comment below.

Contributor


Responded. Once updated, this PR LGTM.

Structurally and Rust-wise, you have all the pieces I'd look for. It's just naming/surface syntax at this point.

- `renamed`: A top-level telemetry object from the baseline registry was renamed in the head registry.
- `deprecated`: A top-level telemetry object from the baseline registry was marked as deprecated in the head registry.
- `updated`: One or more fields in a top-level telemetry object have been updated in the head registry.
- `removed`: A top-level telemetry object from the baseline registry was removed in the head registry.
Contributor


For semconv specifically, we definitely don't want to allow this; instead, we deprecate.

Also, my concern with "deprecated" is that when we rename, we're effectively deprecating the old name.

I'm reading this and think "deprecated" is too generic and too much of a catch-all. I'd rather use "uncategorized", where deprecation is a consequence of the change vs. the change itself.

I.e., we almost need a "removed" where we mark the type as deprecated and prevent further usage, but don't lose our knowledge that it once existed.

Contributor Author


The deprecated type is indeed probably too much of a catch-all. However, I believe these three types are truly distinct, and I probably didn’t do a great job explaining them.

Currently, the general concept of deprecation is used for several types of changes in semantic conventions (renaming, “soft” removal, and other exotic changes). I propose refining my initial suggestion and the corresponding definitions as follows:

  • Rename the change type deprecated to obsoleted to clearly indicate that this change corresponds to an attribute or a signal that is discontinued without a valid replacement.
  • In my view, removed should exist at the Weaver level, if only to identify that there has been an actual deletion in a registry under validation. This type of change should never be issued for a published registry, but it is clearly a transitional change that can occur during the development of a registry. We could even build a policy leveraging this type of change in the future.
  • uncategorized is the catch-all change type representing all complex types of changes that we haven’t precisely codified. The idea of this type, as you mentioned during the meeting, is that we should gradually eliminate it from the registry.

Do we agree on this definition of things?

Contributor


I like moving deprecated to obsoleted where deprecated can remain as a catch-all for "we changed this thing in a way" and obsoleted implies "do not use anymore, here for legacy reasons".

I agree we need to actually model removed in some way. obsoleted as soft-delete works for me.

So yes, I agree on this.
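To make the agreed taxonomy concrete, here is a minimal sketch of how the fate of a baseline item in the head registry could be mapped to renamed, obsoleted, uncategorized, or removed. The `Deprecation`, `HeadState`, `ChangeType`, and `classify` names are hypothetical; the sketch assumes the item was not already deprecated in the baseline and omits updated for brevity.

```rust
/// Hypothetical deprecation info attached to an item in the head registry.
#[derive(Debug, Clone, PartialEq)]
pub enum Deprecation {
    /// Replaced by a single item with a new name.
    RenamedTo(String),
    /// Discontinued without a valid replacement.
    Obsoleted,
    /// Any other deprecation that has not been codified yet (free-form note).
    Other(String),
}

/// Hypothetical state of a baseline item as seen in the head registry.
pub enum HeadState {
    Missing,
    Present { deprecated: Option<Deprecation> },
}

/// Hypothetical change types following the refined definitions above.
#[derive(Debug, Clone, PartialEq)]
pub enum ChangeType {
    Renamed { new_name: String },
    Obsoleted,
    Uncategorized { note: String },
    Removed,
}

/// Returns `None` when nothing needs to be reported under this simplified model.
pub fn classify(state: &HeadState) -> Option<ChangeType> {
    match state {
        // Hard deletion: may happen while a registry is under development,
        // but should never appear in a published registry.
        HeadState::Missing => Some(ChangeType::Removed),
        HeadState::Present { deprecated: None } => None,
        HeadState::Present { deprecated: Some(Deprecation::RenamedTo(name)) } => {
            Some(ChangeType::Renamed { new_name: name.clone() })
        }
        HeadState::Present { deprecated: Some(Deprecation::Obsoleted) } => {
            Some(ChangeType::Obsoleted)
        }
        // Catch-all that should shrink over time as cases get codified.
        HeadState::Present { deprecated: Some(Deprecation::Other(note)) } => {
            Some(ChangeType::Uncategorized { note: note.clone() })
        }
    }
}
```

Keeping removed as a distinct variant would let a future policy flag hard deletions in a published registry, while uncategorized marks the cases that still need a precise definition.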

@lquerel requested a review from a team as a code owner on January 31, 2025 02:07
@jerbly (Contributor) left a comment

A few doc nits, plus a schema suggestion to avoid obscuring the original brief for the item.

@@ -41,6 +41,48 @@
}
},
"$defs": {
"Deprecated": {
Contributor


Could you please also update https://github.com/open-telemetry/weaver/blob/main/schemas/semconv-syntax.md, the informal and reader-friendly version of this schema?

registry_attributes:
  - name: http.server_name # attribute name
    type: obsoleted # change type
    note: This attribute is deprecated.
Contributor


Is the note populated from note or from brief?

@lquerel (Contributor Author) commented on Feb 1, 2025

@lmolkova @jerbly Currently, the note in the schema change event is directly derived from the brief field of the telemetry object. Alternatively, we could choose to use the note field of the telemetry object. However, what we cannot do is allow users to fill in either one, as Weaver would then not know which one to use.

One possible solution/extension, however, would be to introduce a note field within the deprecated field (in the semantic conventions). If this field is populated, it would be used to fill the note field in the schema change event. If the deprecated.note field is not present, Weaver could then fall back to brief.

For this PR, I propose using brief for now and introducing the deprecated.note field if this approach works for you. I will make this change in another PR if needed.
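The fallback proposed above (use deprecated.note when present, otherwise brief) could look like the following sketch. The `TelemetryObject` and `DeprecatedInfo` structs and the `change_event_note` function are hypothetical, and deprecated.note itself is only a proposal in this thread, not an existing field.

```rust
/// Hypothetical deprecation block carrying the proposed optional `note` field.
pub struct DeprecatedInfo {
    pub note: Option<String>,
}

/// Hypothetical, simplified telemetry object.
pub struct TelemetryObject {
    pub brief: String,
    pub deprecated: Option<DeprecatedInfo>,
}

/// Picks the text for the `note` field of a schema change event:
/// `deprecated.note` if it is set, otherwise the object's `brief`.
pub fn change_event_note(obj: &TelemetryObject) -> &str {
    obj.deprecated
        .as_ref()
        .and_then(|d| d.note.as_deref())
        .unwrap_or(obj.brief.as_str())
}
```

Because only one source is chosen by a fixed rule, there is never ambiguity about which field Weaver should use, which addresses the concern about letting users fill in either one.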

@lquerel enabled auto-merge (squash) on February 4, 2025 07:26
@lquerel merged commit 5eaa384 into open-telemetry:main on Feb 4, 2025
21 checks passed
Labels: enhancement (New feature or request)
Projects: Status: Done
Development: successfully merging this pull request may close these issues:
  • Automate OTEL Schema Generation and Update Process with Migration Guide Support
4 participants