Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new dimension to internal metrics to identify mode #1634

Merged
merged 9 commits into from
Jan 30, 2025

Conversation

technimad-splunk
Copy link
Contributor

@technimad-splunk technimad-splunk commented Jan 28, 2025

Description:

To easily filter between agent and gateway deployments of the collector in dashboards a proposed new dimension otelcol.service.mode to be added to the internal metrics pipeline.

Testing:
Deployed the chart to a cluster with the gateway enabled. Validated on the OOTB otel collector dashboard I can filter on otelcol.service.mode with both a value for agent and gateway
Deployed the chart to a cluster with the gateway disabled. Validated on the OOTB otel collector dashboard I can filter on otelcol.service.mode with only a value for agent

This feature is already merged into the splunk otel-collector: signalfx/splunk-otel-collector#5701

@technimad-splunk technimad-splunk requested review from a team as code owners January 28, 2025 16:56
@jvoravong
Copy link
Contributor

While the end result of this PR should be usable for both myself and users, I recommend we consider a different implementation for this feature to ensure better support for previous, current, and future support

Suggested Plan:

  • Update Default Collector Pod Labels: We should first update the default labels applied to the agent, cluster receiver, and gateway. These updates must be implemented in a non-breaking way. If breaking changes are necessary, migration notes should be provided to guide users through the changes.

Current Labels Example for Different Modes:

  • Agent
  • Cluster Receiver
  • Gateway
    • Note: The label component: otel-collector should be updated to component: otel-collector-gateway
    • Use K8s Attribute Processor: Once the labels are updated, we can leverage the Kubernetes attribute processor to use the component label as a resource_attribute. This will help in identifying internal and configurable collector telemetry data. Here's an example configuration:

Then you you can use the k8sattributes processor to read the component label on the pods and apply them to telemetry data (like the internal collector metrics)

k8sattributes:
  extract:
    labels:
      - key: component

This approach will enhance flexibility and ensure compatibility with future updates. Let me know your thoughts.

@jvoravong
Copy link
Contributor

Just to get this PR merged, we can go the simple route and just use the resource/add_mode processor the way this PR currently uses it with a couple small adjustments listed below.

  • Make sure to run make render to get the pre-commit changes to pass and update our examples with your changes.
  • We need to add a resource/add_mode processor for the agent, cluster receiver, and gateway modes. Please add support for the cluster receiver.

@jinja2
Copy link
Collaborator

jinja2 commented Jan 29, 2025

FYi, these attrs are in-line with what we have in non-k8s configs and I think we should stick with these to reduce confusion for users with k8s/host mixed envs (PR for ref). But agree that we should add this to the k8s cluster receiver (we also document it as a different mode for k8s). The value should be clusterReceiver.

@pszkamruk-splunk
Copy link
Contributor

It would be also good to update existing tests and add a check for this newly added attribute
here are tests which are checking metrics and logs attributes: attributes tests

@jvoravong
Copy link
Contributor

You'll also have to add a changlog entry since the templates were updated and this is an enhancement feature.

Copy link
Contributor

@jvoravong jvoravong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last nit and then this PR LGTM!

@technimad-splunk
Copy link
Contributor Author

It would be also good to update existing tests and add a check for this newly added attribute here are tests which are checking metrics and logs attributes: attributes tests

I'm working on getting the tests to run

@jvoravong
Copy link
Contributor

This PR is low risk, let's not worrying about the extra functional tests that run on eks, gke, and aks for this PR. We should make sure they pass in the future though.

@jvoravong jvoravong merged commit 2f4ca14 into signalfx:main Jan 30, 2025
65 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants