Skip to content

Conversation

grcevski
Copy link
Contributor

This PR resolves multiple edge cases with our tracking of target_info:

  1. In certain scenarios we can receive multiple events for a given process, for example a single container which has multiple instrumented processes, but they are built with different programming languages. One example of such container is the load-generator in the OTel DEMO. It has a python, nodejs and generic (c++/rust) application, all within the same container. Until now, in these scenarios we'd leave stale target_info information for 2 out of the 3 processes, since the labels are recorded as last one wins. The code now checks to see if we are receiving duplicate labels for the same service and correctly removes the old info before creating new information.
  2. Another edge-case has to do with delayed container creation. A customer may configure OBI to instrument all Kubernetes services that are listening on a certain port, e.g. 8080. In this case, we might match the process information before we receive the Kubernetes metadata. Since we have no metadata yet, we'll record the process in target_info as "naked" -- undecorated process. Later once the Kubernetes metadata arrives, the spans will start reporting service names based on the updated information, but target_info will never change. I've added a subscriber to the Kubernetes store metadata in the process event Kubernetes generator, which pushes a new process created event once the updated metadata arrives. Since I've also fixed 1., this nicely updates the information and the target_info is now aligned with what's reported with the spans.
  3. Finally, I found that the store.go code also had the assumption that there can only be one PID per container. What this would do is that if we launched 3 processes, we'll update the information in the store based on the last process. However, if that last process terminates (but the other 2 are alive), we'll remove the metadata from the store. I believe this caused K8s namespace and deployment type information lost when previous pod exits #656, but I'm investigating a bit more.

Relates to #656

These error conditions are impossible to test reliably with integration tests, so I relied heavily in this PR on unit tests, simulating delayed information.

@grcevski grcevski requested a review from a team as a code owner September 19, 2025 22:15
Copy link

codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 66.80498% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.55%. Comparing base (a864452) to head (0dbfe27).
⚠️ Report is 21 commits behind head on main.

Files with missing lines Patch % Lines
pkg/transform/k8s.go 66.07% 33 Missing and 5 partials ⚠️
pkg/export/prom/prom.go 60.78% 17 Missing and 3 partials ⚠️
pkg/export/otel/pid_tracker.go 43.47% 13 Missing ⚠️
pkg/export/otel/metrics.go 70.37% 7 Missing and 1 partial ⚠️
pkg/components/kube/store.go 96.15% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #661      +/-   ##
==========================================
- Coverage   61.61%   61.55%   -0.06%     
==========================================
  Files         199      199              
  Lines       22111    22307     +196     
==========================================
+ Hits        13623    13732     +109     
- Misses       7521     7599      +78     
- Partials      967      976       +9     
Flag Coverage Δ
integration-test-arm 34.38% <11.61%> (-0.03%) ⬇️
k8s-integration-test-1916- ?
k8s-integration-test-1965- 50.11% <66.80%> (?)
oats-test 36.93% <7.46%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!!

@MrAlias MrAlias merged commit 679433c into open-telemetry:main Sep 23, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants