Rework SymbolizationComplete #307
3541f2b to 9991fa8
@@ -117,11 +117,8 @@ func newTraceHandler(rep reporter.TraceReporter, traceProcessor TraceProcessor,
}

func (m *traceHandler) HandleTrace(bpfTrace *host.Trace) {
	defer m.traceProcessor.SymbolizationComplete(bpfTrace.KTime)
Simplifying, SymbolizationComplete is now called from tracer/events.go with an introduced upper bound to the calling frequency.
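As a rough illustration of the per-iteration reporting described here, the sketch below simulates a batch-drain loop in isolation. It is not the code from tracer/events.go: the `KTime` type, the `drainLoop` function, and the convention that 0 means "no timestamp recorded" are assumptions made for the example, and the out-of-order handling discussed later in this review is left out.

```go
package main

import "fmt"

// KTime is a hypothetical stand-in for the kernel timestamp type.
type KTime int64

// drainLoop simulates the batch-drain loop. Each element of batches holds the
// trace timestamps retrieved in one polling iteration. The return value lists
// what would be passed to SymbolizationComplete, one entry per call.
func drainLoop(batches [][]KTime) []KTime {
	var reported []KTime
	var oldKTime, minKTime KTime // 0 means "no timestamp recorded" (assumption)

	for _, batch := range batches {
		for _, kt := range batch {
			// Track the smallest timestamp seen in this iteration.
			if minKTime == 0 || kt < minKTime {
				minKTime = kt
			}
		}
		// Report at most once per iteration instead of once per trace, so
		// the call frequency is bounded by the polling frequency.
		if oldKTime > 0 {
			reported = append(reported, oldKTime)
		}
		// The minimum of this iteration is reported in the next one.
		oldKTime, minKTime = minKTime, 0
	}
	return reported
}

func main() {
	batches := [][]KTime{{30, 10, 20}, {50, 40}, {}}
	fmt.Println(drainLoop(batches)) // prints [10 40]
}
```

With five traces spread over three iterations, the sketch makes two reporting calls rather than five, which is the effect of tying the call to the loop iteration.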
	pollFrequency time.Duration, perCPUBufferSize int, triggerFunc func([]byte, int),
) func() (lost, noData, readError uint64) {
	eventReader, err := perf.NewReader(perfEventMap, perCPUBufferSize)
func (t *Tracer) startTraceEventMonitor(ctx context.Context,
No point in having this be a generic function when:
- Nothing else is currently using it (other than receiving trace events)
- I'm introducing logic that's specialized to trace event handling
Instead, switching it to a Tracer method also simplifies the interface.
This is how the logic looks now (polling loop and
On a system with low CPU load
We see 4 iterations of the polling loop per second (as expected due to the 250ms polling interval) and
On a fully loaded system
Again we see 4 iterations of the polling loop per second, but this time
9991fa8 to 870479d
tracer/events.go
Outdated
	kt := oldKTime
	if minKTime > 0 && minKTime < kt {
		// If current minKTime is smaller than oldKTime, use it
		// instead of oldKTime (and set it to 0 to avoid a repeat).
		kt = minKTime
		minKTime = 0
	}
	t.TraceProcessor().SymbolizationComplete(kt)
Without a kt temp variable, the code is easier to read.
	kt := oldKTime
	if minKTime > 0 && minKTime < kt {
		// If current minKTime is smaller than oldKTime, use it
		// instead of oldKTime (and set it to 0 to avoid a repeat).
		kt = minKTime
		minKTime = 0
	}
	t.TraceProcessor().SymbolizationComplete(kt)

	if oldKTime <= minKTime {
		t.TraceProcessor().SymbolizationComplete(oldKTime)
	} else {
		// If minKTime is smaller than oldKTime, use it
		// and reset it to avoid a repeat.
		t.TraceProcessor().SymbolizationComplete(minKTime)
		minKTime = 0
	}
Had to rework this a bit as the suggested logic was incorrect when minKTime == 0. See d12a80a.
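To make the minKTime == 0 case concrete: in the suggested version, a zero minKTime (no trace seen yet) fails the `oldKTime <= minKTime` check whenever oldKTime is positive, so 0 would be reported. The sketch below shows one way to guard against that. It is not the code from d12a80a; the `KTime` type and the `selectReportKTime` helper are hypothetical names used only for this example.

```go
package main

import "fmt"

// KTime is a hypothetical stand-in for the kernel timestamp type.
type KTime int64

// selectReportKTime returns the timestamp to report and the updated minKTime.
// A minKTime of 0 is treated as "unset" and never reported (assumption).
func selectReportKTime(oldKTime, minKTime KTime) (report, newMin KTime) {
	if minKTime == 0 || oldKTime <= minKTime {
		// Nothing smaller was seen in the current iteration.
		return oldKTime, minKTime
	}
	// minKTime is smaller than oldKTime: report it and reset it
	// to avoid a repeat.
	return minKTime, 0
}

func main() {
	fmt.Println(selectReportKTime(100, 0))   // prints 100 0
	fmt.Println(selectReportKTime(100, 50))  // prints 50 0
	fmt.Println(selectReportKTime(100, 150)) // prints 100 150
}
```

Returning the new minKTime instead of mutating it keeps the helper free of shared state, which makes the three cases easy to check in a table test.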
c7a47e4 to ad771d9
5dc9499 to 0009388
	if minKTime == 0 || trace.KTime < minKTime {
		minKTime = trace.KTime
	}
	traceOutChan <- trace
Now that you moved that code here it becomes obvious that we enforce a task/context switch for every trace. Possibly not for this PR, but we should improve this (e.g. with batch processing or a buffered channel).
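For reference, a minimal sketch of the buffered-channel idea, combined with batch draining on the consumer side. It is illustrative only and not part of this PR: the `Trace` type, the `drainBatch` helper, the buffer size of 128, and the batch limit are all assumptions.

```go
package main

import "fmt"

// Trace is a hypothetical stand-in for host.Trace.
type Trace struct {
	KTime int64
}

// drainBatch blocks until one trace is available, then collects further
// traces that are already queued, up to limit, without blocking again.
// It returns nil once the channel is closed and empty.
func drainBatch(ch <-chan *Trace, limit int) []*Trace {
	first, ok := <-ch
	if !ok {
		return nil
	}
	batch := []*Trace{first}
	for len(batch) < limit {
		select {
		case t, ok := <-ch:
			if !ok {
				return batch
			}
			batch = append(batch, t)
		default:
			// Nothing else queued right now.
			return batch
		}
	}
	return batch
}

func main() {
	// With a buffer, the producer does not have to wait for the consumer
	// on every send as long as there is free capacity.
	traceOutChan := make(chan *Trace, 128)
	for i := 1; i <= 5; i++ {
		traceOutChan <- &Trace{KTime: int64(i)}
	}
	close(traceOutChan)

	for {
		batch := drainBatch(traceOutChan, 3)
		if batch == nil {
			break
		}
		fmt.Println("batch size:", len(batch))
	}
}
```

In this run the consumer wakes up twice (batches of 3 and 2) for five traces. Whether this helps in the profiler depends on how the receiving side is scheduled, which would need measuring.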
Update SymbolizationComplete mechanism to reflect current semantics around trace processing and timestamping (no batching, in-kernel high resolution timestamps)
Co-authored-by: Tim Rühsen <[email protected]>
51f78ac to eded6cb
Sync from upstream (2025-03-12)
Florian Lehner <[email protected]> symblib: expose API for single point lookups (open-telemetry#380) Co-authored-by: GitHub <[email protected]>
Tolya Korniltsev <[email protected]> chore: remove unused controller.Config fields (open-telemetry#387) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> libpf: drop unused code (open-telemetry#386) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> tracehandler: drop metadataWarnInhib (open-telemetry#385) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> Go: update to go.opentelemetry.io/[email protected] (open-telemetry#383) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> processmanager: Don't synchronize a process that's waiting cleanup (open-telemetry#379) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> CI: use latest LTS kernel in tests (open-telemetry#382) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> Makefile: add cargo clean to target clean (open-telemetry#381) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> Switch semantics for process.executable.name (open-telemetry#306) Co-authored-by: GitHub <[email protected]>
Tim Rühsen <[email protected]> Stabilize CI / integration tests (open-telemetry#378) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> Docker fixup (open-telemetry#375) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> Docker: fix rust set up (open-telemetry#371) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> tracer: attach to all kprobes with prefix for off CPU profiling (open-telemetry#370) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> Go: update to Go 1.23 (open-telemetry#372) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> support: generate *ProcInfo types with cgo (open-telemetry#367) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> process: reuse and preallocate memory (open-telemetry#355) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> rust: preparations to integrate Rust (open-telemetry#360) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> Switch to OTel metrics (open-telemetry#348) Co-authored-by: GitHub <[email protected]>
Tolya Korniltsev <[email protected]> cargo: remove unused workspace dependency declarations (open-telemetry#364) Co-authored-by: GitHub <[email protected]>
Tolya Korniltsev <[email protected]> reporter: add custom gRPC dial options (open-telemetry#363) Co-authored-by: GitHub <[email protected]>
umanwizard <[email protected]> Various fixes to node/V8 (open-telemetry#333) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> doc: fix path of tooling (open-telemetry#361) Co-authored-by: GitHub <[email protected]>
OpenTelemetry Bot <[email protected]> Add FOSSA scanning workflow (open-telemetry#357) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> rust: use macro for debug output (open-telemetry#356) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> symblib/gosym: add single point lookup (open-telemetry#346) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> README: provide devfiler v0.14.0 (open-telemetry#354) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> CI: skip environment setup (open-telemetry#353) Co-authored-by: GitHub <[email protected]>
Richard Chukwu <[email protected]> Improve contributor guide (open-telemetry#349) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> Fix build (open-telemetry#350) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> processinfo: refactor process metadata (open-telemetry#344) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> reporter/pdata: do no generate profiles if there are no events (open-telemetry#347) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> README: provide devfiler v0.13.0 (open-telemetry#343) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> processmanager: Fix process exit regression (open-telemetry#337) (open-telemetry#338) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> libpf: drop Hash64 (open-telemetry#340) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> cargo: set license field (open-telemetry#336) Co-authored-by: GitHub <[email protected]>
Damien Mathieu <[email protected]> Use dummy support for any non-arm64 and non-amd64 archs (open-telemetry#335) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> rust: drop anyhow dependency (open-telemetry#334) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> support: use cgo to generate Go constants from eBPF (open-telemetry#332) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> processmanager: Don't log inside critical areas (open-telemetry#328) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> CI: add test for Rust components (open-telemetry#326) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> processmanager: simplify API and return early (open-telemetry#325) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> Add Rust native symbolization library and C API wrapper (open-telemetry#267) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> Metrics for trace event perf event monitor (open-telemetry#322) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> Delayed processing for ProcessManager.pidToProcessInfo (open-telemetry#321) Co-authored-by: GitHub <[email protected]>
Christos Kalkanis <[email protected]> Rework SymbolizationComplete (open-telemetry#307) Co-authored-by: GitHub <[email protected]>
Tim Rühsen <[email protected]> Amend -off-cpu-threshold value (open-telemetry#316) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> reporter/collector: fix reporting issue (open-telemetry#319) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> reporter: move pkg samples from internal to public (open-telemetry#314) Co-authored-by: GitHub <[email protected]>
Florian Lehner <[email protected]> README: provide devfiler v0.11.0 (open-telemetry#313) Co-authored-by: GitHub <[email protected]>
Summary
Updated SymbolizationComplete mechanism to reflect current semantics around trace processing and timestamping (no batching, in-kernel high resolution timestamps):
- Don't call SymbolizationComplete per-Trace; instead call it after each iteration of the perf event batch-drain loop. This introduces a call frequency upper bound (currently: 4Hz).
- Track the minimum KTime seen during trace event retrieval and report the minimum KTime belonging to the previous processing iteration with SymbolizationComplete.
- startPollingPerfEventMonitor is now specialized to trace event processing; this also simplifies caller logic.
TODO:
- Generify (will open new PR for this)
- SymbolizationComplete, fix Sending executable path for processes that have exited #278
Also see: