
fix(pkg/bpf): Use channel to process events #1671

Merged

Conversation

dave-tucker
Collaborator

@dave-tucker dave-tucker commented Aug 5, 2024

Processing events in the same goroutine as the ring buffer reader requires acquiring a mutex, which blocks ringbuf event processing and causes a backlog. To avoid this, send events via a buffered channel to a dedicated event-processing goroutine so the ringbuf reader remains unblocked. On my machine this decreased CPU load from 1-3% to 0-1%.

Additionally, this adds metrics to track events read, events processed, and the current depth of the events channel.

Updates: #1670, #1660

@dave-tucker
Collaborator Author

@vimalk78 @rootfs if you'd like to test this one, see if you notice a difference on your systems.
Running with -bpf-debug-metrics will also help me confirm or rule out suspicions about what's causing the CPU usage, and will help inform any additional work on #1660.


github-actions bot commented Aug 5, 2024

🤖 SeineSailor

Here's a concise summary of the pull request changes:

Summary: The pull request "fix(pkg/bpf): Use channel to process events" introduces a significant change to the event handling mechanism in the pkg/bpf package. Key modifications include:

  • Handling events using a buffered channel and a dedicated goroutine to avoid blocking the ring buffer, reducing CPU load.
  • Adding metrics for tracking events read, processed, and the current depth of the events channel.
  • Introducing a new flag bpfDebugMetricsEnabled to control debug metrics for eBPF.
  • Adding the RegisterMetrics function to the Exporter interface, with a corresponding implementation in the package.

Impact: This change affects the package's external interface, requiring integrators to call RegisterMetrics during setup. The addition of a new dependency on the Prometheus client library for metrics may have implications for the project's dependencies and compatibility.

Observations/Suggestions:

  • The change improves the performance and efficiency of event handling, which is a significant benefit.
  • The introduction of metrics and a new flag provides better visibility and control over the event processing pipeline.
  • It would be beneficial to include documentation updates to reflect the changes to the Exporter interface and the new dependency on the Prometheus client library.
  • Consider adding tests to ensure the correct functioning of the new event handling mechanism and metrics.

@rootfs
Contributor

rootfs commented Aug 5, 2024

let's merge this so we can run it in CI

@rootfs rootfs merged commit c99e399 into sustainable-computing-io:main Aug 5, 2024
19 of 20 checks passed
dave-tucker added a commit to dave-tucker/kepler that referenced this pull request Aug 6, 2024
dave-tucker added a commit to dave-tucker/kepler that referenced this pull request Aug 7, 2024
dave-tucker added a commit to dave-tucker/kepler that referenced this pull request Aug 9, 2024