Honor the console_fallback setting for engine.observed_state.logging_events send policy#2914
Open
jmacd wants to merge 3 commits intoopen-telemetry:mainfrom
Open
Honor the console_fallback setting for engine.observed_state.logging_events send policy#2914jmacd wants to merge 3 commits intoopen-telemetry:mainfrom
jmacd wants to merge 3 commits intoopen-telemetry:mainfrom
Conversation
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (66.66%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #2914 +/- ##
==========================================
- Coverage 86.26% 86.26% -0.01%
==========================================
Files 715 715
Lines 272060 272066 +6
==========================================
- Hits 234700 234698 -2
- Misses 36836 36844 +8
Partials 524 524
🚀 New features to boost your workflow:
|
…reporter
The InternalTelemetrySystem ITS reporter was hard-coded to use
SendPolicy::default(), which has console_fallback: true. Under load
this can produce hundreds of thousands of 'Channel full, dropping
observed event' raw_error! lines on stderr per second when the ITS
flume channel saturates, which:
- drowns out useful diagnostic output
- silently dominates /proc/<pid>/io wchar so any wire-bytes
measurement based on that counter is contaminated
- costs measurable CPU writing those messages
The engine config already has the right knob,
engine.observed_state.logging_events, with default
console_fallback: false for the logging path (only the engine_events
path defaults to true). It just was not being plumbed to the ITS
reporter.
Add an its_reporter_policy: SendPolicy parameter to
InternalTelemetrySystem::new and use it for both reporter
constructions in the ITS branch (with and without log tap). Pass
engine.observed_state.logging_events.clone() from the controller
startup. All test call sites updated to pass SendPolicy::default()
to preserve their existing behaviour.
Verified with cargo test -p otap-df-telemetry --lib (126 passed).
Verified end-to-end: a logger configured with global: its at 10k/s
and a downstream that backpressures (forcing channel-full drops at
~50%) now produces 0 'Channel full' lines vs the prior 150k lines
in 30s, and process CPU drops correspondingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e66bafd to
1f142c8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Change Summary
console_fallback: falsewould not disable telemetry from logging calls that overflow, due to a default setting.
The correct field
engine.observed_state.logging_events.Fixes #2916