datalake/metrics: miscellaneous improvements to lag metrics #24568
/dt
Retry command for Build#59736: please wait until all jobs are finished before running the slash command.
CI test results:
test results on build#59736
test results on build#59747
test results on build#59767
LGTM, but deserves a release note
@@ -168,14 +180,19 @@ void replicated_partition_probe::setup_internal_metrics(const model::ntp& ntp) {
      {sm::shard_label, partition_label});

    if (model::is_user_topic(_partition.ntp())) {
        // Metrics are reported as follows
        // -2 (default initialized state)
Can we add this to the public description too, for downstream systems?
done.
Retry command for Build#59747: please wait until all jobs are finished before running the slash command.
This is based on our experience debugging Brandon's perf setup.
- Changes the metric reporting to report lag only on leaders. This makes it easy to monitor the metric using an aggregate across all replicas without having to worry about the current leader.
- Fixed a bug where lag entry was not added to serde fields, adjusted the test coverage to catch this scenario, refactored the test slightly while I'm there.
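To illustrate why leader-only reporting simplifies monitoring, here is a minimal sketch rather than the actual probe code. `replica_state`, `reported_lag`, and `max_reported_lag` are hypothetical names, and the choice to have non-leaders export the default value of -2 is an assumption; the excerpt only confirms that -2 is the default initialized state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// -2 is the default initialized state per the diff comment in this PR.
// Exporting it from non-leaders is an assumption made for this sketch.
constexpr int64_t default_lag = -2;

// Hypothetical stand-in for one replica of a partition.
struct replica_state {
    bool is_leader;
    int64_t computed_lag; // lag as seen locally by this replica
};

// Value a replica would export: the real lag on the leader only.
inline int64_t reported_lag(const replica_state& r) {
    return r.is_leader ? r.computed_lag : default_lag;
}

// Aggregate over all replicas without knowing which one is the leader.
inline int64_t max_reported_lag(const std::vector<replica_state>& replicas) {
    int64_t result = default_lag;
    for (const auto& r : replicas) {
        result = std::max(result, reported_lag(r));
    }
    return result;
}
```

With this shape, a `max` over every replica yields the leader's lag even when followers hold stale local values, so a dashboard query does not need to track leadership changes.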
6b976a0 to bef3254
Retry command for Build#59767: please wait until all jobs are finished before running the slash command.
/backport v24.3.x
This is based on our experience debugging Brandon's perf setup.
- Changes the metric reporting to report lag only on leaders. This makes it easy to monitor the metric using an aggregate across all replicas without having to worry about which replica is the current leader.
- Fixes a bug where the lag entry was not added to the serde fields. Adjusts the test coverage to catch this scenario and refactors the test slightly along the way.
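The serde bug class described above can be sketched with a toy serializer. This is an illustrative stand-in, not the project's serde library: `good_state`, `buggy_state`, `toy_write`, `toy_read`, and `round_trips` are all hypothetical, and the only behavior modeled is that members missing from `serde_fields()` are never written or read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Toy model (C++17): only members listed in serde_fields() are serialized.
struct good_state {
    int64_t offset{-1};
    int64_t lag{-2};
    auto serde_fields() { return std::tie(offset, lag); }
    bool operator==(const good_state& o) const {
        return offset == o.offset && lag == o.lag;
    }
};

// Same layout, but `lag` was forgotten in serde_fields().
struct buggy_state {
    int64_t offset{-1};
    int64_t lag{-2};
    auto serde_fields() { return std::tie(offset); }
    bool operator==(const buggy_state& o) const {
        return offset == o.offset && lag == o.lag;
    }
};

// Write every listed field into a flat buffer.
template<typename T>
std::vector<int64_t> toy_write(T& v) {
    std::vector<int64_t> out;
    std::apply([&out](auto&... f) { (out.push_back(f), ...); }, v.serde_fields());
    return out;
}

// Read listed fields back; unlisted members keep their defaults.
template<typename T>
T toy_read(const std::vector<int64_t>& in) {
    T v{};
    std::size_t i = 0;
    std::apply([&](auto&... f) { ((f = in.at(i++)), ...); }, v.serde_fields());
    return v;
}

// A round trip must reproduce every member, not just the listed ones.
template<typename T>
bool round_trips(T original) {
    return toy_read<T>(toy_write(original)) == original;
}
```

A round-trip check like this only catches the omission when the forgotten member holds a non-default value, so a test in this style should populate every field before serializing.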
Backports Required
Release Notes
Bug Fixes