-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: limit scrape_duration_seconds visualization to CHT instance #73
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @IanMinash! I agree that it would be helpful to limit the scrape metrics to the current CHT instance in context. Unfortunately, your change would hide not only the metrics for other CHT instances, but also the scrape metrics from the rest of the monitoring stack (exporters, Grafana, etc) and, most importantly, would hide Postgres scrape metrics (for anyone who has connected Watchdog to their Postgres instance).
I have suggested a change that would keep the other non-CHT metrics in view while not displaying metrics for the out-of-context CHT instances. Additionally, it will filter the postgres metrics to only show the one associated with the current CHT instance in context.
Can you try these changes and see if they are acceptable? Thanks!
@@ -2052,7 +2052,7 @@ | |||
"uid": "PBFA97CFB590B2093" | |||
}, | |||
"editorMode": "code", | |||
"expr": "scrape_duration_seconds", | |||
"expr": "scrape_duration_seconds{instance=\"$cht_instance\"}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"expr": "scrape_duration_seconds{instance=\"$cht_instance\"}", | |
"expr": "scrape_duration_seconds unless (scrape_duration_seconds{job=\"cht\", instance!=\"$cht_instance\"} or scrape_duration_seconds{job=\"postgres\", cht_instance!=\"$cht_instance\"})", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jkuester, in our case, the monitoring stack is used beyond CHT and the changes suggested still include targets unrelated to CHT. It is also not quite clear to me why I should be interested in scrape duration for grafana
for example, whilst I'm interested in processes/metrics that directly relate to CHT.
In my opinion, we should be explicit with the jobs we intend to target then exclude out-of-context instances though I'm still keen on understanding your POV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm okay, I think I see where you are coming from here. This whole Monitoring Stack
row was originally intended to be just an at-a-glance health-check for the Watchdog processes themselves (so Grafana, Prometheus, and the exporters). That being said, I understand the value in being able to easily check the scrape history for the CHT instance in context. It is almost like we are technically trying to achieve different goals:
- See overall health of Prometheus by seeing the historical trends for the for the scrape times of all targets. This lets you easily identify outliers or understand when a target is not getting scraped as expected.
- See the historical scrape performance for the particular CHT instance in context (informing, among other things, the freshness of the metrics for that instance).
Here is my proposal: lets have separate panels to address both of there goals! In this PR, we can leave the panel in the Monitoring Stack
row as it was and instead add a new panel on the CHT Admin Details dashboard (maybe up at the top outside of all the existing rows?) that just shows the scrape data for the CHT instance in context (along with the Postgres instance for that instance if it exists). Then, I have logged #78 which can be addressed later to move the whole Monitoring Stack
row out of the CHT Admin Overview dashboard.
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the Monitoring Stack
row should be moved out of the CHT Admin dashboard.
Limited the values of the Scrape Duration graph to only come from the currently selected CHT instance(
$cht_instance
). This prevents non-related VMs from showing up in a dashboard thats only focused on CHT monitoring.