query: Add metric to track healthy/unhealthy endpoints #8492

codesome · 2025-09-16T00:03:57Z

I added a CHANGELOG entry for this change.
Change is not relevant to the end user.

Changes

Adds thanos_query_endpoints_count metric to track healthy and unhealthy endpoints.

Use case: when strict endpoints are not used, endpoints can silently go stale, and there is no way of knowing whether queries/rules are returning wrong or no results (results can be wrong because the data is partial when some endpoints are stale). This metric is useful for alerting when endpoints become unhealthy (for example, when load balancers are down).
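As a rough illustration of the idea (not the code in this PR), the sketch below counts endpoints by health state and prints the result in Prometheus text format. The `Endpoint` type, the `countByHealth` and `render` helpers, and the `status` label name are all assumptions made for this example; the actual label names and wiring in Thanos may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Endpoint is a hypothetical stand-in for an endpoint known to the querier.
type Endpoint struct {
	Addr    string
	Healthy bool
}

// countByHealth returns the number of endpoints in each health state.
// Both states are always present so a zero count is still reported.
func countByHealth(eps []Endpoint) map[string]int {
	counts := map[string]int{"healthy": 0, "unhealthy": 0}
	for _, ep := range eps {
		if ep.Healthy {
			counts["healthy"]++
		} else {
			counts["unhealthy"]++
		}
	}
	return counts
}

// render formats the counts as Prometheus-style sample lines.
// The "status" label name is an assumption for illustration only.
func render(counts map[string]int) string {
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	sort.Strings(states) // stable output order

	var b strings.Builder
	for _, s := range states {
		fmt.Fprintf(&b, "thanos_query_endpoints_count{status=%q} %d\n", s, counts[s])
	}
	return b.String()
}

func main() {
	eps := []Endpoint{
		{Addr: "store-a:10901", Healthy: true},
		{Addr: "store-b:10901", Healthy: true},
		{Addr: "store-c:10901", Healthy: false},
	}
	fmt.Print(render(countByHealth(eps)))
}
```

With a gauge shaped like this, an alert could fire whenever the unhealthy count stays above zero for some time.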

If there is some other way to identify such silently failing queries, this PR can be closed in favor of that.

Verification

TestEndpointSetUpdate was panicking locally even without this change, so I have not written unit tests for this yet.

Signed-off-by: Ganesh Vernekar <[email protected]>

Add metric to track healthy/unhealthy endpoints in query

59a5e5b

Signed-off-by: Ganesh Vernekar <[email protected]>

pull-request-size bot added the size/S label Sep 16, 2025

codesome added 2 commits September 15, 2025 17:26

Fix lint

cf2c305

Signed-off-by: Ganesh Vernekar <[email protected]>

more lint

58dace7

Signed-off-by: Ganesh Vernekar <[email protected]>
