operator: multicluster end-to-end observability — raft metrics, reconcile-health, StretchCluster member status, PrometheusRule, dashboard #1509
Open
hidalgopl wants to merge 8 commits into
…ter member status
Three slices of operator observability that flow into a single PrometheusRule and a single Grafana dashboard.
**Multicluster raft metrics.** The raft layer was operationally invisible — only structured logs and a `peer.dropCount` atomic surfaced via a periodic logger goroutine. New `operator_multicluster_raft_*` family registered to controller-runtime's metrics registry: `leader_changes_total`, `messages_{sent,received}_total{msg_type,peer}`, `send_errors_total{peer,error_type}` (closed six-value vocabulary: timeout/canceled/unavailable/auth/marshal/other), `messages_dropped_total{peer}`, `send_duration_seconds{peer,result}` (cross-region buckets 1ms..2.5s), `inflight_rpcs{peer}`, `peer_reachable{peer}`, `unreachable_reports_total{peer}`, `snapshots_sent_total{peer}`, `snapshot_send_errors_total{peer}`, and a leader-only `follower_match_lag_entries{peer}` (reads `node.Status().Progress`; followers keep prior value, federation expectation is "scrape from the leader"). Transport-backed gauges read existing atomics on scrape via `RegisterTransport(t)`: `term`, `state{state="leader|follower|candidate|pre_candidate|unknown"}` (one series per state with 0/1 value, no separate `is_leader` gauge), `send_queue_length{peer}`. `runDropLogger` and the `peer.dropCount` atomic are deleted — `messages_dropped_total{peer}` plus standard alerting covers it. `RegisterTransport` unregisters any prior transport collector before registering; safe no-op in prod (one transport per process), fixes test-ordering in `setupLockTest`.
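A minimal sketch of the pull-based half of this — a `prometheus.Collector` that reads the transport's atomics only at scrape time. The transport fields, the state encoding, and the constructor name are illustrative assumptions; only the metric names and the one-series-per-state shape come from the description above.

```go
package leaderelection

import (
	"sync/atomic"

	"github.com/prometheus/client_golang/prometheus"
)

// transport stands in for the real raft transport; the real type exposes
// equivalent atomics that the collector reads on scrape.
type transport struct {
	term  atomic.Uint64
	state atomic.Int32 // 0=unknown, 1=follower, 2=pre_candidate, 3=candidate, 4=leader
}

type transportCollector struct {
	t         *transport
	termDesc  *prometheus.Desc
	stateDesc *prometheus.Desc
}

// Compile-time check that the collector keeps implementing prometheus.Collector.
var _ prometheus.Collector = &transportCollector{}

func newTransportCollector(t *transport) *transportCollector {
	return &transportCollector{
		t:         t,
		termDesc:  prometheus.NewDesc("operator_multicluster_raft_term", "Current raft term.", nil, nil),
		stateDesc: prometheus.NewDesc("operator_multicluster_raft_state", "Raft state: one 0/1 series per state.", []string{"state"}, nil),
	}
}

func (c *transportCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.termDesc
	ch <- c.stateDesc
}

func (c *transportCollector) Collect(ch chan<- prometheus.Metric) {
	// No hot-path writes: everything is read from the transport's atomics at scrape time.
	ch <- prometheus.MustNewConstMetric(c.termDesc, prometheus.GaugeValue, float64(c.t.term.Load()))

	current := c.t.state.Load()
	for i, name := range []string{"unknown", "follower", "pre_candidate", "candidate", "leader"} {
		v := 0.0
		if int32(i) == current {
			v = 1.0 // exactly one state series is 1 at any time
		}
		ch <- prometheus.MustNewConstMetric(c.stateDesc, prometheus.GaugeValue, v, name)
	}
}
```

`RegisterTransport(t)` then swaps a collector built this way into whatever registry the operator wires up, unregistering the previous one first.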
**Reconcile-health metrics.** Every controller already emits the controller-runtime built-ins but there are no signals tuned for self-triggered loops, falling behind on spec, or non-determinism in spec-rendering. New `operator/internal/observability/` package adds `Wrap[R](inner reconcile.TypedReconciler[R], controller string)` middleware that emits `operator_controller_reconcile_steady_state_total{controller}` (incremented when the inner returned `(Result{}, nil)` — a controller whose `reconcile_total` rate is high but `steady_state` rate is flat is spinning) and `operator_controller_reconcile_requeue_after_seconds{controller}` (histogram of `Result.RequeueAfter`; tight cluster of sub-second values = retry loop). Generic over `reconcile.TypedReconciler[R]` so the same wrapper covers both `ctrl.Reconciler` and the multicluster reconciler. Two passive recorder helpers for per-object signals that need an object reference: `RecordObservedGeneration(controller, kind, gen, obsGen)` → `operator_controller_reconcile_observed_generation_drift` (clamps negative deltas to zero), and `RecordSpecHashChangedWithoutGeneration(controller, kind)` → `operator_controller_reconcile_spec_hash_changed_without_generation_total` (canonical non-determinism signal). Both leave it to the calling controller to decide when to record so the observability layer never duplicates API reads. Three controllers wrap at `SetupWithManager`: v2 Redpanda, v2 NodePool, and the multicluster StretchCluster reconciler. Other controllers (Console, vectorized v1, decommissioners, PVCUnbinder, NodeWatcher) keep their built-ins un-wrapped — they don't manage the resources in scope.
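A hedged sketch of what such a wrapper can look like with recent controller-runtime versions (where `reconcile.TypedReconciler` is generic); the real `observability.Wrap` may differ in details such as histogram buckets and registration:

```go
package observability

import (
	"context"

	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

var (
	steadyStateTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "operator_controller_reconcile_steady_state_total",
		Help: "Reconciles that returned (Result{}, nil), i.e. no work to do.",
	}, []string{"controller"})

	requeueAfterSeconds = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "operator_controller_reconcile_requeue_after_seconds",
		Help: "Distribution of Result.RequeueAfter returned by reconciles.",
	}, []string{"controller"})
)

func init() { metrics.Registry.MustRegister(steadyStateTotal, requeueAfterSeconds) }

type wrapped[R comparable] struct {
	inner      reconcile.TypedReconciler[R]
	controller string
}

// Wrap is generic over the request type, so the one middleware covers both
// reconcile.Request (single-cluster) and the multicluster request type.
func Wrap[R comparable](inner reconcile.TypedReconciler[R], controller string) reconcile.TypedReconciler[R] {
	return &wrapped[R]{inner: inner, controller: controller}
}

func (w *wrapped[R]) Reconcile(ctx context.Context, req R) (reconcile.Result, error) {
	res, err := w.inner.Reconcile(ctx, req)
	if err == nil && res.IsZero() {
		steadyStateTotal.WithLabelValues(w.controller).Inc()
	}
	if res.RequeueAfter > 0 {
		requeueAfterSeconds.WithLabelValues(w.controller).Observe(res.RequeueAfter.Seconds())
	}
	return res, err
}
```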
**StretchCluster member-status metrics.** Where `operator_controller_*` describes how the controllers behave, these describe what they're managing. All gauges, bounded label cardinality (`stretchcluster`, `member`): `operator_stretchcluster_member_reachable` (0/1 from the multicluster manager's reachability probe; local cluster always 1, recorded under its canonical name via `lifecycle.CanonicalClusterName` rather than the multicluster-runtime's empty-string sentinel); `operator_stretchcluster_brokers` / `operator_stretchcluster_brokers_ready` (desired and ready broker counts per member, summed across NodePools pointing at that member — gap on a single member = partial outage); `operator_stretchcluster_replication_health{stretchcluster}` (0/1, cluster-wide from the admin API check `reconcileDecommission` already makes, recorded right after the call returns); `operator_stretchcluster_spec_drift{stretchcluster, member}` (0/1, does each member's local StretchCluster.spec match the operator's view; set inside the existing `checkSpecConsistency` routine). No new API calls — passive recorder, callers pass values they already have. `MulticlusterReconciler` is the only consumer; instrumented at three sites where the data already lives (`checkSpecConsistency`, `reconcileDecommission`, and a new `recordBrokerCountMetrics` helper called once per reconcile after `fetchInitialState`). Unreachable members are recorded as `member_reachable=0` but `spec_drift` keeps its prior value because we genuinely don't know.
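An illustrative sketch of the passive-recorder shape. The helper names here are assumptions (the text only establishes the metric names and the "callers pass values they already have" contract):

```go
package observability

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	memberReachable = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "operator_stretchcluster_member_reachable",
		Help: "1 if the member cluster answered the reachability probe, else 0.",
	}, []string{"stretchcluster", "member"})

	brokersReady = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "operator_stretchcluster_brokers_ready",
		Help: "Ready broker count per member, summed across its NodePools.",
	}, []string{"stretchcluster", "member"})
)

func init() { metrics.Registry.MustRegister(memberReachable, brokersReady) }

// RecordStretchClusterMemberReachable is called by MulticlusterReconciler with a
// value it already holds — the recorder itself never issues API calls.
func RecordStretchClusterMemberReachable(stretchCluster, member string, reachable bool) {
	v := 0.0
	if reachable {
		v = 1.0
	}
	memberReachable.WithLabelValues(stretchCluster, member).Set(v)
}

// RecordStretchClusterBrokersReady records the ready-broker count for one member.
func RecordStretchClusterBrokersReady(stretchCluster, member string, ready int) {
	brokersReady.WithLabelValues(stretchCluster, member).Set(float64(ready))
}
```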
**Chart artifacts.** `operator/chart/prometheusrule.go` (transpiled to `_prometheusrule.go.tpl` by gotohelm) emits a PrometheusRule with recording rules and two alert groups — reconcile health (`OperatorReconcileErrors`, `OperatorReconcileRunaway`, `OperatorReconcileStalled`, `OperatorWorkerPoolSaturated`, `OperatorObservedGenerationDrift`, `OperatorNonDeterministicSpec`) and StretchCluster (`StretchClusterMemberUnreachable` 2m, `StretchClusterBrokerCountSkew` 10m, `StretchClusterSpecDrift` 5m, `StretchClusterReplicationUnhealthy` 5m). All severity `warning` — indicators that need eyes, not page-now incidents. New `values.monitoring.rulesEnabled` (default `false`) — sibling of `monitoring.enabled` (ServiceMonitor); independent so consumers can opt into rules without the ServiceMonitor. Chart test case `monitoring-rules-enabled` locks the output into the golden file. `docs/operator-grafana-dashboard.json` rewritten as a single comprehensive dashboard: 29 panels across 5 rows covering reconcile health, StretchCluster member status, and multicluster raft. `docs/operator-metrics.md` is the canonical inventory restructured into four groups (controller-runtime built-ins / reconcile-health / resource-state / multicluster raft); cardinality table up top lists every label and its closed vocabulary; explicit "PLANNED (not yet emitted)" subsection lists `self_triggered_total` / `time_since_last_success_seconds` so downstream dashboards don't break later.
**Design notes.** Plain `prometheus/client_golang`, not OTel — matches the existing pattern in `operator/internal/controller/vectorized/metric_controller.go`, `operator/cmd/version/version.go`, `operator/pkg/client/kgo_hooks.go`. `leader_id` is deliberately not a numeric gauge — `sum(leader_id)` is meaningless and `state{state="leader"}` already identifies the leader on each peer. `leader_changes_total` is incremented in the Ready loop on `leader != prevLeader && leader != 0`; scraping from the leader gives the cluster-wide leader-change count. `error_type` bucketing keeps cardinality bounded and each bucket maps to a different on-call story.
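The leader-change bookkeeping above amounts to a few lines; this sketch assumes a small tracker driven once per Ready with the SoftState leader id (names are illustrative, the predicate is the one quoted above):

```go
package leaderelection

import "github.com/prometheus/client_golang/prometheus"

var leaderChangesTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "operator_multicluster_raft_leader_changes_total",
	Help: "Leader transitions (to a known leader) observed by this peer.",
})

// leaderTracker keeps the previously observed leader id between Ready batches.
type leaderTracker struct {
	prevLeader uint64
}

// observeLeader is called once per Ready with the current leader id
// (0 meaning "no leader known"). Only a change to a known leader counts.
func (t *leaderTracker) observeLeader(leader uint64) {
	if leader != t.prevLeader && leader != 0 {
		leaderChangesTotal.Inc()
	}
	t.prevLeader = leader
}
```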
Tests cover `msgTypeLabel` and `normaliseRaftState` exhaustively, plus an integration test that brings up a 3-node cluster via the existing `setupLockTest` harness and asserts gauges reflect elected state and heartbeat counters accrue from natural traffic. `operator/internal/observability/wrapper_test.go` covers the steady-state counter, requeue-after histogram, observed-generation drift clamping, and the spec-hash counter. Chart golden file regenerated for `monitoring-rules-enabled`.
…ntegration test
Adds the two reconcile-health metrics that the existing observability inventory documented as PLANNED / Reserved:
* operator_controller_reconcile_last_success_timestamp_seconds (gauge, wrapper-emitted). Set inside the existing steady-state branch via a one-line write of the current Unix timestamp. Bounded cardinality (controller label only). Prometheus computes the user-facing "seconds since last success" as time() - this_gauge — no goroutine bookkeeping or oldest-unfinished tracker required.
* operator_controller_reconcile_self_triggered_total (counter, opt-in via observability.RecordSelfTriggered). Increments when a controller has detected that its own write to an object will re-enqueue the same reconcile without any other observable effect. The wrapper deliberately does not increment this — that would require a redundant Get to hash before/after every reconcile, breaking the wrapper's "passive, no extra reads" design. Controllers opt in from their own write helpers where the pre/post-write state is already in hand.
The wrapper exposes nowUnix as a var so deterministic tests can drive the gauge without clock skew.
Dashboard: the existing "Self-triggered reconciles" stat panel drops its "reserved for future / renders N/A" language and gains a real description pointing at RecordSelfTriggered. A new full-width timeseries panel renders time() - last_success_timestamp_seconds per controller as seconds-since-last-success — climbing past natural re-queue intervals means the controller is failing or spinning.
Tests:
* Unit tests cover the new wrapper gauge (advances on steady state, frozen on error and on RequeueAfter), and the RecordSelfTriggered helper.
* TestIntegrationObservabilityInfiniteReconcile drives the wrapper through a real controller-runtime Manager against an envtest apiserver. A test reconciler watches ConfigMaps and switches between spinning (RequeueAfter 100ms) and steady-state via an atomic. Asserts that the spinning phase keeps last_success_timestamp_seconds and steady_state_total at 0 while the requeue histogram fills, then validates recovery once the mode flips. testutil.SkipIfNotIntegration gates it behind -tags integration.
Docs: operator-metrics.md moves both metrics out of the "Reserved (currently silent)" section into the live wrapper / recorder tables. The Reserved section is now gone. Changie entry under operator-Added-*.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
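A minimal sketch of the gauge write and the `nowUnix` seam this commit describes — the existing steady-state counter increment and metric registration are elided, and any name not in the commit text is an assumption:

```go
package observability

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var lastSuccessTimestamp = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "operator_controller_reconcile_last_success_timestamp_seconds",
	Help: "Unix timestamp of the most recent steady-state reconcile.",
}, []string{"controller"})

// nowUnix is a package var so deterministic tests can pin the clock without skew.
var nowUnix = func() float64 { return float64(time.Now().Unix()) }

// recordSteadyState sits in the wrapper's existing steady-state branch; the
// steady_state_total increment that already lives there is omitted here.
func recordSteadyState(controller string) {
	lastSuccessTimestamp.WithLabelValues(controller).Set(nowUnix())
}
```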
First opt-in caller of observability.RecordSelfTriggered. The MulticlusterReconciler's syncStatus loop pushes the local StretchCluster.Status to every reachable peer via Status().Update(). When the remote's existing status is semantically identical to what we're about to write, the Update still bumps resourceVersion and re-enqueues the StretchCluster reconciler on that peer — the canonical infinite-reconcile shape.
apiequality.Semantic.DeepEqual is already imported and used elsewhere in this file for the spec-drift check; reuse it here for the status-equality probe. Recorded after a successful Update so transient write errors don't pollute the counter.
The metric powers the dashboard's "Self-triggered reconciles" stat panel — sustained non-zero rate is an indicator that the syncStatus path should diff before writing rather than writing unconditionally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
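A rough sketch of that probe's shape, using stand-in types since the real StretchCluster status type and the RecordSelfTriggered signature aren't shown in this PR excerpt (and note a later commit in this PR reverts this wiring):

```go
package redpanda

import (
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// stretchClusterStatus is a stand-in for the real StretchCluster status type.
type stretchClusterStatus struct {
	ReadyMembers []string
}

// recordSelfTriggered stands in for observability.RecordSelfTriggered.
func recordSelfTriggered(controller string) {}

// syncStatusToPeer illustrates the probe: compare before writing, but only record
// after the Status().Update() (represented here by updateFn) has succeeded.
func syncStatusToPeer(remote, local stretchClusterStatus, updateFn func() error) error {
	identical := apiequality.Semantic.DeepEqual(remote, local)
	if err := updateFn(); err != nil {
		return err // transient write errors never pollute the counter
	}
	if identical {
		// The write bumped resourceVersion with no observable change — the peer's
		// StretchCluster reconciler re-enqueues for nothing: a self-trigger.
		recordSelfTriggered("stretchcluster")
	}
	return nil
}
```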
…stogram
The wrapper's previous steady-state predicate only matched
(Result{}, nil) — a clean "no work" return. But the
MulticlusterReconciler always returns RequeueAfter =
periodicRequeue via a defer (the canonical "wake me up
periodically" pattern), so it never entered the steady-state
branch. Result: last_success_timestamp_seconds and
steady_state_total stayed permanently empty for that
controller despite it being healthy.
Wrap now takes a third argument: defaultRequeueTimeout. The
record() branch is now:
err == nil && (result.IsZero() || isPeriodicRequeue(result))
Both shapes count as steady state. isPeriodicRequeue returns
true when result.RequeueAfter exactly equals
defaultRequeueTimeout (and false when defaultRequeueTimeout
== 0, so a stray Result{Requeue: true} on a non-periodic
controller doesn't accidentally register).
The requeue-after histogram now skips the periodic value —
otherwise the periodic-wake samples would dominate every
bucket and bury the tight-retry-loop signal the histogram
exists to surface.
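In sketch form — only the dual-shape predicate and the histogram skip come from the text above; the helper names and factoring are illustrative:

```go
package observability

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// isPeriodicRequeue reports whether the result is the controller's own periodic
// wake-up. With defaultRequeueTimeout == 0 nothing counts as periodic, so a stray
// Result{Requeue: true} (RequeueAfter == 0) can't register as periodic-steady.
func isPeriodicRequeue(result reconcile.Result, defaultRequeueTimeout time.Duration) bool {
	return defaultRequeueTimeout != 0 && result.RequeueAfter == defaultRequeueTimeout
}

// isSteadyState is the dual-shape predicate: a clean zero result or the
// controller's configured periodic wake both count as "no work to do".
func isSteadyState(result reconcile.Result, err error, defaultRequeueTimeout time.Duration) bool {
	return err == nil && (result.IsZero() || isPeriodicRequeue(result, defaultRequeueTimeout))
}

// observeRequeueAfter skips the periodic value so periodic-wake samples don't
// dominate every bucket and bury the tight-retry-loop signal.
func observeRequeueAfter(hist prometheus.Observer, result reconcile.Result, defaultRequeueTimeout time.Duration) {
	if result.RequeueAfter > 0 && !isPeriodicRequeue(result, defaultRequeueTimeout) {
		hist.Observe(result.RequeueAfter.Seconds())
	}
}
```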
Three call sites updated to pass the right periodic value:
* redpanda_controller.go:157 → periodicRequeue
* nodepool_controller.go:155 → periodicRequeue
* multicluster_controller.go → defaultReconcileTimeout
Tests cover all four paths:
* Result{} on a controller with defaultRequeueTimeout=0 → steady
* Result{RequeueAfter: defaultRequeueTimeout} → steady
* Result{RequeueAfter: other} → not steady, observed in histogram
* Result{RequeueAfter: defaultRequeueTimeout} → NOT observed
in histogram
Docs and recorder.go godoc updated to reflect the dual-shape
steady-state definition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A critical second pass on the operator_controller_* family
revealed three metrics that were defined and exported but had
zero call sites: passive opt-in helpers waiting for a
controller to wire them, which never happened. Same shape as
the self_triggered_total metric that was removed in an earlier
pass.
Removed metrics (with their helpers, tests, alerts, and
dashboard panels):
* operator_controller_reconcile_self_triggered_total
(was wired to a syncStatus deep-equal probe in
multicluster_controller.go — that wiring is reverted here
too; the canonical "spinning loop" pattern is already
detected by rate(reconcile_total) > rate(steady_state_total)
and the OperatorReconcileRunaway alert).
* operator_controller_reconcile_observed_generation_drift
(gauge + RecordObservedGeneration helper; alert
OperatorObservedGenerationDrift; dashboard panel 21).
Generation drift is detectable on a per-resource basis
from status.observedGeneration directly if a future use
case requires it.
* operator_controller_reconcile_spec_hash_changed_without_generation_total
(counter + RecordSpecHashChangedWithoutGeneration helper;
alert OperatorNonDeterministicSpec; dashboard panel 22).
A passive opt-in helper with zero call sites is dead code.
Net surface of the operator_controller_* family is now three
wrapper-emitted metrics — no opt-in helpers, no dead code:
* reconcile_steady_state_total (counter)
* reconcile_requeue_after_seconds (histogram)
* reconcile_last_success_timestamp_seconds (gauge)
While here:
* The Wrap call in multicluster_controller.go was passing
defaultReconcileTimeout (2m, the per-reconcile context
deadline) instead of periodicRequeue (3m, what the
reconciler actually returns). That meant the wrapper's
isPeriodicRequeue predicate never matched and
last_success_timestamp_seconds stayed empty for the
StretchCluster controller. Fixed.
* Dashboard panels 20 and 24 expanded to full width to fill
the gaps left by deleted panels 21 / 22. Version bumped
to 6.
* Docs (operator-metrics.md), changelog entries, and the
chart's prometheusrule.go all reflect the reduced surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Metric definitions were spread across three files in the operator
module:
* operator/internal/observability/recorder.go (3 operator_controller_*)
* operator/internal/observability/stretch_recorder.go (5 operator_stretchcluster_*)
* operator/internal/controller/redpanda/metric_controller.go (4 v2 redpanda_*)
To answer "what metrics does this operator expose?" you had to
grep three files. Consolidating: every prometheus.New* call now
lives in operator/internal/observability/metrics.go, grouped
into three sections (reconcile-health, StretchCluster, Redpanda
v2 CR resource-state) with a single init() that registers all
12 metrics. The redpanda metric_controller.go reconciler imports
the exported vars from the observability package instead of
defining its own.
The v2 metric vars are now exported (Redpandas,
RedpandaDesiredNodes, RedpandaReadyNodes,
RedpandaMisconfiguredClusters) instead of package-private.
recorder.go is reduced to the package docstring (kept as the
canonical place for the package-level godoc).
stretch_recorder.go is reduced to the four RecordStretchCluster*
helper functions.
Out of scope:
* v1 (vectorized.redpanda.com Cluster) metrics — that
controller is legacy and explicitly excluded from
unrelated changes. Its 4 metrics remain defined next to
its reconciler.
* Multicluster raft metrics in pkg/multicluster/leaderelection/
metrics.go — different Go module; cross-module consolidation
would either expose operator/internal/observability/ to all
pkg/ consumers or pull controller-runtime deps into the pkg
module.
No emitted metric names change. No alerts or dashboards affected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
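A rough picture of the consolidated metrics.go layout this commit describes — one file, three sections, one init(). Metric names and labels below are placeholders except where named above; the real file defines all 12:

```go
package observability

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// --- Reconcile-health (wrapper-emitted) ---
var SteadyStateTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "operator_controller_reconcile_steady_state_total",
	Help: "Reconciles that returned a no-work result.",
}, []string{"controller"})

// --- StretchCluster member status (passive recorders stay in stretch_recorder.go) ---
var StretchMemberReachable = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "operator_stretchcluster_member_reachable",
	Help: "1 if the member answered the reachability probe, else 0.",
}, []string{"stretchcluster", "member"})

// --- Redpanda v2 CR resource state (vars now exported for metric_controller.go) ---
// Name and labels here are placeholders: the commit keeps every emitted name unchanged.
var Redpandas = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "redpanda_resources", // placeholder
	Help: "Number of Redpanda v2 custom resources.",
}, []string{"namespace"})

// One init registers the whole surface with controller-runtime's registry.
func init() {
	metrics.Registry.MustRegister(
		SteadyStateTotal,
		StretchMemberReachable,
		Redpandas,
	)
}
```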
The reconcile_requeue_after_seconds histogram was a triage-aid metric distinguishing self-requeue churn from external-event-driven churn, but in practice the primary spinning detector (rate(reconcile_total) > 5 while rate(steady_state_total) == 0, fired by OperatorReconcileRunaway) is sufficient, and the histogram's added diagnostic value rarely justified the maintenance cost — 11 bucket counters per controller plus a wrapper-side write per non-zero non-periodic RequeueAfter return.
The wrapper retains its isPeriodicRequeue predicate because the steady-state branch still needs it to recognise periodic-wake returns as steady. The dashboard "Reconcile-health signals" section now contains a single full-width panel (time() - last_success_timestamp_seconds).
Net surface of operator_controller_* after this drop:
* reconcile_steady_state_total (counter)
* reconcile_last_success_timestamp_seconds (gauge)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s, changie
Five small follow-up changes from a final review pass over the
observability work:
1. operator/internal/controller/redpanda/nodepool_controller.go —
wrap NodePool with observability.Wrap in SetupWithMultiClusterManager.
The multicluster binary (cmd/multicluster) calls this path; only the
single-cluster binary's SetupWithManager was wrapping. Result:
NodePool emitted controller-runtime built-in metrics in the
multicluster binary but nothing from the operator_controller_*
family — its dashboard panels were permanently empty. Found via
dogfooding the dashboard against the dev env.
2. docs/operator-grafana-dashboard.json — fix two leader-only panel
queries ("Send latency p99 (per peer, leader's view)" and "Follower
match-lag entries (leader's view)"). The old form used
`sum by (le, peer) (...) * on(instance) group_left() (leader==1)`,
which drops `instance` in the aggregation and then tries to join on
it. Replaced with `and ignoring(le, peer, result, state)
(leader==1)` — set-intersection that joins on every shared identity
label, works in both dev (with the dev-env `vcluster` label) and
prod (direct scrape with just `instance`).
3. pkg/multicluster/leaderelection/metrics.go — compile-time assertion
`var _ prometheus.Collector = &transportCollector{}` so a missing
or signature-drifted Describe / Collect method fails the build
instead of failing at runtime registration.
4. docs/operator-metrics.md — clean up stale references left over
from the metric audit (mentions of generation drift /
non-determinism in the Group 2 intro, the orphaned `kind` label in
the cardinality table, the opt-in Record* helpers that no longer
exist, and the dashboard cross-link framing).
5. .changes/unreleased/ — replace three incremental Added entries
with one consolidated entry covering the whole observability story.
The previous incremental entries described slices in commit-time
order and the raft family (the original subject of this PR) had no
dedicated entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ships first-class observability for the multicluster operator. One PR, three metric families plus chart/dashboard/docs.
**What's in**

**1. Multicluster raft metrics (`operator_multicluster_raft_*`)**

The raft layer powering cross-cluster leader election was operationally invisible — only structured logs and a `peer.dropCount` atomic surfaced via a periodic logger goroutine. Diagnosing flapping leaders, slow peers, or chronic drops required eyeballing per-pod logs across all peers. New metrics, registered to controller-runtime's metrics registry so they ship out the operator's existing `/metrics` endpoint:

Push-based (incremented at the event site):

* `leader_changes_total` (Counter)
* `messages_sent_total{msg_type, peer}` / `messages_received_total{msg_type, peer}` — `msg_type` is a closed vocabulary of ~10 raft message types
* `send_errors_total{peer, error_type}` — `error_type` bucketed to timeout/canceled/unavailable/auth/marshal/other
* `messages_dropped_total{peer}` — replaces the deleted `runDropLogger` goroutine + `peer.dropCount` atomic
* `send_duration_seconds{peer, result}` — histogram with cross-region buckets (1ms..2.5s)
* `inflight_rpcs{peer}` / `peer_reachable{peer}` (Gauges)
* `unreachable_reports_total{peer}` / `snapshots_sent_total{peer}` / `snapshot_send_errors_total{peer}`
* `follower_match_lag_entries{peer}` — leader-only

Pull-based via `prometheus.Collector` (read from transport atomics on scrape, no hot-path writes):

* `term` (Gauge)
* `state{state=...}` — one series per state with 0/1 value, no separate `is_leader` gauge
* `send_queue_length{peer}` (Gauge)

The collector lives in `pkg/multicluster/leaderelection/metrics.go`, registered idempotently via `RegisterTransport(t)` after the transport's atomics are populated. Compile-time assertion (`var _ prometheus.Collector = &transportCollector{}`) ensures the interface stays implemented.

**2. Reconcile-health metrics (`operator_controller_*`)**

Wrapper-emitted automatically for every controller registered through `observability.Wrap(reconciler, controller, defaultRequeueTimeout)`:

* `reconcile_steady_state_total{controller}` — Counter, incremented when a reconcile returns "no work to do" — either `(Result{}, nil)` or `(Result{RequeueAfter: defaultRequeueTimeout}, nil)` matching the controller's configured periodic-requeue interval. The second shape is required because `MulticlusterReconciler` always returns `RequeueAfter = periodicRequeue` via a defer; without the dual-shape predicate the StretchCluster controller would never register as steady.
* `reconcile_last_success_timestamp_seconds{controller}` — Gauge, Unix timestamp of the most recent steady-state reconcile. Prometheus computes "seconds since last success" at query time as `time() - last_success_timestamp_seconds`. Avoids the goroutine bookkeeping an imperative "seconds elapsed" gauge would need.

`Wrap` is generic over `reconcile.TypedReconciler[R]` so it covers both `ctrl.Reconciler` (single-cluster) and the multicluster reconciler (`mcreconcile.Request`) without duplicating the body.

Three controllers wrap: v2 Redpanda, NodePool, and StretchCluster. Two setup paths per controller — `SetupWithManager` (single-cluster binary, `cmd/run`) and `SetupWithMultiClusterManager` (multicluster binary, `cmd/multicluster`). Both paths wrap.

**3. StretchCluster member-status metrics (`operator_stretchcluster_*`)**

Per-member gauges, bounded cardinality (`stretchcluster`, `member`):

* `member_reachable` — 0/1 from the multicluster manager's reachability probe. Local cluster is always 1, recorded under its canonical name via `lifecycle.CanonicalClusterName`.
* `brokers` / `brokers_ready` — desired and ready broker counts per member, summed across NodePools pointing at that member. `brokers - brokers_ready > 0` = partial outage.
* `replication_health{stretchcluster}` — 0/1, cluster-wide from the admin API health check `reconcileDecommission` already runs.
* `spec_drift{stretchcluster, member}` — 0/1, does each member's local `StretchCluster.spec` match the operator's view. Set inside `checkSpecConsistency`.

No new API calls — passive recorder, callers pass values they already have. `MulticlusterReconciler` is the only consumer; instrumented at three sites where the data already lives.

**4. PrometheusRule (gated by `monitoring.rulesEnabled` chart value)**

`operator/chart/prometheusrule.go` (transpiled to `_prometheusrule.go.tpl` by gotohelm). New `monitoring.rulesEnabled` is a sibling of `monitoring.enabled` (ServiceMonitor); independent so consumers can opt into rules without the ServiceMonitor.

Recording rules: `operator:reconcile_rate:5m`, `operator:reconcile_error_rate:5m`, `operator:reconcile_steady_state_rate:5m`, `operator:reconcile_p99_seconds:5m`.

Alerts, all `severity=warning`:

* `OperatorReconcileErrors` — `operator:reconcile_error_rate:5m > 0.1` for 5m
* `OperatorReconcileRunaway` — `operator:reconcile_rate:5m > 5` for 5m (the canonical "spinning controller" signal — cross-checks `steady_state_total`)
* `OperatorReconcileStalled` — active in the past hour but reconcile rate == 0 for 10m
* `OperatorWorkerPoolSaturated` — `active_workers >= max_concurrent_reconciles` for 10m
* `StretchClusterMemberUnreachable` — 2m
* `StretchClusterBrokerCountSkew` — 10m
* `StretchClusterSpecDrift` — 5m
* `StretchClusterReplicationUnhealthy` — 5m

**5. Comprehensive Grafana dashboard (`docs/operator-grafana-dashboard.json`)**

Single comprehensive dashboard, 5 rows: multicluster raft, StretchCluster member status, reconcile activity, queues & workers, reconcile-health signals.

Leader-only panels (`Send latency p99`, `Follower match-lag entries`) use `and ignoring(...)` to filter to the current leader's perspective — works in both dev-env (where remote_write adds a `vcluster` label) and prod (direct scrape with just `instance`).

**6. Single source of truth for metric definitions**

Every `prometheus.New*` call in the operator module now lives in `operator/internal/observability/metrics.go` (the v1 vectorized Cluster metrics in `operator/internal/controller/vectorized/metric_controller.go` are out of scope — v1 is legacy and explicitly not touched). Recorder helpers (`RecordStretchCluster*`) stay in `stretch_recorder.go`. The raft family stays in `pkg/multicluster/leaderelection/metrics.go` because that's a different Go module.

**7. Documentation (`docs/operator-metrics.md`)**

Canonical inventory of every metric the operator exposes. Cardinality table up front lists every label and its bounded vocabulary. Four groups: controller-runtime built-ins, reconcile-health, resource-state, multicluster raft.

**8. Testing**

* Unit (`wrapper_test.go`): all four record-path branches — `Result{}` on a controller with `defaultRequeueTimeout=0`, `Result{RequeueAfter: defaultRequeueTimeout}` (periodic-steady), `Result{RequeueAfter: other}` (real requeue, not steady), errors, immediate-requeue. Plus passthrough.
* Integration (`integration_test.go`): `TestIntegrationObservabilityInfiniteReconcile` runs a synthetic reconciler inside a real controller-runtime Manager driven by envtest. Switches between spinning (RequeueAfter 100ms) and steady-state mid-test; asserts metrics react correctly. Gated by `testutil.SkipIfNotIntegration` + `-tags integration`.

**Design notes worth flagging for reviewers**

* `leader_id` is deliberately not a numeric gauge — `sum(leader_id)` is meaningless and `state{state="leader"} == 1` already identifies the leader on each peer.
* `state` is modelled as one series per state value (leader|follower|candidate|pre_candidate|unknown) with 0/1, not as a label-string-as-value. `sum(state) == 1` invariant; `state{state="leader"} == 1` is the standard leader filter.
* `isPeriodicRequeue` returns `false` when `defaultRequeueTimeout == 0` so a stray `Result{Requeue: true}` (RequeueAfter == 0) on a non-periodic controller doesn't accidentally register as periodic-steady. Plain `Result{}` still counts via the `result.IsZero()` branch.