Commit 9c9b657

MON-4387: Initial telemetry profile support
Adds in-cluster monitoring stack's telemetry metrics under the `telemetry` profile.

Signed-off-by: Pranshu Srivastava <[email protected]>
1 parent d9bb6e4 commit 9c9b657

24 files changed: +427 -11 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Note: This CHANGELOG is only for the monitoring team to track all monitoring related changes. Please see OpenShift release notes for official changes.
 
+## 4.21
+
+- [#2694](https://github.com/openshift/cluster-monitoring-operator/pull/2694) Add "telemetry" profile to the set of supported collection profiles. Switching to this profile will disable collection of all metrics except those required for telemetry purposes.
+
 ## 4.20
 
 - [#2595](https://github.com/openshift/cluster-monitoring-operator/pull/2595) Multi-tenant support for KSM's CRS feature-set downstream.

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/component: alert-router
+    app.kubernetes.io/instance: main
+    app.kubernetes.io/managed-by: cluster-monitoring-operator
+    app.kubernetes.io/name: alertmanager
+    app.kubernetes.io/part-of: openshift-monitoring
+    app.kubernetes.io/version: 0.28.1
+    monitoring.openshift.io/collection-profile: telemetry
+  name: alertmanager-main-telemetry
+  namespace: openshift-monitoring
+spec:
+  endpoints:
+  - bearerTokenFile: ""
+    interval: 30s
+    metricRelabelings:
+    - action: keep
+      regex: (alertmanager_integrations|scrape_samples_post_metric_relabeling|scrape_series_added|up)
+      sourceLabels:
+      - __name__
+    port: metrics
+    scheme: https
+    tlsConfig:
+      insecureSkipVerify: false
+      serverName: alertmanager-main.openshift-monitoring.svc
+  scrapeClass: tls-client-certificate-auth
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: alert-router
+      app.kubernetes.io/instance: main
+      app.kubernetes.io/name: alertmanager
+      app.kubernetes.io/part-of: openshift-monitoring
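
Reviewer note: the `keep` rule in the ServiceMonitor above acts as an allowlist on `__name__`. Prometheus anchors relabeling regexes at both ends, so a metric name survives only if it matches one alternative in full. The sketch below approximates that with Python's `re.fullmatch`, used here as a stand-in for Prometheus's RE2 engine (an assumption that should hold for plain alternations like this one). The `keep` helper and the sample metric names are hypothetical and exist only for illustration.

```python
import re

# Regex copied from the alertmanager-main-telemetry ServiceMonitor in this commit.
KEEP_REGEX = r"(alertmanager_integrations|scrape_samples_post_metric_relabeling|scrape_series_added|up)"


def keep(metric_names, regex):
    """Approximate a `keep` relabeling on `__name__`.

    Prometheus anchors the regex, so the whole name must match;
    re.fullmatch gives the same effect for simple alternations.
    """
    pattern = re.compile(regex)
    return [name for name in metric_names if pattern.fullmatch(name)]


# Hypothetical sample of metric names exposed by a target.
scraped = [
    "alertmanager_integrations",
    "alertmanager_alerts",
    "alertmanager_integrations_total",  # shares a prefix, but is not a full match
    "go_goroutines",
]

kept = keep(scraped, KEEP_REGEX)
print(kept)
```

Only the exact name `alertmanager_integrations` is retained; names that merely share a prefix with an allowlisted metric are dropped.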

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/managed-by: cluster-monitoring-operator
+    app.kubernetes.io/name: cluster-monitoring-operator
+    app.kubernetes.io/part-of: openshift-monitoring
+    monitoring.openshift.io/collection-profile: telemetry
+  name: cluster-monitoring-operator-telemetry
+  namespace: openshift-monitoring
+spec:
+  endpoints:
+  - bearerTokenFile: ""
+    metricRelabelings:
+    - action: keep
+      regex: (cluster_monitoring_operator_collection_profile|scrape_samples_post_metric_relabeling|scrape_series_added|up)
+      sourceLabels:
+      - __name__
+    port: https
+    scheme: https
+    tlsConfig:
+      insecureSkipVerify: false
+      serverName: cluster-monitoring-operator.openshift-monitoring.svc
+  scrapeClass: tls-client-certificate-auth
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: cluster-monitoring-operator

assets/control-plane/telemetry-service-monitor-kubelet.yaml

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ spec:
     interval: 30s
     metricRelabelings:
     - action: keep
-      regex: ()
+      regex: (apiserver_storage_objects|container_cpu_usage_seconds_total|container_memory_working_set_bytes|kubelet_containers_per_pod_count_sum|up)
       sourceLabels:
       - __name__
     port: https-metrics
@@ -41,7 +41,7 @@ spec:
     - action: labeldrop
       regex: __tmp_keep_metric
     - action: keep
-      regex: ()
+      regex: (apiserver_storage_objects|container_cpu_usage_seconds_total|container_memory_working_set_bytes|kubelet_containers_per_pod_count_sum|up)
       sourceLabels:
       - __name__
     path: /metrics/cadvisor
@@ -62,7 +62,7 @@ spec:
     interval: 30s
     metricRelabelings:
     - action: keep
-      regex: ()
+      regex: (apiserver_storage_objects|container_cpu_usage_seconds_total|container_memory_working_set_bytes|kubelet_containers_per_pod_count_sum|up)
       sourceLabels:
       - __name__
     path: /metrics/probes
@@ -81,7 +81,7 @@ spec:
     interval: 30s
     metricRelabelings:
     - action: keep
-      regex: ()
+      regex: (apiserver_storage_objects|container_cpu_usage_seconds_total|container_memory_working_set_bytes|kubelet_containers_per_pod_count_sum|up)
       sourceLabels:
       - __name__
     port: https-metrics
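
Reviewer note: the hunks above replace an empty group `()` with a populated allowlist. Because relabeling regexes are anchored, `()` can only match an empty string, so a `keep` rule on `__name__` with that pattern would retain no scraped series. The sketch below contrasts the two patterns, again using Python's `re.fullmatch` as an approximation of Prometheus's matching. The `kept_names` helper and the sample names are hypothetical.

```python
import re

# Old and new patterns, copied from the kubelet telemetry ServiceMonitor diff.
OLD_REGEX = r"()"
NEW_REGEX = (
    r"(apiserver_storage_objects|container_cpu_usage_seconds_total"
    r"|container_memory_working_set_bytes|kubelet_containers_per_pod_count_sum|up)"
)


def kept_names(names, regex):
    """Return the names a `keep` rule on `__name__` would retain (anchored match)."""
    pattern = re.compile(regex)
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical metric names from a kubelet/cAdvisor scrape.
names = [
    "container_cpu_usage_seconds_total",
    "container_fs_reads_total",
    "kubelet_running_pods",
]

before = kept_names(names, OLD_REGEX)
after = kept_names(names, NEW_REGEX)
print(before)
print(after)
```

With the old pattern nothing is kept; with the new one only the allowlisted `container_cpu_usage_seconds_total` survives from this sample.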

assets/kube-state-metrics/telemetry-service-monitor.yaml

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ spec:
     - action: labeldrop
       regex: instance
     - action: keep
-      regex: ()
+      regex: (kube_node_labels|kube_node_role|kube_node_spec_unschedulable|kube_node_status_capacity|kube_node_status_condition|kube_pod_info|kube_pod_restart_policy|kube_running_pod_ready|scrape_samples_post_metric_relabeling|scrape_series_added|up)
       sourceLabels:
       - __name__
     port: https-main
@@ -35,7 +35,7 @@ spec:
     interval: 1m
     metricRelabelings:
     - action: keep
-      regex: ()
+      regex: (kube_node_labels|kube_node_role|kube_node_spec_unschedulable|kube_node_status_capacity|kube_node_status_condition|kube_pod_info|kube_pod_restart_policy|kube_running_pod_ready|scrape_samples_post_metric_relabeling|scrape_series_added|up)
       sourceLabels:
       - __name__
     port: https-self

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/component: metrics-server
+    app.kubernetes.io/managed-by: cluster-monitoring-operator
+    app.kubernetes.io/name: metrics-server
+    app.kubernetes.io/part-of: openshift-monitoring
+    monitoring.openshift.io/collection-profile: telemetry
+  name: metrics-server-telemetry
+  namespace: openshift-monitoring
+spec:
+  endpoints:
+  - bearerTokenFile: ""
+    metricRelabelings:
+    - action: keep
+      regex: (scrape_samples_post_metric_relabeling|scrape_series_added|up)
+      sourceLabels:
+      - __name__
+    port: https
+    scheme: https
+    tlsConfig:
+      insecureSkipVerify: false
+      serverName: metrics-server.openshift-monitoring.svc
+  scrapeClass: tls-client-certificate-auth
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: metrics-server
+      app.kubernetes.io/name: metrics-server
+      app.kubernetes.io/part-of: openshift-monitoring

assets/node-exporter/telemetry-service-monitor.yaml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ spec:
     - action: labeldrop
       regex: __tmp_keep
     - action: keep
-      regex: ()
+      regex: (node_cpu_info|virt_platform|node_memory_MemTotal_bytes|node_memory_MemAvailable_bytes|node_cpu_seconds_total|up|scrape_series_added|scrape_samples_post_metric_relabeling|node_accelerator_card_info)
       sourceLabels:
       - __name__
     port: https

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/managed-by: cluster-monitoring-operator
+    app.kubernetes.io/part-of: openshift-monitoring
+    k8s-app: openshift-state-metrics
+    monitoring.openshift.io/collection-profile: telemetry
+  name: openshift-state-metrics-telemetry
+  namespace: openshift-monitoring
+spec:
+  endpoints:
+  - bearerTokenFile: ""
+    honorLabels: true
+    interval: 2m
+    metricRelabelings:
+    - action: keep
+      regex: (scrape_samples_post_metric_relabeling|scrape_series_added|up)
+      sourceLabels:
+      - __name__
+    port: https-main
+    scheme: https
+    scrapeTimeout: 2m
+    tlsConfig:
+      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
+      insecureSkipVerify: false
+      serverName: openshift-state-metrics.openshift-monitoring.svc
+  - bearerTokenFile: ""
+    interval: 2m
+    metricRelabelings:
+    - action: keep
+      regex: (scrape_samples_post_metric_relabeling|scrape_series_added|up)
+      sourceLabels:
+      - __name__
+    port: https-self
+    scheme: https
+    scrapeTimeout: 2m
+    tlsConfig:
+      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
+      insecureSkipVerify: false
+      serverName: openshift-state-metrics.openshift-monitoring.svc
+  jobLabel: k8s-app
+  scrapeClass: tls-client-certificate-auth
+  selector:
+    matchLabels:
+      k8s-app: openshift-state-metrics

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/component: thanos-sidecar
+    app.kubernetes.io/instance: k8s
+    app.kubernetes.io/managed-by: cluster-monitoring-operator
+    app.kubernetes.io/name: prometheus
+    app.kubernetes.io/part-of: openshift-monitoring
+    app.kubernetes.io/version: 3.5.0
+    monitoring.openshift.io/collection-profile: telemetry
+  name: thanos-sidecar-telemetry
+  namespace: openshift-monitoring
+spec:
+  endpoints:
+  - bearerTokenFile: ""
+    interval: 30s
+    metricRelabelings:
+    - action: keep
+      regex: (ALERTS|prometheus_tsdb_head_series|up|scrape_samples_post_metric_relabeling|scrape_series_added)
+      sourceLabels:
+      - __name__
+    port: thanos-proxy
+    scheme: https
+    tlsConfig:
+      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
+      certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
+      insecureSkipVerify: false
+      keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
+      serverName: thanos-sidecar.openshift-monitoring.svc
+  scrapeClass: tls-client-certificate-auth
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: thanos-sidecar

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app.kubernetes.io/component: prometheus
+    app.kubernetes.io/instance: k8s
+    app.kubernetes.io/managed-by: cluster-monitoring-operator
+    app.kubernetes.io/name: prometheus
+    app.kubernetes.io/part-of: openshift-monitoring
+    app.kubernetes.io/version: 3.5.0
+    monitoring.openshift.io/collection-profile: telemetry
+  name: prometheus-k8s-telemetry
+  namespace: openshift-monitoring
+spec:
+  endpoints:
+  - bearerTokenFile: ""
+    interval: 30s
+    metricRelabelings:
+    - action: keep
+      regex: (prometheus_tsdb_head_series|up|scrape_samples_post_metric_relabeling|scrape_series_added)
+      sourceLabels:
+      - __name__
+    port: metrics
+    scheme: https
+    tlsConfig:
+      insecureSkipVerify: false
+      serverName: prometheus-k8s.openshift-monitoring.svc
+  scrapeClass: tls-client-certificate-auth
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: prometheus
+      app.kubernetes.io/instance: k8s
+      app.kubernetes.io/name: prometheus
+      app.kubernetes.io/part-of: openshift-monitoring
