chore(deps): update helm release victoria-metrics-k8s-stack to v0.35.0 #361
Merged
Conversation
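This PR bumps the victoria-metrics-k8s-stack chart from 0.34.0 to 0.35.0 in three HelmReleases (data, monitoring-dev, monitoring); everything below is the flux-local rendering of what that bump changes in the cluster. For orientation, here is a minimal sketch of the HelmRelease shape being edited, assuming the standard helm.toolkit.fluxcd.io/v2 schema; the field nesting and interval placement are reconstructed, since flux-local flattens them in the diffs:

apiVersion: helm.toolkit.fluxcd.io/v2    # assumed API version
kind: HelmRelease
metadata:
  name: victoria-metrics                 # illustrative; each environment uses its own name and namespace
spec:
  interval: 5m
  chart:
    spec:
      chart: victoria-metrics-k8s-stack
      version: 0.35.0                    # the only value this PR changes
      interval: 15m
      sourceRef:
        kind: HelmRepository
        name: victoria-metrics
        namespace: flux-system
  install:
    createNamespace: true
    remediation:
      retries: 3
  maxHistory: 2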
--- kubernetes/apps/pipelines/data/victoria-metrics/app Kustomization: flux-system/data-pipeline-vmetrics HelmRelease: data/victoria-metrics
+++ kubernetes/apps/pipelines/data/victoria-metrics/app Kustomization: flux-system/data-pipeline-vmetrics HelmRelease: data/victoria-metrics
@@ -14,13 +14,13 @@
chart: victoria-metrics-k8s-stack
interval: 15m
sourceRef:
kind: HelmRepository
name: victoria-metrics
namespace: flux-system
- version: 0.34.0
+ version: 0.35.0
install:
createNamespace: true
remediation:
retries: 3
interval: 5m
maxHistory: 2
--- kubernetes/apps/monitoring-dev/victoria-metrics/app Kustomization: flux-system/victoria-metrics-dev HelmRelease: monitoring-dev/victoria-metrics-dev
+++ kubernetes/apps/monitoring-dev/victoria-metrics/app Kustomization: flux-system/victoria-metrics-dev HelmRelease: monitoring-dev/victoria-metrics-dev
@@ -14,13 +14,13 @@
chart: victoria-metrics-k8s-stack
interval: 15m
sourceRef:
kind: HelmRepository
name: victoria-metrics
namespace: flux-system
- version: 0.34.0
+ version: 0.35.0
install:
createNamespace: true
remediation:
retries: 3
interval: 5m
maxHistory: 2
--- kubernetes/apps/monitoring/victoria-metrics/app Kustomization: flux-system/victoria-metrics HelmRelease: monitoring/victoria-metrics
+++ kubernetes/apps/monitoring/victoria-metrics/app Kustomization: flux-system/victoria-metrics HelmRelease: monitoring/victoria-metrics
@@ -14,13 +14,13 @@
chart: victoria-metrics-k8s-stack
interval: 15m
sourceRef:
kind: HelmRepository
name: victoria-metrics
namespace: flux-system
- version: 0.34.0
+ version: 0.35.0
install:
createNamespace: true
remediation:
retries: 3
interval: 5m
maxHistory: 2
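The chart bump also raises the bundled VictoriaMetrics application version, which is why the rendered VMAgent and VMSingle image tags in the next diffs move from v1.109.1 to v1.110.0 without any change on our side. If one environment ever needed to hold the component version back while still taking the chart update, the tag could be pinned through the HelmRelease values; a minimal sketch, assuming the chart exposes the CR specs under vmsingle.spec and vmagent.spec (worth verifying against the 0.35.0 values.yaml):

  values:
    vmsingle:
      spec:
        image:
          tag: v1.109.1   # hypothetical pin; omit to follow the chart's default appVersion
    vmagent:
      spec:
        image:
          tag: v1.109.1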
--- HelmRelease: monitoring-dev/victoria-metrics-dev VMAgent: monitoring-dev/vmetrics-dev
+++ HelmRelease: monitoring-dev/victoria-metrics-dev VMAgent: monitoring-dev/vmetrics-dev
@@ -12,13 +12,13 @@
spec:
externalLabels: {}
extraArgs:
promscrape.dropOriginalLabels: 'true'
promscrape.streamParse: 'true'
image:
- tag: v1.109.1
+ tag: v1.110.0
license: {}
port: '8429'
remoteWrite:
- url: http://vmsingle-vmetrics-dev.monitoring-dev.svc.cluster.local.:8429/api/v1/write
scrapeInterval: 20s
selectAllByDefault: true
--- HelmRelease: monitoring-dev/victoria-metrics-dev VMSingle: monitoring-dev/vmetrics-dev
+++ HelmRelease: monitoring-dev/victoria-metrics-dev VMSingle: monitoring-dev/vmetrics-dev
@@ -10,13 +10,13 @@
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: victoria-metrics-k8s-stack
spec:
extraArgs:
search.maxUniqueTimeseries: '600000'
image:
- tag: v1.109.1
+ tag: v1.110.0
license: {}
port: '8429'
replicaCount: 1
resources: {}
retentionPeriod: '1'
storage:
--- HelmRelease: data/victoria-metrics VMAgent: data/vmetrics-data
+++ HelmRelease: data/victoria-metrics VMAgent: data/vmetrics-data
@@ -12,13 +12,13 @@
spec:
externalLabels: {}
extraArgs:
promscrape.dropOriginalLabels: 'true'
promscrape.streamParse: 'true'
image:
- tag: v1.109.1
+ tag: v1.110.0
license: {}
port: '8429'
remoteWrite:
- url: http://vmsingle-vmetrics-data.data.svc.cluster.local.:8429/api/v1/write
scrapeInterval: 20s
selectAllByDefault: true
--- HelmRelease: data/victoria-metrics VMSingle: data/vmetrics-data
+++ HelmRelease: data/victoria-metrics VMSingle: data/vmetrics-data
@@ -10,13 +10,13 @@
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: victoria-metrics-k8s-stack
spec:
extraArgs:
search.maxUniqueTimeseries: '600000'
image:
- tag: v1.109.1
+ tag: v1.110.0
license: {}
port: '8429'
replicaCount: 1
resources: {}
retentionPeriod: 50y
storage:
--- HelmRelease: monitoring/victoria-metrics GrafanaDashboard: monitoring/vmetrics-controller-manager
+++ HelmRelease: monitoring/victoria-metrics GrafanaDashboard: monitoring/vmetrics-controller-manager
@@ -8,12 +8,12 @@
app: victoria-metrics-k8s-stack-grafana
app.kubernetes.io/instance: victoria-metrics
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: victoria-metrics-k8s-stack
spec:
json: |
- {"editable":false,"links":[{"asDropdown":true,"includeVars":true,"keepTime":true,"tags":["kubernetes-mixin"],"targetBlank":false,"title":"Kubernetes","type":"dashboards"}],"panels":[{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"unit":"none"}},"gridPos":{"h":7,"w":4,"x":0,"y":0},"id":1,"interval":"1m","options":{"colorMode":"none"},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"sum(up{ cluster=~\"$cluster\", job=\"kube-controller-manager\"})","instant":true}],"title":"Up","type":"stat"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"ops"}},"gridPos":{"h":7,"w":20,"x":4,"y":0},"id":2,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"sum(rate(workqueue_adds_total{ cluster=~\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, name)","legendFormat":"{{cluster}} {{instance}} {{name}}"}],"title":"Work Queue Add Rate","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"short"}},"gridPos":{"h":7,"w":24,"x":0,"y":7},"id":3,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"sum(rate(workqueue_depth{ cluster=~\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, name)","legendFormat":"{{cluster}} {{instance}} {{name}}"}],"title":"Work Queue Depth","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"s"}},"gridPos":{"h":7,"w":24,"x":0,"y":14},"id":4,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{ cluster=~\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\"}[$__rate_interval])) by (cluster, instance, name, le))","legendFormat":"{{cluster}} {{instance}} {{name}}"}],"title":"Work Queue Latency","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"ops"}},"gridPos":{"h":7,"w":8,"x":0,"y":21},"id":5,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", 
instance=~\"$instance\",code=~\"2..\"}[$__rate_interval]))","legendFormat":"2xx"},{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"3..\"}[$__rate_interval]))","legendFormat":"3xx"},{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"4..\"}[$__rate_interval]))","legendFormat":"4xx"},{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"sum(rate(rest_client_requests_total{job=\"kube-controller-manager\", instance=~\"$instance\",code=~\"5..\"}[$__rate_interval]))","legendFormat":"5xx"}],"title":"Kube API Request Rate","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"s"}},"gridPos":{"h":7,"w":16,"x":8,"y":21},"id":6,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{ cluster=~\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"POST\"}[$__rate_interval])) by (verb, url, le))","legendFormat":"{{verb}} {{url}}"}],"title":"Post Request Latency 99th Quantile","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"s"}},"gridPos":{"h":7,"w":24,"x":0,"y":28},"id":7,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{ cluster=~\"$cluster\", job=\"kube-controller-manager\", instance=~\"$instance\", verb=\"GET\"}[$__rate_interval])) by (verb, url, le))","legendFormat":"{{verb}} {{url}}"}],"title":"Get Request Latency 99th Quantile","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"bytes"}},"gridPos":{"h":7,"w":8,"x":0,"y":35},"id":8,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"process_resident_memory_bytes{ cluster=~\"$cluster\", job=\"kube-controller-manager\",instance=~\"$instance\"}","legendFormat":"{{instance}}"}],"title":"Memory","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"short"}},"gridPos":{"h":7,"w":8,"x":8,"y":35},"id":9,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"rate(process_cpu_seconds_total{ 
cluster=~\"$cluster\", job=\"kube-controller-manager\",instance=~\"$instance\"}[$__rate_interval])","legendFormat":"{{instance}}"}],"title":"CPU usage","type":"timeseries"},{"datasource":{"type":"prometheus","uid":"-- Mixed --"},"fieldConfig":{"defaults":{"custom":{"fillOpacity":10,"showPoints":"never","spanNulls":true},"unit":"short"}},"gridPos":{"h":7,"w":8,"x":16,"y":35},"id":10,"interval":"1m","options":{"legend":{"asTable":true,"calcs":["lastNotNull"],"displayMode":"table","placement":"right","showLegend":true},"tooltip":{"mode":"single"}},"pluginVersion":"v11.4.0","targets":[{"datasource":{"type":"prometheus","uid":"${datasource}"},"expr":"go_goroutines{ cluster=~\"$cluster\", job=\"kube-controller-manager\",instance=~\"$instance\"}","legendFormat":"{{instance}}"}],"title":"Goroutines","type":"timeseries"}],"refresh":"10s","schemaVersion":39,"tags":["kubernetes-mixin","vm-k8s-stack"],"templating":{"list":[{"current":{"selected":true,"text":"default","value":"default"},"hide":0,"label":"Data source","name":"datasource","query":"prometheus","regex":"","type":"datasource"},{"datasource":{"type":"prometheus","uid":"${datasource}"},"hide":2,"label":"cluster","name":"cluster","query":".*","refresh":2,"sort":1,"type":"constant"},{"datasource":{"type":"prometheus","uid":"${datasource}"},"hide":0,"includeAll":true,"label":"instance","name":"instance","query":"label_values(up{ cluster=~\"$cluster\", job=\"kube-controller-manager\"}, instance)","refresh":2,"sort":1,"type":"query"}]},"time":{"from":"now-1h","to":"now"},"timezone":"utc","title":"Kubernetes / Controller Manager","uid":"72e0e05bef5099e5f049b05fdc429ed4"}
[Diff truncated by flux-local]
--- HelmRelease: monitoring/victoria-metrics GrafanaDashboard: monitoring/vmetrics-kubelet
+++ HelmRelease: monitoring/victoria-metrics GrafanaDashboard: monitoring/vmetrics-kubelet
@@ -8,12 +8,12 @@
app: victoria-metrics-k8s-stack-grafana
app.kubernetes.io/instance: victoria-metrics
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: victoria-metrics-k8s-stack
spec:
json: |
[Diff truncated by flux-local]
--- HelmRelease: monitoring/victoria-metrics GrafanaDashboard: monitoring/vmetrics-scheduler
+++ HelmRelease: monitoring/victoria-metrics GrafanaDashboard: monitoring/vmetrics-scheduler
@@ -8,12 +8,12 @@
app: victoria-metrics-k8s-stack-grafana
app.kubernetes.io/instance: victoria-metrics
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: victoria-metrics-k8s-stack
spec:
json: |
[Diff truncated by flux-local]
--- HelmRelease: monitoring/victoria-metrics VMAgent: monitoring/vmetrics
+++ HelmRelease: monitoring/victoria-metrics VMAgent: monitoring/vmetrics
@@ -12,13 +12,13 @@
spec:
externalLabels: {}
extraArgs:
promscrape.dropOriginalLabels: 'true'
promscrape.streamParse: 'true'
image:
- tag: v1.109.1
+ tag: v1.110.0
license: {}
port: '8429'
remoteWrite:
- url: http://vmsingle-vmetrics.monitoring.svc.cluster.local.:8429/api/v1/write
resources:
limits:
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kube-apiserver-availability.rules
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kube-apiserver-availability.rules
@@ -51,28 +51,28 @@
expr: |-
1 - (
(
# write too slow
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
-
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le=~"1(\\.0)?"})
) +
(
# read too slow
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"LIST|GET"})
-
(
(
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le=~"1(\\.0)?"})
or
vector(0)
)
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le=~"5(\\.0)?"})
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"})
)
) +
# errors
sum by (cluster) (code:apiserver_request_total:increase30d{code=~"5.."} or vector(0))
)
/
@@ -85,20 +85,20 @@
1 - (
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"LIST|GET"})
-
(
# too slow
(
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le=~"1(\\.0)?"})
or
vector(0)
)
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le=~"5(\\.0)?"})
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"})
)
+
# errors
sum by (cluster) (code:apiserver_request_total:increase30d{verb="read",code=~"5.."} or vector(0))
)
/
@@ -110,13 +110,13 @@
expr: |-
1 - (
(
# too slow
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
-
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le=~"1(\\.0)?"})
)
+
# errors
sum by (cluster) (code:apiserver_request_total:increase30d{verb="write",code=~"5.."} or vector(0))
)
/
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kube-apiserver-burnrate.rules
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kube-apiserver-burnrate.rules
@@ -20,20 +20,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[1d]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[1d]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[1d]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[1d]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[1d]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[1d]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[1d]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]))
)
@@ -48,20 +48,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[1h]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[1h]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[1h]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[1h]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[1h]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[1h]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[1h]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1h]))
)
@@ -76,20 +76,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[2h]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[2h]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[2h]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[2h]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[2h]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[2h]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[2h]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[2h]))
)
@@ -104,20 +104,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[30m]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[30m]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[30m]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[30m]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[30m]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[30m]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[30m]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[30m]))
)
@@ -132,20 +132,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[3d]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[3d]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[3d]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[3d]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[3d]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[3d]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[3d]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[3d]))
)
@@ -160,20 +160,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[5m]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[5m]))
- or
- vector(0)
- )
- +
[Diff truncated by flux-local]
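The recurring change in the two apiserver SLI rule groups above swaps exact bucket matches such as le="1" for the regex le=~"1(\\.0)?", so the recording rules keep matching whether the histogram publishes its upper bounds as "1", "5", "30" or as "1.0", "5.0", "30.0". A minimal sketch of the pattern with a shortened, illustrative selector:

    expr: |-
      # matches both le="1" and le="1.0" bucket labels
      sum by (cluster) (
        rate(apiserver_request_sli_duration_seconds_bucket{verb=~"LIST|GET",le=~"1(\\.0)?"}[5m])
      )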
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kube-apiserver-slos
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kube-apiserver-slos
@@ -13,13 +13,14 @@
groups:
- name: kube-apiserver-slos
params: {}
rules:
- alert: KubeAPIErrorBudgetBurn
annotations:
- description: The API server is burning too much error budget.
+ description: The API server is burning too much error budget on cluster {{
+ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
sum by (cluster) (apiserver_request:burnrate1h) > (14.40 * 0.01000)
and on (cluster)
sum by (cluster) (apiserver_request:burnrate5m) > (14.40 * 0.01000)
@@ -27,13 +28,14 @@
labels:
long: 1h
severity: critical
short: 5m
- alert: KubeAPIErrorBudgetBurn
annotations:
- description: The API server is burning too much error budget.
+ description: The API server is burning too much error budget on cluster {{
+ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
sum by (cluster) (apiserver_request:burnrate6h) > (6.00 * 0.01000)
and on (cluster)
sum by (cluster) (apiserver_request:burnrate30m) > (6.00 * 0.01000)
@@ -41,13 +43,14 @@
labels:
long: 6h
severity: critical
short: 30m
- alert: KubeAPIErrorBudgetBurn
annotations:
- description: The API server is burning too much error budget.
+ description: The API server is burning too much error budget on cluster {{
+ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
sum by (cluster) (apiserver_request:burnrate1d) > (3.00 * 0.01000)
and on (cluster)
sum by (cluster) (apiserver_request:burnrate2h) > (3.00 * 0.01000)
@@ -55,13 +58,14 @@
labels:
long: 1d
severity: warning
short: 2h
- alert: KubeAPIErrorBudgetBurn
annotations:
- description: The API server is burning too much error budget.
+ description: The API server is burning too much error budget on cluster {{
+ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
sum by (cluster) (apiserver_request:burnrate3d) > (1.00 * 0.01000)
and on (cluster)
sum by (cluster) (apiserver_request:burnrate6h) > (1.00 * 0.01000)
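Many of the VMRule diffs below only extend the alert descriptions with "on cluster {{ $labels.cluster }}". That placeholder renders something useful only if the series carry a cluster label; with externalLabels currently empty ({}) on the VMAgent specs above, one way to provide it is via the VMAgent CR (or the matching chart value), shown here as a minimal sketch with a hypothetical cluster name:

    spec:
      externalLabels:
        cluster: home-cluster   # hypothetical value; use the real cluster name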
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-apps
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-apps
@@ -14,24 +14,25 @@
- name: kubernetes-apps
params: {}
rules:
- alert: KubePodCrashLooping
annotations:
description: 'Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container
- }}) is in waiting state (reason: "CrashLoopBackOff").'
+ }}) is in waiting state (reason: "CrashLoopBackOff") on cluster {{ $labels.cluster
+ }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubepodcrashlooping
summary: Pod is crash looping.
expr: max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff",
job="kube-state-metrics", namespace=~".*"}[5m]) >= 1
for: 15m
labels:
severity: warning
- alert: KubePodNotReady
annotations:
description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready
- state for longer than 15 minutes.
+ state for longer than 15 minutes on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubepodnotready
summary: Pod has been in a non-ready state for more than 15 minutes.
expr: |-
sum by (namespace,pod,cluster) (
max by (namespace,pod,cluster) (
kube_pod_status_phase{job="kube-state-metrics", namespace=~".*", phase=~"Pending|Unknown|Failed"}
@@ -43,26 +44,27 @@
labels:
severity: warning
- alert: KubeDeploymentGenerationMismatch
annotations:
description: Deployment generation for {{ $labels.namespace }}/{{ $labels.deployment
}} does not match, this indicates that the Deployment has failed but has
- not been rolled back.
+ not been rolled back on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedeploymentgenerationmismatch
summary: Deployment generation mismatch due to possible roll-back
expr: |-
kube_deployment_status_observed_generation{job="kube-state-metrics", namespace=~".*"}
!=
kube_deployment_metadata_generation{job="kube-state-metrics", namespace=~".*"}
for: 15m
labels:
severity: warning
- alert: KubeDeploymentReplicasMismatch
annotations:
description: Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has
- not matched the expected number of replicas for longer than 15 minutes.
+ not matched the expected number of replicas for longer than 15 minutes on
+ cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedeploymentreplicasmismatch
summary: Deployment has not matched the expected number of replicas.
expr: |-
(
kube_deployment_spec_replicas{job="kube-state-metrics", namespace=~".*"}
>
@@ -75,25 +77,27 @@
for: 15m
labels:
severity: warning
- alert: KubeDeploymentRolloutStuck
annotations:
description: Rollout of deployment {{ $labels.namespace }}/{{ $labels.deployment
- }} is not progressing for longer than 15 minutes.
+ }} is not progressing for longer than 15 minutes on cluster {{ $labels.cluster
+ }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedeploymentrolloutstuck
summary: Deployment rollout is not progressing.
expr: |-
kube_deployment_status_condition{condition="Progressing", status="false",job="kube-state-metrics", namespace=~".*"}
!= 0
for: 15m
labels:
severity: warning
- alert: KubeStatefulSetReplicasMismatch
annotations:
description: StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }}
- has not matched the expected number of replicas for longer than 15 minutes.
+ has not matched the expected number of replicas for longer than 15 minutes
+ on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubestatefulsetreplicasmismatch
summary: StatefulSet has not matched the expected number of replicas.
expr: |-
(
kube_statefulset_status_replicas_ready{job="kube-state-metrics", namespace=~".*"}
!=
@@ -107,26 +111,26 @@
labels:
severity: warning
- alert: KubeStatefulSetGenerationMismatch
annotations:
description: StatefulSet generation for {{ $labels.namespace }}/{{ $labels.statefulset
}} does not match, this indicates that the StatefulSet has failed but has
- not been rolled back.
+ not been rolled back on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubestatefulsetgenerationmismatch
summary: StatefulSet generation mismatch due to possible roll-back
expr: |-
kube_statefulset_status_observed_generation{job="kube-state-metrics", namespace=~".*"}
!=
kube_statefulset_metadata_generation{job="kube-state-metrics", namespace=~".*"}
for: 15m
labels:
severity: warning
- alert: KubeStatefulSetUpdateNotRolledOut
annotations:
description: StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }}
- update has not been rolled out.
+ update has not been rolled out on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubestatefulsetupdatenotrolledout
summary: StatefulSet update has not been rolled out.
expr: |-
(
max by (namespace,statefulset,job,cluster) (
kube_statefulset_status_current_revision{job="kube-state-metrics", namespace=~".*"}
@@ -147,13 +151,14 @@
for: 15m
labels:
severity: warning
- alert: KubeDaemonSetRolloutStuck
annotations:
description: DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has
- not finished or progressed for at least 15m.
+ not finished or progressed for at least 15m on cluster {{ $labels.cluster
+ }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedaemonsetrolloutstuck
summary: DaemonSet rollout is stuck.
expr: |-
(
(
kube_daemonset_status_current_number_scheduled{job="kube-state-metrics", namespace=~".*"}
@@ -181,70 +186,74 @@
labels:
severity: warning
- alert: KubeContainerWaiting
annotations:
description: 'pod/{{ $labels.pod }} in namespace {{ $labels.namespace }} on
container {{ $labels.container}} has been in waiting state for longer than
- 1 hour. (reason: "{{ $labels.reason }}").'
+ 1 hour. (reason: "{{ $labels.reason }}") on cluster {{ $labels.cluster }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubecontainerwaiting
summary: Pod container waiting longer than 1 hour
expr: kube_pod_container_status_waiting_reason{reason!="CrashLoopBackOff", job="kube-state-metrics",
namespace=~".*"} > 0
for: 1h
labels:
severity: warning
- alert: KubeDaemonSetNotScheduled
annotations:
description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset
- }} are not scheduled.'
+ }} are not scheduled on cluster {{ $labels.cluster }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedaemonsetnotscheduled
summary: DaemonSet pods are not scheduled.
expr: |-
kube_daemonset_status_desired_number_scheduled{job="kube-state-metrics", namespace=~".*"}
-
kube_daemonset_status_current_number_scheduled{job="kube-state-metrics", namespace=~".*"} > 0
for: 10m
labels:
severity: warning
- alert: KubeDaemonSetMisScheduled
annotations:
description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset
- }} are running where they are not supposed to run.'
+ }} are running where they are not supposed to run on cluster {{ $labels.cluster
+ }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedaemonsetmisscheduled
summary: DaemonSet pods are misscheduled.
expr: kube_daemonset_status_number_misscheduled{job="kube-state-metrics", namespace=~".*"}
> 0
for: 15m
labels:
severity: warning
- alert: KubeJobNotCompleted
annotations:
description: Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking
- more than {{ "43200" | humanizeDuration }} to complete.
+ more than {{ "43200" | humanizeDuration }} to complete on cluster {{ $labels.cluster
+ }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubejobnotcompleted
summary: Job did not complete in time
expr: |-
time() - max by (namespace,job_name,cluster) (kube_job_status_start_time{job="kube-state-metrics", namespace=~".*"}
and
kube_job_status_active{job="kube-state-metrics", namespace=~".*"} > 0) > 43200
labels:
severity: warning
- alert: KubeJobFailed
annotations:
description: Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to
- complete. Removing failed job after investigation should clear this alert.
+ complete. Removing failed job after investigation should clear this alert
+ on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubejobfailed
summary: Job failed to complete.
[Diff truncated by flux-local]
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-resources
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-resources
@@ -69,13 +69,13 @@
for: 5m
labels:
severity: warning
- alert: KubeQuotaAlmostFull
annotations:
description: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage
- }} of its {{ $labels.resource }} quota.
+ }} of its {{ $labels.resource }} quota on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubequotaalmostfull
summary: Namespace quota is going to be full.
expr: |-
kube_resourcequota{job="kube-state-metrics", type="used"}
/ ignoring(instance, job, type)
(kube_resourcequota{job="kube-state-metrics", type="hard"} > 0)
@@ -83,13 +83,13 @@
for: 15m
labels:
severity: info
- alert: KubeQuotaFullyUsed
annotations:
description: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage
- }} of its {{ $labels.resource }} quota.
+ }} of its {{ $labels.resource }} quota on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubequotafullyused
summary: Namespace quota is fully used.
expr: |-
kube_resourcequota{job="kube-state-metrics", type="used"}
/ ignoring(instance, job, type)
(kube_resourcequota{job="kube-state-metrics", type="hard"} > 0)
@@ -97,13 +97,13 @@
for: 15m
labels:
severity: info
- alert: KubeQuotaExceeded
annotations:
description: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage
- }} of its {{ $labels.resource }} quota.
+ }} of its {{ $labels.resource }} quota on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubequotaexceeded
summary: Namespace quota has exceeded the limits.
expr: |-
kube_resourcequota{job="kube-state-metrics", type="used"}
/ ignoring(instance, job, type)
(kube_resourcequota{job="kube-state-metrics", type="hard"} > 0)
@@ -112,13 +112,13 @@
labels:
severity: warning
- alert: CPUThrottlingHigh
annotations:
description: '{{ $value | humanizePercentage }} throttling of CPU in namespace
{{ $labels.namespace }} for container {{ $labels.container }} in pod {{
- $labels.pod }}.'
+ $labels.pod }} on cluster {{ $labels.cluster }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/cputhrottlinghigh
summary: Processes experience elevated CPU throttling.
expr: |-
sum(increase(container_cpu_cfs_throttled_periods_total{container!="", job="kubelet", metrics_path="/metrics/cadvisor", }[5m])) without (id, metrics_path, name, image, endpoint, job, node)
/
sum(increase(container_cpu_cfs_periods_total{job="kubelet", metrics_path="/metrics/cadvisor", }[5m])) without (id, metrics_path, name, image, endpoint, job, node)
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-system-apiserver
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-system-apiserver
@@ -52,13 +52,14 @@
for: 10m
labels:
severity: warning
- alert: KubeAggregatedAPIDown
annotations:
description: Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace
- }} has been only {{ $value | humanize }}% available over the last 10m.
+ }} has been only {{ $value | humanize }}% available over the last 10m on
+ cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeaggregatedapidown
summary: Kubernetes aggregated API is down.
expr: (1 - max by (name,namespace,cluster)(avg_over_time(aggregator_unavailable_apiservice{job="apiserver"}[10m])))
* 100 < 85
for: 5m
labels:
@@ -72,13 +73,13 @@
for: 15m
labels:
severity: critical
- alert: KubeAPITerminatedRequests
annotations:
description: The kubernetes apiserver has terminated {{ $value | humanizePercentage
- }} of its incoming requests.
+ }} of its incoming requests on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapiterminatedrequests
summary: The kubernetes apiserver has terminated {{ $value | humanizePercentage
}} of its incoming requests.
expr: sum by (cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
/ ( sum by (cluster) (rate(apiserver_request_total{job="apiserver"}[10m]))
+ sum by (cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-system-kubelet
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-system-kubelet
@@ -13,135 +13,143 @@
groups:
- name: kubernetes-system-kubelet
params: {}
rules:
- alert: KubeNodeNotReady
annotations:
- description: '{{ $labels.node }} has been unready for more than 15 minutes.'
+ description: '{{ $labels.node }} has been unready for more than 15 minutes
+ on cluster {{ $labels.cluster }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubenodenotready
summary: Node is not ready.
expr: kube_node_status_condition{job="kube-state-metrics",condition="Ready",status="true"}
== 0
for: 15m
labels:
severity: warning
- alert: KubeNodeUnreachable
annotations:
description: '{{ $labels.node }} is unreachable and some workloads may be
- rescheduled.'
+ rescheduled on cluster {{ $labels.cluster }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubenodeunreachable
summary: Node is unreachable.
expr: (kube_node_spec_taint{job="kube-state-metrics",key="node.kubernetes.io/unreachable",effect="NoSchedule"}
unless ignoring(key,value) kube_node_spec_taint{job="kube-state-metrics",key=~"ToBeDeletedByClusterAutoscaler|cloud.google.com/impending-node-termination|aws-node-termination-handler/spot-itn"})
== 1
for: 15m
labels:
severity: warning
- alert: KubeletTooManyPods
annotations:
description: Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage
- }} of its Pod capacity.
+ }} of its Pod capacity on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubelettoomanypods
summary: Kubelet is running at capacity.
expr: |-
count by (node,cluster) (
- (kube_pod_status_phase{job="kube-state-metrics",phase="Running"} == 1) * on (instance,pod,namespace,cluster) group_left(node) topk by (instance,pod,namespace,cluster) (1, kube_pod_info{job="kube-state-metrics"})
+ (kube_pod_status_phase{job="kube-state-metrics", phase="Running"} == 1)
+ * on (namespace,pod,cluster) group_left (node)
+ group by (namespace,pod,node,cluster) (
+ kube_pod_info{job="kube-state-metrics"}
+ )
)
/
max by (node,cluster) (
- kube_node_status_capacity{job="kube-state-metrics",resource="pods"} != 1
+ kube_node_status_capacity{job="kube-state-metrics", resource="pods"} != 1
) > 0.95
for: 15m
labels:
severity: info
- alert: KubeNodeReadinessFlapping
annotations:
description: The readiness status of node {{ $labels.node }} has changed {{
- $value }} times in the last 15 minutes.
+ $value }} times in the last 15 minutes on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubenodereadinessflapping
summary: Node readiness status is flapping.
expr: sum(changes(kube_node_status_condition{job="kube-state-metrics",status="true",condition="Ready"}[15m]))
by (node,cluster) > 2
for: 15m
labels:
severity: warning
- alert: KubeletPlegDurationHigh
annotations:
description: The Kubelet Pod Lifecycle Event Generator has a 99th percentile
- duration of {{ $value }} seconds on node {{ $labels.node }}.
+ duration of {{ $value }} seconds on node {{ $labels.node }} on cluster {{
+ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletplegdurationhigh
summary: Kubelet Pod Lifecycle Event Generator is taking too long to relist.
expr: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile="0.99"}
>= 10
for: 5m
labels:
severity: warning
- alert: KubeletPodStartUpLatencyHigh
annotations:
description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds
- on node {{ $labels.node }}.
+ on node {{ $labels.node }} on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletpodstartuplatencyhigh
summary: Kubelet Pod startup latency is too high.
expr: histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job="kubelet",
metrics_path="/metrics"}[5m])) by (instance,le,cluster)) * on (instance,cluster)
group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"}
> 60
for: 15m
labels:
severity: warning
- alert: KubeletClientCertificateExpiration
annotations:
description: Client certificate for Kubelet on node {{ $labels.node }} expires
- in {{ $value | humanizeDuration }}.
+ in {{ $value | humanizeDuration }} on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletclientcertificateexpiration
summary: Kubelet client certificate is about to expire.
expr: kubelet_certificate_manager_client_ttl_seconds < 604800
labels:
severity: warning
- alert: KubeletClientCertificateExpiration
annotations:
description: Client certificate for Kubelet on node {{ $labels.node }} expires
- in {{ $value | humanizeDuration }}.
+ in {{ $value | humanizeDuration }} on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletclientcertificateexpiration
summary: Kubelet client certificate is about to expire.
expr: kubelet_certificate_manager_client_ttl_seconds < 86400
labels:
severity: critical
- alert: KubeletServerCertificateExpiration
annotations:
description: Server certificate for Kubelet on node {{ $labels.node }} expires
- in {{ $value | humanizeDuration }}.
+ in {{ $value | humanizeDuration }} on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletservercertificateexpiration
summary: Kubelet server certificate is about to expire.
expr: kubelet_certificate_manager_server_ttl_seconds < 604800
labels:
severity: warning
- alert: KubeletServerCertificateExpiration
annotations:
description: Server certificate for Kubelet on node {{ $labels.node }} expires
- in {{ $value | humanizeDuration }}.
+ in {{ $value | humanizeDuration }} on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletservercertificateexpiration
summary: Kubelet server certificate is about to expire.
expr: kubelet_certificate_manager_server_ttl_seconds < 86400
labels:
severity: critical
- alert: KubeletClientCertificateRenewalErrors
annotations:
description: Kubelet on node {{ $labels.node }} has failed to renew its client
- certificate ({{ $value | humanize }} errors in the last 5 minutes).
+ certificate ({{ $value | humanize }} errors in the last 5 minutes) on cluster
+ {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletclientcertificaterenewalerrors
summary: Kubelet has failed to renew its client certificate.
expr: increase(kubelet_certificate_manager_client_expiration_renew_errors[5m])
> 0
for: 15m
labels:
severity: warning
- alert: KubeletServerCertificateRenewalErrors
annotations:
description: Kubelet on node {{ $labels.node }} has failed to renew its server
- certificate ({{ $value | humanize }} errors in the last 5 minutes).
+ certificate ({{ $value | humanize }} errors in the last 5 minutes) on cluster
+ {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletservercertificaterenewalerrors
summary: Kubelet has failed to renew its server certificate.
expr: increase(kubelet_server_expiration_renew_errors[5m]) > 0
for: 15m
labels:
severity: warning
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-system
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-kubernetes-system
@@ -14,24 +14,25 @@
- name: kubernetes-system
params: {}
rules:
- alert: KubeVersionMismatch
annotations:
description: There are {{ $value }} different semantic versions of Kubernetes
- components running.
+ components running on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeversionmismatch
summary: Different semantic versions of Kubernetes components running.
expr: count by (cluster) (count by (git_version,cluster) (label_replace(kubernetes_build_info{job!~"kube-dns|coredns"},"git_version","$1","git_version","(v[0-9]*.[0-9]*).*")))
> 1
for: 15m
labels:
severity: warning
- alert: KubeClientErrors
annotations:
description: Kubernetes API server client '{{ $labels.job }}/{{ $labels.instance
- }}' is experiencing {{ $value | humanizePercentage }} errors.'
+ }}' is experiencing {{ $value | humanizePercentage }} errors on cluster
+ {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclienterrors
summary: Kubernetes API server client is experiencing errors.
expr: |-
(sum(rate(rest_client_requests_total{job="apiserver",code=~"5.."}[5m])) by (instance,job,namespace,cluster)
/
sum(rate(rest_client_requests_total{job="apiserver"}[5m])) by (instance,job,namespace,cluster))
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-vmagent
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-vmagent
@@ -15,130 +15,130 @@
interval: 30s
name: vmagent
params: {}
rules:
- alert: PersistentQueueIsDroppingData
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=49&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=49&var-instance={{
+ $labels.instance }}
description: Vmagent dropped {{ $value | humanize1024 }} from persistent queue
on instance {{ $labels.instance }} for the last 10m.
summary: Instance {{ $labels.instance }} is dropping data from persistent
queue
expr: sum(increase(vm_persistentqueue_bytes_dropped_total[5m])) without (path)
> 0
for: 10m
labels:
severity: critical
- alert: RejectedRemoteWriteDataBlocksAreDropped
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=79&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=79&var-instance={{
+ $labels.instance }}
description: Job "{{ $labels.job }}" on instance {{ $labels.instance }} drops
the rejected by remote-write server data blocks. Check the logs to find
the reason for rejects.
summary: Vmagent is dropping data blocks that are rejected by remote storage
expr: sum(increase(vmagent_remotewrite_packets_dropped_total[5m])) without (url)
> 0
for: 15m
labels:
severity: warning
- alert: TooManyScrapeErrors
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=31&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=31&var-instance={{
+ $labels.instance }}
description: Job "{{ $labels.job }}" on instance {{ $labels.instance }} fails
to scrape targets for last 15m
summary: Vmagent fails to scrape one or more targets
expr: increase(vm_promscrape_scrapes_failed_total[5m]) > 0
for: 15m
labels:
severity: warning
- alert: TooManyWriteErrors
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=77&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=77&var-instance={{
+ $labels.instance }}
description: Job "{{ $labels.job }}" on instance {{ $labels.instance }} responds
with errors to write requests for last 15m.
summary: Vmagent responds with too many errors on data ingestion protocols
expr: |-
(sum(increase(vm_ingestserver_request_errors_total[5m])) without (name,net,type)
+
sum(increase(vmagent_http_request_errors_total[5m])) without (path,protocol)) > 0
for: 15m
labels:
severity: warning
- alert: TooManyRemoteWriteErrors
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=61&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=61&var-instance={{
+ $labels.instance }}
description: |-
Vmagent fails to push data via remote write protocol to destination "{{ $labels.url }}"
Ensure that destination is up and reachable.
summary: Job "{{ $labels.job }}" on instance {{ $labels.instance }} fails
to push to remote storage
expr: rate(vmagent_remotewrite_retries_count_total[5m]) > 0
for: 15m
labels:
severity: warning
- alert: RemoteWriteConnectionIsSaturated
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=84&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=84&var-instance={{
+ $labels.instance }}
description: |-
The remote write connection between vmagent "{{ $labels.job }}" (instance {{ $labels.instance }}) and destination "{{ $labels.url }}" is saturated by more than 90% and vmagent won't be able to keep up.
This usually means that `-remoteWrite.queues` command-line flag must be increased in order to increase the number of connections per each remote storage.
summary: Remote write connection from "{{ $labels.job }}" (instance {{ $labels.instance
}}) to {{ $labels.url }} is saturated
expr: "(\n rate(vmagent_remotewrite_send_duration_seconds_total[5m])\n / \n\
\ vmagent_remotewrite_queues\n) > 0.9"
for: 15m
labels:
severity: warning
- alert: PersistentQueueForWritesIsSaturated
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=98&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=98&var-instance={{
+ $labels.instance }}
description: Persistent queue writes for vmagent "{{ $labels.job }}" (instance
{{ $labels.instance }}) are saturated by more than 90% and vmagent won't
be able to keep up with flushing data on disk. In this case, consider to
decrease load on the vmagent or improve the disk throughput.
summary: Persistent queue writes for instance {{ $labels.instance }} are saturated
expr: rate(vm_persistentqueue_write_duration_seconds_total[5m]) > 0.9
for: 15m
labels:
severity: warning
- alert: PersistentQueueForReadsIsSaturated
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=99&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=99&var-instance={{
+ $labels.instance }}
description: Persistent queue reads for vmagent "{{ $labels.job }}" (instance
{{ $labels.instance }}) are saturated by more than 90% and vmagent won't
be able to keep up with reading data from the disk. In this case, consider
to decrease load on the vmagent or improve the disk throughput.
summary: Persistent queue reads for instance {{ $labels.instance }} are saturated
expr: rate(vm_persistentqueue_read_duration_seconds_total[5m]) > 0.9
for: 15m
labels:
severity: warning
- alert: SeriesLimitHourReached
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=88&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=88&var-instance={{
+ $labels.instance }}
description: Max series limit set via -remoteWrite.maxHourlySeries flag is
close to reaching the max value. Then samples for new time series will be
dropped instead of sending them to remote storage systems.
summary: Instance {{ $labels.instance }} reached 90% of the limit
expr: (vmagent_hourly_series_limit_current_series / vmagent_hourly_series_limit_max_series)
> 0.9
labels:
severity: critical
- alert: SeriesLimitDayReached
annotations:
- dashboard: grafana.domain.com/d/G7Z9GzMGz?viewPanel=90&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/G7Z9GzMGz?viewPanel=90&var-instance={{
+ $labels.instance }}
description: Max series limit set via -remoteWrite.maxDailySeries flag is
close to reaching the max value. Then samples for new time series will be
dropped instead of sending them to remote storage systems.
summary: Instance {{ $labels.instance }} reached 90% of the limit
expr: (vmagent_daily_series_limit_current_series / vmagent_daily_series_limit_max_series)
> 0.9
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-vmcluster
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-vmcluster
@@ -15,14 +15,14 @@
interval: 30s
name: vmcluster
params: {}
rules:
- alert: DiskRunsOutOfSpaceIn3Days
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=113&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=113&var-instance={{
+ $labels.instance }}
description: |-
Taking into account current ingestion rate, free disk space will be enough only for {{ $value | humanizeDuration }} on instance {{ $labels.instance }}.
Consider to limit the ingestion rate, decrease retention or scale the disk space up if possible.
summary: Instance {{ $labels.instance }} will run out of disk space in 3 days
expr: |-
sum(vm_free_disk_space_bytes) without(path) /
@@ -34,14 +34,14 @@
) < 3 * 24 * 3600 > 0
for: 30m
labels:
severity: critical
- alert: NodeBecomesReadonlyIn3Days
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=113&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=113&var-instance={{
+ $labels.instance }}
description: |-
Taking into account current ingestion rate, free disk space and -storage.minFreeDiskSpaceBytes instance {{ $labels.instance }} will remain writable for {{ $value | humanizeDuration }}.
Consider to limit the ingestion rate, decrease retention or scale the disk space up if possible.
summary: Instance {{ $labels.instance }} will become read-only in 3 days
expr: |-
sum(vm_free_disk_space_bytes - vm_free_disk_space_limit_bytes) without(path) /
@@ -53,14 +53,14 @@
) < 3 * 24 * 3600 > 0
for: 30m
labels:
severity: warning
- alert: DiskRunsOutOfSpace
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=200&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=200&var-instance={{
+ $labels.instance }}
description: |-
Disk utilisation on instance {{ $labels.instance }} is more than 80%.
Having less than 20% of free disk space could cripple merges processes and overall performance. Consider to limit the ingestion rate, decrease retention or scale the disk space if possible.
summary: Instance {{ $labels.instance }} (job={{ $labels.job }}) will run
out of disk space soon
expr: |-
@@ -71,27 +71,27 @@
) > 0.8
for: 30m
labels:
severity: critical
- alert: RequestErrorsToAPI
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=52&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=52&var-instance={{
+ $labels.instance }}
description: Requests to path {{ $labels.path }} are receiving errors. Please
verify if clients are sending correct requests.
summary: Too many errors served for {{ $labels.job }} path {{ $labels.path
}} (instance {{ $labels.instance }})
expr: increase(vm_http_request_errors_total[5m]) > 0
for: 15m
labels:
severity: warning
show_at: dashboard
- alert: RPCErrors
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=44&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=44&var-instance={{
+ $labels.instance }}
description: |-
RPC errors are interconnection errors between cluster components.
Possible reasons for errors are misconfiguration, overload, network blips or unreachable components.
summary: Too many RPC errors for {{ $labels.job }} (instance {{ $labels.instance
}})
expr: |-
@@ -105,13 +105,13 @@
for: 15m
labels:
severity: warning
show_at: dashboard
- alert: TooHighChurnRate
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=102
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=102
description: |-
VM constantly creates new time series.
This effect is known as Churn Rate.
High Churn Rate tightly connected with database performance and may result in unexpected OOM's or slow queries.
summary: Churn rate is more than 10% for the last 15m
expr: |-
@@ -122,13 +122,13 @@
) > 0.1
for: 15m
labels:
severity: warning
- alert: TooHighChurnRate24h
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=102
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=102
description: |-
The number of created new time series over last 24h is 3x times higher than current number of active series.
This effect is known as Churn Rate.
High Churn Rate tightly connected with database performance and may result in unexpected OOM's or slow queries.
summary: Too high number of new series created over last 24h
expr: |-
@@ -137,13 +137,13 @@
(sum(vm_cache_entries{type="storage/hour_metric_ids"}) by (job,cluster) * 3)
for: 15m
labels:
severity: warning
- alert: TooHighSlowInsertsRate
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=108
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=108
description: High rate of slow inserts may be a sign of resource exhaustion
for the current load. It is likely more RAM is needed for optimal handling
of the current number of active time series. See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183
summary: Percentage of slow inserts is more than 5% for the last 15m
expr: |-
(
@@ -153,14 +153,14 @@
) > 0.05
for: 15m
labels:
severity: warning
- alert: VminsertVmstorageConnectionIsSaturated
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=139&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=139&var-instance={{
+ $labels.instance }}
description: |-
The connection between vminsert (instance {{ $labels.instance }}) and vmstorage (instance {{ $labels.addr }}) is saturated by more than 90% and vminsert won't be able to keep up.
This usually means that more vminsert or vmstorage nodes must be added to the cluster in order to increase the total number of vminsert -> vmstorage links.
summary: Connection between vminsert on {{ $labels.instance }} and vmstorage
on {{ $labels.addr }} is saturated
expr: rate(vm_rpc_send_duration_seconds_total[5m]) > 0.9
--- HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-vmsingle
+++ HelmRelease: monitoring/victoria-metrics VMRule: monitoring/vmetrics-vmsingle
@@ -15,14 +15,14 @@
interval: 30s
name: vmsingle
params: {}
rules:
- alert: DiskRunsOutOfSpaceIn3Days
annotations:
- dashboard: grafana.domain.com/d/wNf0q_kZk?viewPanel=73&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/wNf0q_kZk?viewPanel=73&var-instance={{
+ $labels.instance }}
description: |-
Taking into account current ingestion rate, free disk space will be enough only for {{ $value | humanizeDuration }} on instance {{ $labels.instance }}.
Consider to limit the ingestion rate, decrease retention or scale the disk space if possible.
summary: Instance {{ $labels.instance }} will run out of disk space soon
expr: |-
sum(vm_free_disk_space_bytes) without(path) /
@@ -34,14 +34,14 @@
) < 3 * 24 * 3600 > 0
for: 30m
labels:
severity: critical
- alert: NodeBecomesReadonlyIn3Days
annotations:
- dashboard: grafana.domain.com/d/oS7Bi_0Wz?viewPanel=113&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/oS7Bi_0Wz?viewPanel=113&var-instance={{
+ $labels.instance }}
description: |-
Taking into account current ingestion rate and free disk space instance {{ $labels.instance }} is writable for {{ $value | humanizeDuration }}.
Consider to limit the ingestion rate, decrease retention or scale the disk space up if possible.
summary: Instance {{ $labels.instance }} will become read-only in 3 days
expr: |-
sum(vm_free_disk_space_bytes - vm_free_disk_space_limit_bytes) without(path) /
@@ -53,14 +53,14 @@
) < 3 * 24 * 3600 > 0
for: 30m
labels:
severity: warning
- alert: DiskRunsOutOfSpace
annotations:
- dashboard: grafana.domain.com/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/wNf0q_kZk?viewPanel=53&var-instance={{
+ $labels.instance }}
description: |-
Disk utilisation on instance {{ $labels.instance }} is more than 80%.
Having less than 20% of free disk space could cripple merge processes and overall performance. Consider to limit the ingestion rate, decrease retention or scale the disk space if possible.
summary: Instance {{ $labels.instance }} (job={{ $labels.job }}) will run
out of disk space soon
expr: |-
@@ -71,26 +71,26 @@
) > 0.8
for: 30m
labels:
severity: critical
- alert: RequestErrorsToAPI
annotations:
- dashboard: grafana.domain.com/d/wNf0q_kZk?viewPanel=35&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/wNf0q_kZk?viewPanel=35&var-instance={{
+ $labels.instance }}
description: Requests to path {{ $labels.path }} are receiving errors. Please
verify if clients are sending correct requests.
summary: Too many errors served for path {{ $labels.path }} (instance {{ $labels.instance
}})
expr: increase(vm_http_request_errors_total[5m]) > 0
for: 15m
labels:
severity: warning
- alert: TooHighChurnRate
annotations:
- dashboard: grafana.domain.com/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/wNf0q_kZk?viewPanel=66&var-instance={{
+ $labels.instance }}
description: |-
VM constantly creates new time series on "{{ $labels.instance }}".
This effect is known as Churn Rate.
High Churn Rate tightly connected with database performance and may result in unexpected OOM's or slow queries.
summary: Churn rate is more than 10% on "{{ $labels.instance }}" for the last
15m
@@ -102,14 +102,14 @@
) > 0.1
for: 15m
labels:
severity: warning
- alert: TooHighChurnRate24h
annotations:
- dashboard: grafana.domain.com/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/wNf0q_kZk?viewPanel=66&var-instance={{
+ $labels.instance }}
description: |-
The number of created new time series over last 24h is 3x times higher than current number of active series on "{{ $labels.instance }}".
This effect is known as Churn Rate.
High Churn Rate tightly connected with database performance and may result in unexpected OOM's or slow queries.
summary: Too high number of new series on "{{ $labels.instance }}" created
over last 24h
@@ -119,14 +119,14 @@
(sum(vm_cache_entries{type="storage/hour_metric_ids"}) by (instance,cluster) * 3)
for: 15m
labels:
severity: warning
- alert: TooHighSlowInsertsRate
annotations:
- dashboard: grafana.domain.com/d/wNf0q_kZk?viewPanel=68&var-instance={{ $labels.instance
- }}
+ dashboard: grafana.external.host/d/wNf0q_kZk?viewPanel=68&var-instance={{
+ $labels.instance }}
description: High rate of slow inserts on "{{ $labels.instance }}" may be
a sign of resource exhaustion for the current load. It is likely more RAM
is needed for optimal handling of the current number of active time series.
See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183
summary: Percentage of slow inserts is more than 5% on "{{ $labels.instance
}}" for the last 15m
--- HelmRelease: monitoring/victoria-metrics VMSingle: monitoring/vmetrics
+++ HelmRelease: monitoring/victoria-metrics VMSingle: monitoring/vmetrics
@@ -10,13 +10,13 @@
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: victoria-metrics-k8s-stack
spec:
extraArgs:
search.maxUniqueTimeseries: '600000'
image:
- tag: v1.109.1
+ tag: v1.110.0
license: {}
port: '8429'
replicaCount: 1
resources: {}
retentionPeriod: '1'
storage:
--- HelmRelease: monitoring/victoria-metrics ValidatingWebhookConfiguration: monitoring/victoria-metrics-victoria-metrics-operator-admission
+++ HelmRelease: monitoring/victoria-metrics ValidatingWebhookConfiguration: monitoring/victoria-metrics-victoria-metrics-operator-admission
@@ -11,13 +11,13 @@
- clientConfig:
service:
namespace: monitoring
name: victoria-metrics-victoria-metrics-operator
path: /validate-operator-victoriametrics-com-v1beta1-vlogs
port: 9443
- caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURIakNDQWdhZ0F3SUJBZ0lSQUovUHRkb2Y5dTZvRUZGSk1TTXhNRUF3RFFZSktvWklodmNOQVFFTEJRQXcKR1RFWE1CVUdBMVVFQXhNT2RtMHRiM0JsY21GMGIzSXRZMkV3SGhjTk1qVXdNVEkzTVRJMU1EQXlXaGNOTXpVdwpNVEkxTVRJMU1EQXlXakFaTVJjd0ZRWURWUVFERXc1MmJTMXZjR1Z5WVhSdmNpMWpZVENDQVNJd0RRWUpLb1pJCmh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTXBQUmV4Rm1xWHJDeGI5N3dNV0w5WkkxMEx0MStzZjloWnkKVFlmMVAyelZRVXduOGtVUjQ1Wnc0dndSL285ajJPS3pHY1NwRW5ETm5MbmVQYU9wYjlVNzRLRXdFaDYrZFp0cwpGNnJ2NGlRajh1anNablBqSDJmMllZbUxxbVV6M0FpVzIwUms5NnQ5SVNWNUU2NGpzNmFKYXZtb1lXeitsWTdZCkZtOW0xR0ZzT1lHdUxHSTNONDBWRGpSc1o1a1FMZnY2eHJXZ3l3bjh4N2I2SkhOUURlaUYrcWx1Vy84QXIzMVoKQ0IrZjlDbnlqVVdob1A5cVNHVXRQRU9XbnFCVzd5czFNSFJ3dGhQYmJ2NTJGWG1YWlVNS290OGhzRVo1dit2dAowN3hqYXQ0M3ZKTjd3Q0taOXFocjIzMlhPYmdmeTYrY1o5eE93UDhOaGcyOGxiTkpudmNDQXdFQUFhTmhNRjh3CkRnWURWUjBQQVFIL0JBUURBZ0trTUIwR0ExVWRKUVFXTUJRR0NDc0dBUVVGQndNQkJnZ3JCZ0VGQlFjREFqQVAKQmdOVkhSTUJBZjhFQlRBREFRSC9NQjBHQTFVZERnUVdCQlE0R3NRemtPa2RYM3dPOGJ2UDRzaUxyMk5tSnpBTgpCZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFDdWFSczAxNGdLUzY0dno2RTR6eklXRTBhU01hbTVPZ2tRMFhDLzdWCmE1TUpvVmlhQmVXUnZuTXJZK0JYOVVmN1pYbG96eFplVHZQbUlFdnVQODhWcnQzdWVDT1NtOC9jQmxDYm1WYUwKUXA0TTZzY1JsZktWL3NqQzdTSElsVHZOZ09MZjN5VW1JWHY4eW0vNjNhRnJBUXQ2elRtM2VLSWNhZXBDM2ppVwpsaHVUaGRRTm5tSFJMSEQ1bjcxTk9tUmRpdjBiNDJYMEdqSVZDUUhnQmRhRXFra2hYK1cwY3BqMFU0ZUdpMHNzCnBZMlhRODJIUzE2cFNhTi9uNklYR3R0S3FRb0tjTGxIVHhCMWdlaDBEMUlORmp4K2NPS2h1OHF1UXRMV0hKdTgKcFBBUmdqQ0VKUWZRL3lNbFNZQ2lnaHN6NEdPUHZtTno1UDBSNzBDZnRwK0xFQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
+ caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURIVENDQWdXZ0F3SUJBZ0lRT3BsaFFuRTd5ZE5uMmc2dnpTU3JnakFOQmdrcWhraUc5dzBCQVFzRkFEQVoKTVJjd0ZRWURWUVFERXc1MmJTMXZjR1Z5WVhSdmNpMWpZVEFlRncweU5UQXhNamN4TWpVd01ESmFGdzB6TlRBeApNalV4TWpVd01ESmFNQmt4RnpBVkJnTlZCQU1URG5adExXOXdaWEpoZEc5eUxXTmhNSUlCSWpBTkJna3Foa2lHCjl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUEzS1FaZmcxbzlZbXpIMU9jWklZRXdGRTBvY1ZxamtNc0xOa2sKaFMrNmtGS0pEVlBVdlROVDJEekphS3hGQklRVFJ2bi9KSFRCT1NQVUl1WUxKcFllaEFCaHpCRXArMkV6MkNaNQp6NTJOMXhmTVBjd1ovY0ZoQzVFalhqYjhvQ0tndXc3NXo0S3Qzem00UUw0WDJicHE3eVRkclZERFNlNElDb2tPCkpaMUNCNjdRRnNlcXZpSnNaY2dibnFDUjBJMkhTNGk1bnkyKzJneHBFckc3emNacEdZU1I1QnhuZ2UzS2t6WnAKeDAybzAzR2VaUytOZERXMDZWN0lka1I3Vmh6RlNPVU0yNUpxc0xZYmYvRUNxM2pwaVB1THVuZm5MSHl0Umd6TApGTmJIQzJnTVVzMnBQY0IwbzNOSkNCMGgwd0NVNk9MOFMwRlZNU3d4L0E1TytXdFNGUUlEQVFBQm8yRXdYekFPCkJnTlZIUThCQWY4RUJBTUNBcVF3SFFZRFZSMGxCQll3RkFZSUt3WUJCUVVIQXdFR0NDc0dBUVVGQndNQ01BOEcKQTFVZEV3RUIvd1FGTUFNQkFmOHdIUVlEVlIwT0JCWUVGUFZCZE5YbjFwTEdYS0NhZGZpREJJNXVIZmthTUEwRwpDU3FHU0liM0RRRUJDd1VBQTRJQkFRQjZPZHBJcXVsZ0ZDeGN3Z2FFZVBuYW85b09RZ2pjaGxXb2pZcitITFQzCmJZUkFUTkx0QWZJMm5lcW94VWs4VVB6M0c3OW1KNSswZzFSK2gxV3dQYW1BaW9QWjhXTTJrc21TOW45OFY3bkMKY3VXYzBNQ05zWU9DY3RMeEZCdmlaL1RENkxVbEgrR0xUWk9YdXdoalNubXk2Mm9WMnBEa255NHhjRTFWMXV2Wgpwc0pDRzIrcmpUb2dMY3A4OVBjZjFiVVRXVWs4Y2phM2hDZW5YNHdURHMzVk8vSVI5aXM2N2dMdmh0SzZNTFZrCjNlanByckdGOTM5QWk2SUxmejBVSi9zN01zVzNZb1padURueVA4VmtmV00wM09vYWY0MTZHaEpVMGZqZ1VqTjkKQUlPcVhsVCtBQXBwcWhneWR4dGE3RlQ3MVVJRWVrSEw0RVhwdHRUV0lJSzUKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
failurePolicy: Fail
name: vlogs.victoriametrics.com
admissionReviewVersions:
- v1
- v1beta1
sideEffects: None
@@ -40,13 +40,13 @@
- clientConfig:
service:
namespace: monitoring
name: victoria-metrics-victoria-metrics-operator
path: /validate-operator-victoriametrics-com-v1beta1-vmagent
port: 9443
- caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURIakNDQWdhZ0F3SUJBZ0lSQUovUHRkb2Y5dTZvRUZGSk1TTXhNRUF3RFFZSktvWklodmNOQVFFTEJRQXcKR1RFWE1CVUdBMVVFQXhNT2RtMHRiM0JsY21GMGIzSXRZMkV3SGhjTk1qVXdNVEkzTVRJMU1EQXlXaGNOTXpVdwpNVEkxTVRJMU1EQXlXakFaTVJjd0ZRWURWUVFERXc1MmJTMXZjR1Z5WVhSdmNpMWpZVENDQVNJd0RRWUpLb1pJCmh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTXBQUmV4Rm1xWHJDeGI5N3dNV0w5WkkxMEx0MStzZjloWnkKVFlmMVAyelZRVXduOGtVUjQ1Wnc0dndSL285ajJPS3pHY1NwRW5ETm5MbmVQYU9wYjlVNzRLRXdFaDYrZFp0cwpGNnJ2NGlRajh1anNablBqSDJmMllZbUxxbVV6M0FpVzIwUms5NnQ5SVNWNUU2NGpzNmFKYXZtb1lXeitsWTdZCkZtOW0xR0ZzT1lHdUxHSTNONDBWRGpSc1o1a1FMZnY2eHJXZ3l3bjh4N2I2SkhOUURlaUYrcWx1Vy84QXIzMVoKQ0IrZjlDbnlqVVdob1A5cVNHVXRQRU9XbnFCVzd5czFNSFJ3dGhQYmJ2NTJGWG1YWlVNS290OGhzRVo1dit2dAowN3hqYXQ0M3ZKTjd3Q0taOXFocjIzMlhPYmdmeTYrY1o5eE93UDhOaGcyOGxiTkpudmNDQXdFQUFhTmhNRjh3CkRnWURWUjBQQVFIL0JBUURBZ0trTUIwR0ExVWRKUVFXTUJRR0NDc0dBUVVGQndNQkJnZ3JCZ0VGQlFjREFqQVAKQmdOVkhSTUJBZjhFQlRBREFRSC9NQjBHQTFVZERnUVdCQlE0R3NRemtPa2RYM3dPOGJ2UDRzaUxyMk5tSnpBTgpCZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFDdWFSczAxNGdLUzY0dno2RTR6eklXRTBhU01hbTVPZ2tRMFhDLzdWCmE1TUpvVmlhQmVXUnZuTXJZK0JYOVVmN1pYbG96eFplVHZQbUlFdnVQODhWcnQzdWVDT1NtOC9jQmxDYm1WYUwKUXA0TTZzY1JsZktWL3NqQzdTSElsVHZOZ09MZjN5VW1JWHY4eW0vNjNhRnJBUXQ2elRtM2VLSWNhZXBDM2ppVwpsaHVUaGRRTm5tSFJMSEQ1bjcxTk9tUmRpdjBiNDJYMEdqSVZDUUhnQmRhRXFra2hYK1cwY3BqMFU0ZUdpMHNzCnBZMlhRODJIUzE2cFNhTi9uNklYR3R0S3FRb0tjTGxIVHhCMWdlaDBEMUlORmp4K2NPS2h1OHF1UXRMV0hKdTgKcFBBUmdqQ0VKUWZRL3lNbFNZQ2lnaHN6NEdPUHZtTno1UDBSNzBDZnRwK0xFQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
+ caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURIVENDQWdXZ0F3SUJBZ0lRT3BsaFFuRTd5ZE5uMmc2dnpTU3JnakFOQmdrcWhraUc5dzBCQVFzRkFEQVoKTVJjd0ZRWURWUVFERXc1MmJTMXZjR1Z5WVhSdmNpMWpZVEFlRncweU5UQXhNamN4TWpVd01ESmFGdzB6TlRBeApNalV4TWpVd01ESmFNQmt4RnpBVkJnTlZCQU1URG5adExXOXdaWEpoZEc5eUxXTmhNSUlCSWpBTkJna3Foa2lHCjl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUEzS1FaZmcxbzlZbXpIMU9jWklZRXdGRTBvY1ZxamtNc0xOa2sKaFMrNmtGS0pEVlBVdlROVDJEekphS3hGQklRVFJ2bi9KSFRCT1NQVUl1WUxKcFllaEFCaHpCRXArMkV6MkNaNQp6NTJOMXhmTVBjd1ovY0ZoQzVFalhqYjhvQ0tndXc3NXo0S3Qzem00UUw0WDJicHE3eVRkclZERFNlNElDb2tPCkpaMUNCNjdRRnNlcXZpSnNaY2dibnFDUjBJMkhTNGk1bnkyKzJneHBFckc3emNacEdZU1I1QnhuZ2UzS2t6WnAKeDAybzAzR2VaUytOZERXMDZWN0lka1I3Vmh6RlNPVU0yNUpxc0xZYmYvRUNxM2pwaVB1THVuZm5MSHl0Umd6TApGTmJIQzJnTVVzMnBQY0IwbzNOSkNCMGgwd0NVNk9MOFMwRlZNU3d4L0E1TytXdFNGUUlEQVFBQm8yRXdYekFPCkJnTlZIUThCQWY4RUJBTUNBcVF3SFFZRFZSMGxCQll3RkFZSUt3WUJCUVVIQXdFR0NDc0dBUVVGQndNQ01BOEcKQTFVZEV3RUIvd1FGTUFNQkFmOHdIUVlEVlIwT0JCWUVGUFZCZE5YbjFwTEdYS0NhZGZpREJJNXVIZmthTUEwRwpDU3FHU0liM0RRRUJDd1VBQTRJQkFRQjZPZHBJcXVsZ0ZDeGN3Z2FFZVBuYW85b09RZ2pjaGxXb2pZcitITFQzCmJZUkFUTkx0QWZJMm5lcW94VWs4VVB6M0c3OW1KNSswZzFSK2gxV3dQYW1BaW9QWjhXTTJrc21TOW45OFY3bkMKY3VXYzBNQ05zWU9DY3RMeEZCdmlaL1RENkxVbEgrR0xUWk9YdXdoalNubXk2Mm9WMnBEa255NHhjRTFWMXV2Wgpwc0pDRzIrcmpUb2dMY3A4OVBjZjFiVVRXVWs4Y2phM2hDZW5YNHdURHMzVk8vSVI5aXM2N2dMdmh0SzZNTFZrCjNlanByckdGOTM5QWk2SUxmejBVSi9zN01zVzNZb1padURueVA4VmtmV00wM09vYWY0MTZHaEpVMGZqZ1VqTjkKQUlPcVhsVCtBQXBwcWhneWR4dGE3RlQ3MVVJRWVrSEw0RVhwdHRUV0lJSzUKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
failurePolicy: Fail
name: vmagent.victoriametrics.com
admissionReviewVersions:
- v1
- v1beta1
sideEffects: None
@@ -69,13 +69,13 @@
- clientConfig:
service:
namespace: monitoring
name: victoria-metrics-victoria-metrics-operator
path: /validate-operator-victoriametrics-com-v1beta1-vmalert
port: 9443
- caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURIakNDQWdhZ0F3SUJBZ0lSQUovUHRkb2Y5dTZvRUZGSk1TTXhNRUF3RFFZSktvWklodmNOQVFFTEJRQXcKR1RFWE1CVUdBMVVFQXhNT2RtMHRiM0JsY21GMGIzSXRZMkV3SGhjTk1qVXdNVEkzTVRJMU1EQXlXaGNOTXpVdwpNVEkxTVRJMU1EQXlXakFaTVJjd0ZRWURWUVFERXc1MmJTMXZjR1Z5WVhSdmNpMWpZVENDQVNJd0RRWUpLb1pJCmh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTXBQUmV4Rm1xWHJDeGI5N3dNV0w5WkkxMEx0MStzZjloWnkKVFlmMVAyelZRVXduOGtVUjQ1Wnc0dndSL285ajJPS3pHY1NwRW5ETm5MbmVQYU9wYjlVNzRLRXdFaDYrZFp0cwpGNnJ2NGlRajh1anNablBqSDJmMllZbUxxbVV6M0FpVzIwUms5NnQ5SVNWNUU2NGpzNmFKYXZtb1lXeitsWTdZCkZtOW0xR0ZzT1lHdUxHSTNONDBWRGpSc1o1a1FMZnY2eHJXZ3l3bjh4N2I2SkhOUURlaUYrcWx1Vy84QXIzMVoKQ0IrZjlDbnlqVVdob1A5cVNHVXRQRU9XbnFCVzd5czFNSFJ3dGhQYmJ2NTJGWG1YWlVNS290OGhzRVo1dit2dAowN3hqYXQ0M3ZKTjd3Q0taOXFocjIzMlhPYmdmeTYrY1o5eE93UDhOaGcyOGxiTkpudmNDQXdFQUFhTmhNRjh3CkRnWURWUjBQQVFIL0JBUURBZ0trTUIwR0ExVWRKUVFXTUJRR0NDc0dBUVVGQndNQkJnZ3JCZ0VGQlFjREFqQVAKQmdOVkhSTUJBZjhFQlRBREFRSC9NQjBHQTFVZERnUVdCQlE0R3NRemtPa2RYM3dPOGJ2UDRzaUxyMk5tSnpBTgpCZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFDdWFSczAxNGdLUzY0dno2RTR6eklXRTBhU01hbTVPZ2tRMFhDLzdWCmE1TUpvVmlhQmVXUnZuTXJZK0JYOVVmN1pYbG96eFplVHZQbUlFdnVQODhWcnQzdWVDT1NtOC9jQmxDYm1WYUwKUXA0TTZzY1JsZktWL3NqQzdTSElsVHZOZ09MZjN5VW1JWHY4eW0vNjNhRnJBUXQ2elRtM2VLSWNhZXBDM2ppVwpsaHVUaGRRTm5tSFJMSEQ1bjcxTk9tUmRpdjBiNDJYMEdqSVZDUUhnQmRhRXFra2hYK1cwY3BqMFU0ZUdpMHNzCnBZMlhRODJIUzE2cFNhTi9uNklYR3R0S3FRb0tjTGxIVHhCMWdlaDBEMUlORmp4K2NPS2h1OHF1UXRMV0hKdTgKcFBBUmdqQ0VKUWZRL3lNbFNZQ2lnaHN6NEdPUHZtTno1UDBSNzBDZnRwK0xFQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
[Diff truncated by flux-local]
bo0tzz approved these changes Jan 27, 2025
This PR contains the following updates: victoria-metrics-k8s-stack 0.34.0 -> 0.35.0
Release Notes
VictoriaMetrics/helm-charts (victoria-metrics-k8s-stack)
v0.35.0
Compare Source
Release notes for version 0.35.0
Release date: 27 Jan 2025
Update note: this release contains a breaking change: .Values.externalVM was renamed to .Values.external.vm (a values migration sketch follows the list below).
- Added .Values.external.grafana.host to configure the Grafana host for alerts when .Values.grafana.enabled: false
- Renamed .Values.externalVM to .Values.external.vm for consistency
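For charts consumed through a Flux HelmRelease like the ones in this diff, the rename means any values overrides previously set under externalVM have to move under external.vm. The sketch below is illustrative only: the read/write sub-keys and URLs are assumptions based on the old externalVM layout (the chart's values.yaml is authoritative), while the grafana.external.host value simply mirrors the dashboard links visible in the rendered alert rules above.

# Hedged sketch of a HelmRelease values migration for chart 0.35.0.
# Sub-keys and URLs are assumptions, not taken from this PR.
spec:
  values:
    external:
      vm:                                     # was: externalVM (0.34.0 and earlier)
        read:
          url: http://vmsingle-example:8429   # hypothetical endpoint
        write:
          url: http://vmsingle-example:8429/api/v1/write  # hypothetical endpoint
      grafana:
        host: grafana.external.host           # host used in alert dashboard links when grafana.enabled is false
    grafana:
      enabled: false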
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.