@@ -1,7 +1,10 @@
- name: kubernetes.internal.virtualization.cdi_apiserver_state
rules:
- alert: D8InternalVirtualizationCDIAPIServerPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-apiserver-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-apiserver-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase!~"Succeeded", pod=~"cdi-apiserver-.*"} == 1)
labels:
severity_level: "6"
tier: cluster
@@ -1,7 +1,10 @@
- name: kubernetes.internal.virtualization.cdi_deployment_state
rules:
- alert: D8InternalVirtualizationCDIDeploymentPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-deployment-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-deployment-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-deployment-.*"} == 1)
Contributor

suggestion: The alert triggers only if exactly one pod matches the phase condition, which may miss cases with multiple pods.

Consider replacing '== 1' with '> 0' or a suitable aggregation to handle scenarios with multiple pods in the specified phases.

Suggested change
- (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-deployment-.*"} == 1)
+ (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-deployment-.*"} > 0)
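
For context on the `== 1` versus `> 0` question, here is a minimal sketch of how PromQL treats these comparisons. It assumes the usual kube-state-metrics behavior, where `kube_pod_status_phase` exposes one series per pod and phase with a value of 0 or 1. The `count` query at the end is illustrative only and is not part of this PR.

```promql
# Without the `bool` modifier, comparing an instant vector against a scalar is a
# filter applied to each series independently: series whose sample value equals 1
# are kept, all others are dropped. The number of matching pods plays no role.
kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-deployment-.*"} == 1

# For a gauge that only takes the values 0 and 1, `> 0` keeps the same series.
kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-deployment-.*"} > 0

# Counting how many pods match needs an explicit aggregation, for example:
count(kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-deployment-.*"} == 1)
```

Under that assumption, both comparison forms return one series for every pod that is in one of the listed phases, however many pods match the `pod` pattern.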

labels:
severity_level: "6"
tier: cluster
@@ -1,7 +1,10 @@
- name: kubernetes.internal.virtualization.cdi_operator_state
rules:
- alert: D8InternalVirtualizationCDIOperatorPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-operator-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-operator-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-operator-.*"} == 1)
Comment on lines +4 to +7
Contributor

suggestion: Consider using consistent aggregation for pod readiness and phase checks.

Aggregating over pods will ensure the alert works correctly when multiple pods match the pattern, preventing missed readiness or phase issues.

Suggested change
- expr: |
-   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-operator-.*"} == 0)
-   and on (namespace,pod)
-   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-operator-.*"} == 1)
+ expr: |
+   (sum by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-operator-.*"}) == 0)
+   and on (namespace,pod)
+   (sum by (pod) (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-operator-.*"}) == 1)
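
One detail worth checking in the aggregated form: `sum by (pod)` drops every label except `pod`, so neither side carries `namespace` any more even though the match clause lists it. A sketch that keeps both labels, assuming per-pod aggregation is wanted at all, could look like this:

```promql
# Grouping by (namespace, pod) keeps both labels on each side, so
# `on (namespace, pod)` matches on labels that are actually present
# and the resulting alert still carries its namespace label.
(sum by (namespace, pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"cdi-operator-.*"}) == 0)
and on (namespace, pod)
(sum by (namespace, pod) (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"cdi-operator-.*"}) == 1)
```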

labels:
severity_level: "6"
tier: cluster
@@ -1,7 +1,10 @@
- name: kubernetes.internal.virtualization.virt_api_state
rules:
- alert: D8InternalVirtualizationVirtAPIPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-api-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-api-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-api-.*"} == 1)
Comment on lines +4 to +7
Contributor

suggestion: The alert logic may not handle multiple pods correctly due to equality checks.

Consider replacing strict equality checks with aggregation or range-based logic to ensure the alert accurately reflects pod readiness in deployments with multiple pods.

Suggested change
- expr: |
-   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-api-.*"} == 0)
-   and on (namespace,pod)
-   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-api-.*"} == 1)
+ expr: |
+   (
+     sum(
+       kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-api-.*"}
+     ) by (namespace)
+     == 0
+   )
+   and
+   (
+     sum(
+       kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-api-.*"}
+     ) by (namespace)
+     > 0
+   )
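
Note that aggregating by namespace changes what the expression detects. The sketch below walks through a hypothetical state (two `virt-api` pods Running, one ready and one not) and shows a count comparison that still returns a result in that state. Both the scenario and the query are illustrative assumptions, not part of this PR.

```promql
# Hypothetical state: two virt-api pods are Running, one is ready and one is not.
#   sum of ready over the namespace = 1, so a `== 0` filter on that sum is empty
#   and a namespace-level expression built on it returns nothing.
#   A per-pod comparison joined on (namespace, pod) returns the not-ready pod.
#
# An aggregate that still fires in this state compares the two counts instead.
# The result carries only the `namespace` label, so the alert cannot name the pod.
sum by (namespace) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-api-.*"})
<
sum by (namespace) (kube_pod_status_phase{namespace="d8-virtualization", phase="Running", pod=~"virt-api-.*"})
```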

labels:
severity_level: "6"
tier: cluster
@@ -1,7 +1,10 @@
- name: kubernetes.internal.virtualization.virt_controller_state
rules:
- alert: D8InternalVirtualizationVirtControllerPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-controller-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-controller-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-controller-.*"} == 1)
Comment on lines +4 to +7
Contributor

suggestion (bug_risk): Equality checks for pod readiness and phase may not be robust for deployments with multiple pods.

Consider updating the logic to aggregate pod readiness, ensuring the alert triggers if any pod is not ready, rather than relying on equality checks.

Suggested change
- expr: |
-   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-controller-.*"} == 0)
-   and on (namespace,pod)
-   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-controller-.*"} == 1)
+ expr: |
+   (
+     sum by (namespace, pod) (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-controller-.*"})
+     >
+     sum by (namespace, pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-controller-.*"})
+   )
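
The same per-pod intent can also be written as a set difference. This is an alternative sketch, not what the PR uses, and it behaves differently in one case: it also returns a pod whose `kube_pod_status_ready` series is missing altogether.

```promql
# Pods currently in the Running or Succeeded phase ...
(kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-controller-.*"} == 1)
# ... minus the pods that report ready == 1.
unless on (namespace, pod)
(kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-controller-.*"} == 1)
```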

labels:
severity_level: "6"
tier: cluster
@@ -1,7 +1,10 @@
- name: kubernetes.internal.virtualization.virt_operator_state
rules:
- alert: D8InternalVirtualizationVirtOperatorPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-operator-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virt-operator-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virt-operator-.*"} == 1)
labels:
severity_level: "6"
tier: cluster
5 changes: 4 additions & 1 deletion monitoring/prometheus-rules/virtualization-api.yaml
@@ -1,7 +1,10 @@
- name: kubernetes.virtualization.api_state
rules:
- alert: D8VirtualizationAPIPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virtualization-api-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virtualization-api-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virtualization-api-.*"} == 1)
labels:
severity_level: "6"
tier: cluster
@@ -37,7 +37,10 @@
2. Or check the Pod logs: `kubectl -n d8-virtualization logs deploy/virtualization-controller`

- alert: D8VirtualizationControllerPodIsNotReady
- expr: min by (pod) (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virtualization-controller-.*"}) != 1
+ expr: |
+   (kube_pod_status_ready{condition="true", namespace="d8-virtualization", pod=~"virtualization-controller-.*"} == 0)
+   and on (namespace,pod)
+   (kube_pod_status_phase{namespace="d8-virtualization", phase=~"Running|Succeeded", pod=~"virtualization-controller-.*"} == 1)
labels:
severity_level: "6"
tier: cluster