Commit 0f9f9dd

feat(module): add alert KubeNodeAwaitingVirtualMachinesEvictionBeforeShutdown (#1268)
add alert KubeNodeAwaitingWorkloadEvacuationBeforeShutdown

Signed-off-by: Yaroslav Borbat <[email protected]>
1 parent 2fc0b9d commit 0f9f9dd

1 file changed, +62 -0 lines changed
@@ -0,0 +1,62 @@
+- name: virtualization.vm.node
+  rules:
+    - alert: KubeNodeAwaitingVirtualMachinesEvictionBeforeShutdown
+      expr: |
+        (
+          kube_node_status_condition{condition="GracefulShutdownPostpone", status="true"} == 1
+          and on(node)
+          sum by (node) (d8_virtualization_virtualmachine_status_phase{phase="Running"}) > 0
+        )
+      labels:
+        severity_level: "6"
+        tier: cluster
+      for: 5m
+      annotations:
+        plk_protocol_extent_version: "1"
+        plk_markup_format: "markdown"
+        plk_create_group_if_not_exists__node_maintenance: "NodeMaintenance,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        plk_grouped_by__node_maintenance: "NodeMaintenance,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        summary: Node is awaiting workload evacuation before safe shutdown.
+        description: |
+          The node `{{ $labels.node }}` has activated graceful shutdown
+          protection and **cannot be safely powered off** until workloads (e.g.,
+          VirtualMachines) are evicted.
+
+          ### What Is Happening?
+
+          A shutdown request was issued, but the system intercepted it to prevent data loss or VM downtime.
+          The `GracefulShutdownPostpone` condition is now active, which means:
+          - The node is **intentionally blocking abrupt power-off**.
+          - You must **manually evict VirtualMachines** before proceeding.
+
+          This is expected behavior for nodes running VMs and ensures safe maintenance.
+
+          ### Required Action
+
+          To proceed with node shutdown:
+
+          1. **List VMs running on the node and check if they are migratable**:
+             ```bash
+             d8 k get virtualmachine -A -o jsonpath="{range .items[?(@.status.nodeName==\"{{ $labels.node }}\")]}{.metadata.namespace}/{.metadata.name}{\"\t\"}Migratable={.status.conditions[?(@.type==\"Migratable\")].status}{\"\n\"}{end}"
+             ```
+             This command shows a list like:
+             ```text
+             default/vm-name Migratable=True
+             prod/vm-beta Migratable=False
+             ```
+          2. **For each VM**:
+             **If Migratable=True**, **migrate the VM to another node**:
+             ```bash
+             d8 v evict <vm-name> -n <namespace>
+             ```
+             > This migrates the VM to another node without guest OS downtime.
+
+             **If Migratable=False**, **restart the VM**:
+             ```bash
+             d8 v restart <vm-name> -n <namespace>
+             ```
+             > This restarts the VM.
+             Some VMs cannot run on other nodes because they have specific storage or network requirements.
+             In such cases, these VMs must be stopped.
+
+          3. Once all VMs are migrated, restarted, or stopped, the node will automatically continue shutting down.
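The per-VM decision in step 2 of the description can be sketched as a small shell helper. This is only an illustration and not part of the commit: the function name `suggest_vm_actions` is hypothetical, and it assumes the tab-separated `<namespace>/<name><TAB>Migratable=<status>` lines that the listing command in step 1 emits. It also assumes that any status other than `True` should be treated as non-migratable. The helper only prints the `d8 v` commands named in the description; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): reads lines shaped like
#   <namespace>/<name><TAB>Migratable=<True|False>
# on stdin and prints the command the alert description recommends per VM.
suggest_vm_actions() {
  tab=$(printf '\t')
  while IFS="$tab" read -r vm migratable; do
    # Skip blank lines.
    [ -n "$vm" ] || continue
    namespace=${vm%%/*}   # text before the first "/"
    name=${vm#*/}         # text after the first "/"
    case "$migratable" in
      Migratable=True)
        # Migratable VMs are moved to another node.
        echo "d8 v evict $name -n $namespace"
        ;;
      *)
        # Assumption: anything else is handled as non-migratable.
        echo "d8 v restart $name -n $namespace"
        ;;
    esac
  done
}
```

Piping the output of the listing command from step 1 into `suggest_vm_actions` would produce one suggested command per VM, which can be reviewed before anything is executed. VMs that must be stopped rather than restarted still need to be identified by hand.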
