Commit c6dd3f4

add alert KubeNodeAwaitingWorkloadEvacuationBeforeShutdown
Signed-off-by: Yaroslav Borbat <[email protected]>
1 parent 9b23322 commit c6dd3f4

1 file changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+- name: virtualization.vm.node
+  rules:
+    - alert: KubeNodeAwaitingVirtualMachinesEvictionBeforeShutdown
+      expr: |
+        (
+          kube_node_status_condition{condition="GracefulShutdownPostpone", status="true"} == 1
+          and on(node)
+          sum by (node) (d8_virtualization_virtualmachine_status_phase{phase="Running"}) > 0
+        )
+      labels:
+        severity_level: "6"
+        tier: cluster
+      for: 5m
+      annotations:
+        plk_protocol_extent_version: "1"
+        plk_markup_format: "markdown"
+        plk_create_group_if_not_exists__node_maintenance: "NodeMaintenance,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        plk_grouped_by__node_maintenance: "NodeMaintenance,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes"
+        summary: Node is awaiting workload evacuation before safe shutdown.
+        description: |-
+          The node `{{ $labels.node }}` has activated graceful shutdown protection and **cannot be safely powered off** until its workloads (e.g., VirtualMachines) are evicted.
+
+          ### What Is Happening?
+          A shutdown request was issued, but the system intercepted it to prevent data loss or VM downtime.
+          The `GracefulShutdownPostpone` condition is now active, which means:
+          - The node is **intentionally blocking abrupt power-off**.
+          - You must **manually evict VirtualMachines** before proceeding.
+
+          This is expected behavior for nodes running VMs and ensures safe maintenance.
+
+          ### Required Action
+          To proceed with node shutdown:
+          1. **List the VMs running on the node and check whether they are migratable**:
+             ```bash
+             d8 k get virtualmachine -A -o jsonpath='{range .items[?(@.status.nodeName=="{{ $labels.node }}")]}{.metadata.namespace}/{.metadata.name}{"\t"}Migratable={.status.conditions[?(@.type=="Migratable")].status}{"\n"}{end}'
+             ```
+             The command prints a list like this:
+             ```bash
+             default/vm-name   Migratable=True
+             prod/vm-beta      Migratable=False
+             ```
+          2. **For each VM**:
+             **If Migratable=True**, **migrate the VM to another node**:
+             ```bash
+             d8 v evict <vm-name> -n <namespace>
+             ```
+             > This migrates the VM to another node without guest OS downtime.
+
+             **If Migratable=False**, **restart the VM**:
+             ```bash
+             d8 v restart <vm-name> -n <namespace>
+             ```
+             > This restarts the VM.
+             Some VMs cannot run on other nodes because they have specific storage or network requirements.
+             In such cases, these VMs must be stopped.
+
+          3. Once all VMs are migrated, restarted, or stopped, the node automatically continues shutting down.
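
The per-VM decision in step 2 of the description can be sketched as a small dry-run helper. This is only an illustration, not part of the commit: the function name `plan_vm_actions` is hypothetical, and it assumes input in the `<namespace>/<name> Migratable=<status>` shape shown in step 1. It prints the `d8 v evict` or `d8 v restart` command suggested by the description for each VM and runs nothing itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): reads lines of the form
# "<namespace>/<name><whitespace>Migratable=<status>" on stdin and prints,
# without executing it, the command the alert description suggests per VM.
plan_vm_actions() {
  local ref status ns name
  # The "|| [ -n ... ]" part handles a last line without a trailing newline.
  while read -r ref status || [ -n "$ref" ]; do
    [ -z "$ref" ] && continue   # skip empty lines
    ns="${ref%%/*}"             # text before the first "/"
    name="${ref#*/}"            # text after the first "/"
    case "$status" in
      Migratable=True)  echo "d8 v evict ${name} -n ${ns}" ;;
      Migratable=False) echo "d8 v restart ${name} -n ${ns}" ;;
      # Missing or unexpected status: leave the decision to the operator.
      *)                echo "# review manually: ${ref} ${status}" ;;
    esac
  done
}
```

Piping the output of the listing command from step 1 into `plan_vm_actions` gives a reviewable list of commands. Printing instead of executing is deliberate here: a VM reported as `Migratable=False` may need to be stopped rather than restarted, which the description leaves to the operator.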
