| 1 | +- name: virtualization.vm.node |
| 2 | + rules: |
| 3 | + - alert: KubeNodeAwaitingVirtualMachinesEvictionBeforeShutdown |
| 4 | + expr: | |
| 5 | + ( |
| 6 | + kube_node_status_condition{condition="GracefulShutdownPostpone", status="true"} == 1 |
| 7 | + and on(node) |
| 8 | + sum by (node) (d8_virtualization_virtualmachine_status_phase{phase="Running"}) > 0 |
| 9 | + ) |
| 10 | + labels: |
| 11 | + severity_level: "6" |
| 12 | + tier: cluster |
| 13 | + for: 5m |
| 14 | + annotations: |
| 15 | + plk_protocol_extent_version: "1" |
| 16 | + plk_markup_format: "markdown" |
| 17 | + plk_create_group_if_not_exists__node_maintenance: "NodeMaintenance,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes" |
| 18 | + plk_grouped_by__node_maintenance: "NodeMaintenance,tier=~tier,prometheus=deckhouse,kubernetes=~kubernetes" |
| 19 | + summary: Node is awaiting workload evacuation before safe shutdown. |
| 20 | + description: |- |
| 21 | + The node `{{ $labels.node }}` has activated graceful shutdown protection and **cannot be safely powered off** until its workloads (e.g., VirtualMachines) are evicted. |
| 22 | + |
| 23 | + ### What Is Happening? |
| 24 | + A shutdown request was issued, but the system intercepted it to prevent data loss or VM downtime. |
| 25 | + The `GracefulShutdownPostpone` condition is now active — this means: |
| 26 | + - The node is **intentionally blocking abrupt power-off**. |
| 27 | + - You must **manually evict VirtualMachines** before proceeding. |
| 28 | + |
| 29 | + This is expected behavior for nodes running VMs and ensures safe maintenance. |
| 30 | + |
| 31 | + ### Required Action |
| 32 | + To proceed with node shutdown: |
| 33 | + 1. **List VMs running on the node and check if they are migratable**: |
| 34 | + ```bash |
| 35 | + d8 k get virtualmachine -A -o jsonpath='{range .items[?(@.status.nodeName=="{{ $labels.node }}")]}{.metadata.namespace}/{.metadata.name}{"\t"}Migratable={.status.conditions[?(@.type=="Migratable")].status}{"\n"}{end}' |
| 36 | + ``` |
| 37 | + This command shows a list like: |
| 38 | + ```text |
| 39 | + default/vm-name Migratable=True |
| 40 | + prod/vm-beta Migratable=False |
| 41 | + ``` |
| 42 | + 2. **For each VM**: |
| 43 | + **If Migratable=True**, **migrate the VM to another node**: |
| 44 | + ```bash |
| 45 | + d8 v evict <vm-name> -n <namespace> |
| 46 | + ``` |
| 47 | + > This migrates the VM to another node without guest OS downtime. |
| 48 | + |
| 49 | + **If Migratable=False**, **restart the VM**: |
| 50 | + ```bash |
| 51 | + d8 v restart <vm-name> -n <namespace> |
| 52 | + ``` |
| 53 | + > This restarts the VM, which causes guest OS downtime. |
| 54 | + Some VMs cannot run on other nodes because they have specific storage or network requirements. |
| 55 | + Such VMs cannot be restarted elsewhere and must be stopped instead. |
| 56 | + |
| 57 | + 3. Once all VMs are migrated, restarted or stopped, the node will automatically continue shutting down. |
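
Steps 1 and 2 of the description lend themselves to a small script. The sketch below is not part of this change, and `plan_vm_actions` is a hypothetical helper name. It assumes the listing from step 1 arrives on stdin as `namespace/name`, whitespace, `Migratable=<status>`, and it only prints the `d8 v evict` or `d8 v restart` command from step 2 for each VM; it does not run anything.

```bash
# Hypothetical helper (not part of the alert rule): reads lines such as
#   default/vm-name<TAB>Migratable=True
# on stdin and prints the command from step 2 that applies to each VM.
plan_vm_actions() {
  # Default IFS splits on tabs and spaces, so either separator works.
  while read -r vm migratable; do
    [ -n "$vm" ] || continue   # skip empty lines
    ns="${vm%%/*}"             # text before the first "/"
    name="${vm#*/}"            # text after the first "/"
    case "$migratable" in
      Migratable=True)  echo "d8 v evict $name -n $ns" ;;
      Migratable=False) echo "d8 v restart $name -n $ns" ;;
      # Assumption: a missing or other status is left for manual review.
      *)                echo "# $vm: Migratable status unknown, check manually" ;;
    esac
  done
}
```

For example, the output of the command in step 1 could be piped into `plan_vm_actions`, and the printed commands reviewed before any of them is run. VMs that cannot run on another node still have to be stopped by hand, as the description notes.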