Binary file added docs/_static/crash-scenario-1.jpeg
Binary file added docs/_static/scenario-1.png
Binary file added docs/_static/scenario-2.png
Binary file added docs/_static/scenario-3.png
Binary file added docs/_static/scenario-4.png
Binary file added docs/_static/scenario-5.png
Binary file added docs/_static/scenario-6.png
Binary file added docs/_static/scenario-7.png
110 changes: 79 additions & 31 deletions docs/crash-recovery.md

Large diffs are not rendered by default.

27 changes: 27 additions & 0 deletions docs/emergency-quorum-recovery.md
@@ -0,0 +1,27 @@
# Emergency quorum recovery (when nodes are up but traffic is blocked)

Nodes are running and accept connections, but MySQL refuses writes (and often reads) with errors such as `WSREP has not yet prepared node for application use`. The cluster has lost quorum or the *primary component*.

Why this happens: Each node has one vote, and the cluster needs a majority of votes to form a *primary component*. Only a primary component accepts SQL. When a node leaves cleanly, it tells the others, the expected vote count drops, and the remaining nodes can re-form a primary. When a node crashes (power loss, kill, kernel panic), the others still expect its vote until they time out. In a three-node cluster, if two nodes crash at once, the single remaining node holds one vote of three and cannot form quorum on its own. Likewise, with two nodes left after a planned restart, a network flicker between them leaves each with one vote of two, which is not a majority; both drop to non-primary, and the cluster is then "online" but refuses every query.
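A quick way to confirm this state is to check the wsrep status variables on the affected node (a diagnostic sketch; exact values depend on your topology):

```sql
-- A node outside the primary component reports 'non-Primary' here.
SHOW STATUS LIKE 'wsrep_cluster_status';

-- How many nodes this node can still see in its component.
SHOW STATUS LIKE 'wsrep_cluster_size';
```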

Recovery options:

1. Restore connectivity so the remaining nodes can see each other and re-form a primary.
2. Bring the missing node(s) back so the cluster can form quorum again.
3. Emergency override (when you have confirmed the other nodes are really down): force one node to form a new primary so it can serve traffic; then start the other nodes so they rejoin.

## Emergency override: force a primary when traffic is blocked

Run the following on one node that is still up (connected as a user with sufficient privileges):

```sql
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
```

The [`pc.bootstrap`](wsrep-provider-index.md#pcbootstrap) option makes that node form a new primary component. After the command runs, the node accepts writes; the cluster is effectively that one node until the others are started and rejoin (via IST or SST). Then start the other nodes so they join this primary.
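To verify the override, check the node's status; a sketch of what a healthy single-node primary typically reports:

```sql
-- Expected after the override: 'Primary'
SHOW STATUS LIKE 'wsrep_cluster_status';

-- Expected once the node serves traffic: 'Synced'
SHOW STATUS LIKE 'wsrep_local_state_comment';
```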

!!! warning "Only when the other nodes are down"

Run this override only when you have confirmed that the other nodes are actually down or unreachable. If another node is still primary elsewhere (for example, in another datacenter after a split), setting `pc.bootstrap=YES` on a second node creates two separate clusters with diverging data (split-brain). See [Scenario 5: Two nodes disappear from the cluster](crash-recovery.md#scenario-5-two-nodes-disappear-from-the-cluster) and [Scenario 7: Split brain](crash-recovery.md#scenario-7-the-cluster-loses-its-primary-state-due-to-split-brain) in Crash recovery.

For more support options, see [Get help from Percona](get-help.md).
66 changes: 66 additions & 0 deletions docs/environmental-blockers.md
@@ -0,0 +1,66 @@
# Environmental blockers (AppArmor, systemd, firewalls)

These often prevent nodes from joining and do not show up in the MySQL error log. If a node refuses to join, work through this section. See [Restart the cluster nodes](restarting-nodes.md) for the full recovery index.

## Security context (AppArmor, SELinux): the "SST jail"

On Ubuntu, the default AppArmor profile does not know about XtraBackup or socat and can block the donor from sending data and the joiner from running the SST method. The failure is often silent. Without fixing this, no amount of PID file removal or `grastate.dat` editing will make the node join.
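One way to confirm that AppArmor is the blocker is to look for denial messages in the kernel log while reproducing the failed join (a diagnostic sketch; log locations vary by release):

```shell
# Recent AppArmor denials in the kernel log
journalctl -k --since "1 hour ago" | grep -i "apparmor.*denied"

# Which profiles are loaded and whether they are in enforce or complain mode
aa-status
```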

Option A — Temporary bridge (confirm AppArmor is the cause): Put both profiles in complain mode and restart MySQL on the joiner. If the node joins, add proper rules and put the profiles back in enforce mode. Run as root or with `sudo`:

```shell
aa-complain /usr/sbin/mysqld
aa-complain /usr/bin/wsrep_sst_xtrabackup-v2
systemctl restart mysql
```

Option B — Keep enforce mode, allow SST: Add rules to allow the SST script to run xtrabackup and socat. Add to both `/etc/apparmor.d/usr.sbin.mysqld` and `/etc/apparmor.d/usr.bin.wsrep_sst_xtrabackup-v2` (adjust paths for your datadir and binaries):

```text
/usr/bin/xtrabackup rix,
/usr/bin/xbstream rix,
/usr/bin/socat rix,
/var/lib/mysql/ r,
/var/lib/mysql/** rwk,
```

Then run `systemctl reload apparmor` and `systemctl start mysql`. If you used Option A and the node joined, add the rules from Option B (see [Enable AppArmor](apparmor.md)) and put the profiles back in enforce mode:

```shell
aa-enforce /usr/sbin/mysqld
aa-enforce /usr/bin/wsrep_sst_xtrabackup-v2
```

SELinux (RHEL / Rocky / Alma): Ensure the SST script and ports are allowed; see [SELinux](selinux.md).

## Systemd timeout: the "silent killer"

On systemd-based distributions (for example, Ubuntu 22.04), systemd manages the MySQL service. If an SST takes longer than the unit's start timeout (often 90 seconds), systemd kills the process. It then looks as if the node "refuses to join," when in fact the OS killed it mid-SST. Check `journalctl -u mysql` for a timeout or unit failure.
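For example, you can search the unit's recent journal for signs of a timeout kill (a diagnostic sketch; on RHEL-family systems the unit is `mysqld`):

```shell
# Evidence that systemd, not MySQL, stopped the process
journalctl -u mysql --since "1 hour ago" --no-pager | grep -Ei "timeout|killed|failed"
```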

The fix: Create a drop-in so the mysql unit has no start timeout. Run as root or with `sudo`:

```shell
systemctl edit mysql.service
```

(On RHEL use `systemctl edit mysqld.service`.) In the editor, add and save:

```ini
[Service]
TimeoutStartSec=0
```

Then:

```shell
systemctl daemon-reload
systemctl start mysql
```

`TimeoutStartSec=0` disables the start timeout so systemd waits indefinitely for SST and recovery. If the node was already killed mid-SST, you may need to clear or restore the data directory and retry after increasing the timeout.
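You can confirm the drop-in took effect by querying the unit's effective start timeout; with `TimeoutStartSec=0`, systemd typically reports it as `infinity`:

```shell
systemctl show mysql.service -p TimeoutStartUSec
```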

## Firewalls and network

Ensure the SST port (4444 by default, or the port your SST method uses) and the cluster communication ports (4567 for group communication, 4568 for IST) are open between the joiner and the donor. Blocked ports cause silent join failures. See [Secure the network](secure-network.md) for cluster and client connectivity.
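A simple reachability check from the joiner covers the common ports (a sketch; replace `donor.example.com` with your donor's address and adjust the list to your configuration):

```shell
# Test each cluster-related port on the donor from the joiner
for port in 3306 4444 4567 4568; do
    nc -zv -w 3 donor.example.com "$port"
done
```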

For more support options, see [Get help from Percona](get-help.md).
3 changes: 3 additions & 0 deletions docs/index-contents.md
@@ -46,6 +46,9 @@
 - [ProxySQL admin utilities](proxysql-v2.md)
 - [Quickstart Guide for Percona XtraDB Cluster](quickstart-overview.md)
 - [Restart the cluster nodes](restarting-nodes.md)
+- [Emergency quorum recovery](emergency-quorum-recovery.md)
+- [SST/Clone failure recovery](sst-clone-failure-recovery.md)
+- [Environmental blockers](environmental-blockers.md)
 - [Restore 8.0 backup to 8.4 cluster](upgrade-backup.md)
 - [Running Percona XtraDB Cluster in a Docker Container](docker.md)
 - [Secure the network](secure-network.md)
22 changes: 8 additions & 14 deletions docs/restarting-nodes.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,13 @@
 # Restart the cluster nodes
 
-To restart a cluster node, shut down MySQL and restart the service. The node leaves the cluster, reducing the total vote count for quorum.
+This page is the entry point for restarts and recovery when nodes are managed directly (systemd, bare metal, or VMs). In PXC 8.4, Clone SST (`wsrep_sst_method=clone`) is the default rejoin path—set it once and most restarts are "clear port/process, start the service, let Clone run." For Kubernetes, use the [Percona Operator for MySQL](https://docs.percona.com/percona-operator-for-mysql/pxc/) (PXC-based); it automates bootstrap and recovery. For full-cluster down (all nodes stopped, bootstrap from scratch), see [Crash recovery](crash-recovery.md).
 
-The quorum refers to the minimum number of votes required for the cluster to operate effectively and make decisions. Each node in the cluster typically represents one vote. When a node leaves the cluster, the total number of votes decreases, affecting the cluster's ability to achieve quorum. If the cluster does not maintain quorum, it may become unable to process transactions or make changes, potentially leading to a split-brain scenario where different parts of the cluster operate independently.
+Which problem do you have?
 
-Upon rejoining, the node synchronizes using IST (Incremental State Transfer). IST allows the node to catch up with the current state of the cluster by transferring only the changes that occurred while the node was offline. If the necessary changes for IST do not exist in the `gcache` file on any other node within the cluster, the process will perform SST (State Snapshot Transfer) instead. SST involves transferring a complete database snapshot to the node, which can be more time-consuming but ensures that the node receives all data. This approach makes restarting cluster nodes for rolling configuration changes, or software upgrades straightforward from the cluster’s perspective.
+| Scenario | Topic |
+|----------|--------|
+| Nodes are up but traffic is blocked — Cluster accepts connections but refuses every SQL query (`WSREP has not yet prepared node for application use`). Quorum or primary component lost. | [Emergency quorum recovery](emergency-quorum-recovery.md) |
+| Nodes refuse to join — Joiner won't sync, SST/Clone fails, or node never reaches `Synced`. | [SST/Clone failure recovery](sst-clone-failure-recovery.md) |
+| Environmental blockers — AppArmor, systemd killing the process mid-SST, or firewalls blocking SST/GCOMM. | [Environmental blockers](environmental-blockers.md) |
 
-If a node restarts with an invalid configuration change that prevents MySQL from loading, Galera drops the node’s state and forces an SST for that node.
-
-In the event of a MySQL failure, the system does not remove the PID file because the system deletes this file only during a clean shutdown. As a result, the server does not restart if an existing PID file is present. When MySQL encounters a failure, check the log records for details. You must remove the PID file manually.
-
-Use the `rm` command in a Unix/Linux shell to do this:
-
-```shell
-bash rm /path/to/mysql.pid
-```
-
-Replace `/path/to/mysql.pid` with the actual path to your MySQL PID file. The default location for the PID file is often `/var/run/mysqld/mysqld.pid` or `/var/lib/mysql/mysql.pid`, but this can vary based on your configuration. Before executing this command, ensure that MySQL is not running, as removing the PID file while the server is active can lead to issues.
+Each topic is a separate page with step-by-step procedures. Start with the one that matches your situation; the pages cross-link where needed (for example, SST/Clone failure recovery points to Environmental blockers when the cause is AppArmor or systemd).