diff --git a/xml/art_sle_ha_install_quick.xml b/xml/art_sle_ha_install_quick.xml
index a200721e0..90158ed89 100644
--- a/xml/art_sle_ha_install_quick.xml
+++ b/xml/art_sle_ha_install_quick.xml
@@ -256,6 +256,16 @@
inside a &geo; cluster.
+
+ &qdevice;/&qnet;
+
+
+ This setup is not covered here. If you want to use a &qnet; server,
+ you can set it up with the bootstrap script as described in
+ .
+
+
+
@@ -641,7 +651,7 @@ softdog 16384 1
The bootstrap scripts take care of changing the configuration specific to
- a two-node cluster, for example, SBD and &corosync;.
+ a two-node cluster, for example, SBD, &corosync;.
Adding the Second Node (&node2;) with
@@ -680,7 +690,7 @@ softdog 16384 1
After logging in to the specified node, the script will copy the
- &corosync; configuration, configure SSH and &csync;, and will
+ &corosync; configuration, configure SSH, &csync;, and will
bring the current machine online as new cluster node. Apart from that,
it will start the service needed for &hawk2;.
diff --git a/xml/ha_qdevice-qnetd.xml b/xml/ha_qdevice-qnetd.xml
new file mode 100644
index 000000000..cc093c96f
--- /dev/null
+++ b/xml/ha_qdevice-qnetd.xml
@@ -0,0 +1,516 @@
+
+ %entities;
+]>
+
+
+ QDevice and QNetd
+
+
+
+ &qdevice; and &qnet; participate in quorum decisions.
+ With assistance from the arbitrator corosync-qnetd, corosync-qdevice provides
+ a configurable number of votes, allowing a cluster
+ to sustain more node failures than the standard quorum rules
+ allow. We strongly recommend deploying &qnet; and &qdevice; for two-node clusters, but they are also recommended in general for clusters with an even number of nodes.
+
+
+
+
+ editing
+
+
+ yes
+
+
+
+
+
+
+
+ Conceptual Overview
+
+ In comparison to calculating quorum among cluster nodes only, the
+ &qdevice;-and-&qnet; approach has the following benefits:
+
+
+
+
+
+ It provides better resilience in case of node failures.
+
+
+
+
+ You can write your own heuristics scripts to affect votes. This is especially useful for complex setups, such as SAP applications.
+
+
+
+
+ It enables you to configure a &qnet; server to provide
+ votes for multiple clusters.
+
+
+
+
+ It allows using diskless SBD for two-node clusters.
+
+
+
+
+ It helps with quorum decisions for clusters with an even number of
+ nodes in split-brain situations, especially for two-node clusters.
+
+
+
+
+
+ A setup with &qdevice;/&qnet; consists of the following components and
+ mechanisms:
+
+
+
+ &qdevice;/&qnet; Components and Mechanisms
+
+ &qnet; (corosync-qnetd)
+
+
+ A systemd service (a daemon, the &qnet; server) which
+ is not part of the cluster.
+ It provides a vote to the corosync-qdevice daemon.
+
+
+ To improve security, corosync-qnetd
+ can work with TLS for client certificate checking.
+
+
+
+
+ &qdevice; (corosync-qdevice)
+
+
+ A systemd service (a daemon) on each cluster node running together with
+ &corosync;. This is the client of corosync-qnetd.
+ Its primary use is to allow a cluster to sustain more node failures than
+ standard quorum rules allow.
+
+
+ &qdevice; is designed to work with different arbitrators. However, currently,
+ only &qnet; is supported.
+
+
+
+
+ Algorithms
+
+
+ &qdevice; supports several algorithms which determine
+ how votes are assigned.
+ Currently, the following exist:
+
+
+
+
+ FFSplit (fifty-fifty split) is the default. It is used
+ for clusters with an even number of nodes. If the cluster splits
+ into two similar partitions, this algorithm provides one vote to one of
+ the partitions, based on the results of heuristics checks and
+ other factors.
+
+
+
+
+ LMS (last man standing) allows the only
+ remaining node that can see the &qnet; server to get the votes.
+ This algorithm is useful when a cluster with only one
+ active node should remain quorate.
+
+
+
+
+
+
+ Heuristics
+
+
+ &qdevice; supports a set of commands (heuristics).
+ The commands are executed locally on startup of cluster services,
+ on cluster membership changes, on successful connection to corosync-qnetd, and,
+ optionally, at regular intervals.
+ The heuristics can be set with the quorum.device.heuristics
+ key (in the corosync.conf file) or with the
+ option.
+ Both accept the values off (default),
+ sync, and on.
+ The difference between sync and on
+ is that on additionally executes the commands at regular intervals.
+
+
+ The heuristics only pass if all commands are executed
+ successfully; otherwise, they fail. The heuristics' result is
+ sent to corosync-qnetd where
+ it is used in calculations to determine which partition should be quorate.
+
+
+
+
+ Tiebreaker
+
+
+ This is used as a fallback if the cluster partitions are completely
+ equal even with the same heuristics results. It can be configured
+ to be the lowest, the highest, or a specific node ID.
+
+
+
+
+
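The components described above come together in the quorum section of /etc/corosync/corosync.conf. As a sketch only (the host name and the values shown are assumptions for illustration; the bootstrap scripts generate the real settings), a &qdevice; configuration can look like this:

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            tls: on
            host: charlie        # the QNetd server (assumption)
            port: 5403           # default QNetd port
            algorithm: ffsplit   # or "lms" for last man standing
            tie_breaker: lowest  # or "highest", or a specific node ID
        }
        heuristics {
            mode: off            # "sync" or "on" to enable heuristics
        }
    }
}
```

See the corosync-qdevice(8) man page for the full list of quorum.device keys.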
+
+
+ Requirements and Prerequisites
+
+ Before setting up &qdevice; and &qnet;, you need to prepare the
+ environment as follows:
+
+
+
+ toms 2020-05-14: See also bsc#1171681
+
+ In addition to the cluster nodes, you have a separate machine
+ which will become the &qnet; server.
+ See .
+
+
+
+
+ It is recommended for &qdevice; to reach the &qnet; server via a
+ different physical network than the one that &corosync; uses.
+ Ideally, the &qnet; server should be in a separate rack from the
+ main cluster, or at least on a separate PSU and not in the same
+ network segment as the corosync ring or rings.
+
+
+
+
+
+
+ Setting Up the &qnet; Server
+
+ The &qnet; server is not part of the cluster stack, nor is it
+ a real member of your cluster. As such, you cannot move resources
+ to this server.
+
+
+ The &qnet; server is almost state-free. Usually, you do not need to
+ change anything in the configuration file /etc/sysconfig/corosync-qnetd.
+ By default, the corosync-qnetd service runs the daemon
+ as user coroqnetd
+ in the group coroqnetd. This avoids
+ running the daemon as &rootuser;.
+
+
+ To create a &qnet; server, proceed as follows:
+
+
+
+
+ On the machine that will become the &qnet; server, install
+ &sls; &productnumber;.
+ toms 2020-05-15: See also bsc#1171681
+
+
+
+
+ Log in to the &qnet; server and install the following package:
+
+ &prompt.root;zypper install corosync-qnetd
+
+ You do not need to manually start the corosync-qnetd service. The bootstrap scripts
+ will take care of the startup process during the qdevice stage.
+
+
+
+
+
+ Your &qnet; server is ready to accept connections from a &qdevice; client
+ corosync-qdevice.
+ Further configuration is not needed.
+
+
+
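For reference, the configuration file usually looks similar to the following sketch; the example options in the comment are taken from the corosync-qnetd(8) man page and are normally not needed:

```
# /etc/sysconfig/corosync-qnetd
# Additional command-line options for corosync-qnetd; usually left
# empty. For example, "-l <address>" binds the daemon to a specific
# listen address and "-p <port>" changes the default TCP port 5403.
COROSYNC_QNETD_OPTIONS=""
```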
+
+ Connecting &qdevice; Clients to the &qnet; Server
+
+ After you have set up your &qnet; server, you can set up
+ and run the clients.
+ You can connect the clients to the &qnet; server during the installation of your cluster
+ or you can add them later. In the following procedure, we use
+ the latter approach. We assume a cluster with two cluster nodes
+ (&node1; and &node2;) and the &qnet; server (&node3;).
+
+
+ toms 2020-05-11: Is step 1 and step 2 really needed? Can I just
+ jump to step 3?
+
+
+ On &node1;, initialize your cluster:
+
+ &prompt.root;crm cluster init -y
+
+
+
+ On &node2;, join the cluster:
+
+ &prompt.root;crm cluster join -c &node1; -y
+
+
+
+ On &node1; and &node2;, bootstrap the qdevice stage.
+ In most cases, the default settings are fine.
+ Provide at least the host name or
+ IP address of the &qnet; server (&node3; in our case):
+
+ &prompt.root;crm cluster init qdevice --qnetd-hostname=&node3;
+
+ If you want to change the default settings, get a list of all
+ possible options with the command crm cluster init qdevice --help.
+ All options related to &qdevice; start with
+ --qdevice-.
+
+
+
+
+ If you have used the default settings, the command above creates a &qdevice; that has TLS enabled and uses the FFSplit algorithm.
+
+
+
+
+ Setting Up a &qdevice; with Heuristics
+
+ If you need additional control over how votes are determined, use heuristics.
+ Heuristics are a set of commands which are executed in parallel.
+
+
+ For this purpose, the command crm cluster init qdevice
+ provides the option . You can
+ pass one or more commands (separated by semicolon) with absolute paths.
+
+
+ For example, if your own command for heuristic checks is located at
+ /usr/sbin/my-script.sh you can run it on
+ one of your cluster nodes as follows:
+
+ &prompt.root;crm cluster init qdevice --qnetd-hostname=&node3; \
+ --qdevice-heuristics=/usr/sbin/my-script.sh \
+ --qdevice-heuristics-mode=on
+
+ The commands can be written in any language, such as Shell, Python, or Ruby.
+ If they succeed, they return 0 (zero); otherwise, they return an error code.
+
+
+ You can also pass a set of commands. The heuristics only pass when all commands finish successfully (return code zero).
+
+
+ The option lets the heuristics
+ commands run regularly.
+
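As an illustration, a minimal heuristics script could look like the following sketch. The check itself (testing that the root file system appears in /proc/mounts) is only a placeholder assumption; replace it with a probe that is meaningful for your workload:

```shell
#!/bin/sh
# Hypothetical heuristics check for illustration only.
# Exit 0 (pass) if the root file system is mounted,
# exit 1 (fail) otherwise.
if grep -q ' / ' /proc/mounts; then
    exit 0
fi
exit 1
```

Place the script at an absolute path, make it executable (chmod +x), and pass that path with the heuristics option shown above.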
+
+
+
+ Checking and Showing Quorum Status
+
+
+ You can query the quorum status on one of your cluster nodes as shown in .
+ It shows the status of your &qdevice; nodes.
+
+
+
+ Status of &qdevice;
+ &prompt.root;corosync-quorumtool
+ Quorum information
+------------------
+Date: ...
+Quorum provider: corosync_votequorum
+Nodes: 2
+Node ID: 3232235777
+Ring ID: 3232235777/8
+Quorate: Yes
+
+Votequorum information
+----------------------
+Expected votes: 3
+Highest expected: 3
+Total votes: 3
+Quorum: 2
+Flags: Quorate Qdevice
+
+Membership information
+----------------------
+ Nodeid Votes Qdevice Name
+ 3232235777 1 A,V,NMW &subnetI;.1 (local)
+ 3232235778 1 A,V,NMW &subnetI;.2
+ 0 1 Qdevice
+
+
+
+ As an alternative with an identical result, you can also use
+ the crm corosync status quorum command.
+
+
+
+
+ The number of nodes in the cluster. In this example, it is a
+ two-node cluster.
+
+
+
+
+ As the node ID is not explicitly specified in corosync.conf,
+ this ID is a 32-bit integer representation of the IP address.
+ In this example, the value
+ 3232235777 stands for the IP address &subnetI;.1.
+
+
+
+
+ The quorum status. In this case, the cluster has quorum.
+
+
+
+
+ The status flags for each cluster node have the following meaning:
+
+
+
+ A (Alive) or NA (not alive)
+
+
+ Shows the connectivity status between &qdevice; and &corosync;.
+ If there is a heartbeat between &qdevice; and &corosync;,
+ it is shown as alive (A).
+
+
+
+
+ V (Vote) or NV (non vote)
+
+
+ Shows if the quorum device has given a vote (letter V)
+ to the node.
+ A letter V means that both nodes can communicate
+ with each other. In a split-brain situation, one node would be
+ set to V and the other node would be set to
+ NV.
+
+
+
+
+ MW (Master wins) or
+ NMW (not master wins)
+
+
+ Shows if the quorum device master_wins
+ flag is set. By default, the flag is not set, so you see NMW
+ (not master wins).
+ See the man page votequorum_qdevice_master_wins(3) for more
+ information.
+
+
+
+
+
+
+
+
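Since the node ID in the output above is only the 32-bit integer form of the node's IPv4 address, you can convert it back to dotted-quad notation, for example with this small shell sketch:

```shell
#!/bin/sh
# Convert a corosync node ID (a 32-bit integer derived from the
# node's IPv4 address) back to dotted-quad notation.
nodeid=3232235777
printf '%d.%d.%d.%d\n' \
  "$(( nodeid >> 24 & 255 ))" "$(( nodeid >> 16 & 255 ))" \
  "$(( nodeid >> 8 & 255 ))" "$(( nodeid & 255 ))"
# prints 192.168.1.1
```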
+
+ If you query the status of the &qnet; server, you get a similar
+ output, as shown in :
+
+
+ Status of &qnet; Server
+ &prompt.root;corosync-qnetd-tool
+Cluster "hacluster":
+ Algorithm: Fifty-Fifty split
+ Tie-breaker: Node with lowest node ID
+ Node ID 3232235777:
+ Client address: ::ffff:&subnetI;.1:54732
+ HB interval: 8000ms
+ Configured node list: 3232235777, 3232235778
+ Ring ID: aa10ab0.8
+ Membership node list: 3232235777, 3232235778
+ Heuristics: Undefined (membership: Undefined, regular: Undefined)
+ TLS active: Yes (client certificate verified)
+ Vote: ACK (ACK)
+ Node ID 3232235778:
+ Client address: ::ffff:&subnetI;.2:43016
+ HB interval: 8000ms
+ Configured node list: 3232235777, 3232235778
+ Ring ID: aa10ab0.8
+ Membership node list: 3232235777, 3232235778
+ Heuristics: Undefined (membership: Undefined, regular: Undefined)
+ TLS active: Yes (client certificate verified)
+ Vote: No change (ACK)
+
+
+
+ As an alternative with an identical result, you can also use
+ the crm corosync status qnetd command.
+
+
+
+
+ The name of your cluster as set in the configuration file
+ /etc/corosync/corosync.conf in the
+ totem.cluster_name section.
+
+
+
+
+ The algorithm currently used. In this example, it is FFSplit.
+
+
+
+
+ This is the entry for the node with the IP address
+ &subnetI;.1.
+ TLS is active.
+
+
+
+
+
+
+
+ For More Information
+
+ For additional information about &qdevice; and &qnet;, see the man
+ pages of corosync-qdevice(8) and corosync-qnetd(8).
+
+
+
diff --git a/xml/ha_storage_protection.xml b/xml/ha_storage_protection.xml
index 7d6e0492a..94c3774d0 100644
--- a/xml/ha_storage_protection.xml
+++ b/xml/ha_storage_protection.xml
@@ -852,11 +852,15 @@ Timeout (msgwait) : 10
Number of Cluster Nodes
+ toms 2020-05-14: yan: there are still some self-contradictions
+ here, but I don't know how to make it better :-)
Do not use diskless SBD as a fencing mechanism
- for two-node clusters. Use it only in clusters with three or more
- nodes. SBD in diskless mode cannot handle split brain
- scenarios for two-node clusters.
+ for two-node clusters, because SBD in diskless mode cannot handle
+ split brain scenarios in this case.
+ Use diskless SBD only for clusters with three or more nodes, or
+ combine it with &qdevice; for two-node clusters, as
+ described in .