From 76eb003258c3e5e22476bcb91d0922d1822959d2 Mon Sep 17 00:00:00 2001
From: Thomas Schraitle
Date: Thu, 7 May 2020 18:12:24 +0200
Subject: [PATCH] Fix jira#1745 (ECO) Add QDevice/QNetd

* Create new chapter "QDevice and QNetd" in file ha_qdevice-qnetd.xml.
* Mention QDevice/QNetd in the Installation Quick Start, but refer to the above chapter.
* Add remarks for additions related to QDevice/QNetd.
* Apply suggestions from code review by Tanja, Yan, Alexsei, Julien, Roger, Xin. Many thanks!
* Apply suggestions from language review by Liam. Many thanks!

Co-authored-by: Tanja Roth
Co-authored-by: Yan Gao
Co-authored-by: Alexsei Burlakov
Co-authored-by: Julien ADAMEK
Co-authored-by: Roger Zhou
Co-authored-by: Xin Liang
Co-authored-by: Liam Proven
---
 xml/art_sle_ha_install_quick.xml |  14 +-
 xml/book_sle_ha_guide.xml        |   1 +
 xml/entity-decl.ent              |   2 +
 xml/ha_qdevice-qnetd.xml         | 516 +++++++++++++++++++++++++++++++
 xml/ha_storage_protection.xml    |  10 +-
 5 files changed, 538 insertions(+), 5 deletions(-)
 create mode 100644 xml/ha_qdevice-qnetd.xml

diff --git a/xml/art_sle_ha_install_quick.xml b/xml/art_sle_ha_install_quick.xml
index a200721e0..90158ed89 100644
--- a/xml/art_sle_ha_install_quick.xml
+++ b/xml/art_sle_ha_install_quick.xml
@@ -256,6 +256,16 @@ inside a &geo; cluster.
+&qdevice;/&qnet;
+This setup is not covered here. If you want to use a &qnet; server,
+you can set it up with the bootstrap script as described in
+<xref linkend="cha-ha-qdevice"/>.
@@ -641,7 +651,7 @@ softdog 16384 1
 The bootstrap scripts take care of changing the configuration specific to
-a two-node cluster, for example, SBD and &corosync;.
+a two-node cluster, for example, SBD, &corosync;.

Adding the Second Node (<systemitem class="server">&node2;</systemitem>) with
@@ -680,7 +690,7 @@ softdog 16384 1</screen>
 </para>
 <para>
 After logging in to the specified node, the script will copy the
-&corosync; configuration, configure SSH and &csync;, and will
+&corosync; configuration, configure SSH, &csync;, and will
 bring the current machine online as a new cluster node. Apart from that,
 it will start the service needed for &hawk2;.
 <!-- If you have configured shared storage with OCFS2, it will also

diff --git a/xml/book_sle_ha_guide.xml b/xml/book_sle_ha_guide.xml
index 58125e757..17128889f 100644
--- a/xml/book_sle_ha_guide.xml
+++ b/xml/book_sle_ha_guide.xml
@@ -60,6 +60,7 @@
 <xi:include href="ha_agents.xml"/>
 <xi:include href="ha_fencing.xml"/>
 <xi:include href="ha_storage_protection.xml"/>
+<xi:include href="ha_qdevice-qnetd.xml"/>
 <xi:include href="ha_acl.xml"/>
 <xi:include href="ha_netbonding.xml"/>
 <xi:include href="ha_loadbalancing.xml"/>

diff --git a/xml/entity-decl.ent b/xml/entity-decl.ent
index d6b24710c..9d9b6ad4d 100644
--- a/xml/entity-decl.ent
+++ b/xml/entity-decl.ent
@@ -133,6 +133,8 @@
 <!ENTITY ais "OpenAIS">
 <!ENTITY corosync "Corosync">
 <!ENTITY pace "Pacemaker">
+<!ENTITY qdevice "QDevice">
+<!ENTITY qnet "QNetd">
 <!-- According to aspiers, there is an inconsistency. The systemd
 service is called pacemaker_remote, but the daemon pacemaker-remoted.
:-( -->

diff --git a/xml/ha_qdevice-qnetd.xml b/xml/ha_qdevice-qnetd.xml
new file mode 100644
index 000000000..cc093c96f
--- /dev/null
+++ b/xml/ha_qdevice-qnetd.xml
@@ -0,0 +1,516 @@
<!DOCTYPE chapter
[
 <!ENTITY % entities SYSTEM "entity-decl.ent">
 %entities;
]>
<chapter version="5.0" xml:id="cha-ha-qdevice"
 xmlns="http://docbook.org/ns/docbook"
 xmlns:xi="http://www.w3.org/2001/XInclude"
 xmlns:xlink="http://www.w3.org/1999/xlink">
 <title>QDevice and QNetd</title>

&qdevice; and &qnet; participate in quorum decisions. With assistance from the arbitrator corosync-qnetd, corosync-qdevice provides a configurable number of votes, allowing a cluster to sustain more node failures than the standard quorum rules allow. We strongly recommend deploying corosync-qnetd and corosync-qdevice for two-node clusters, but they are also recommended in general for clusters with an even number of nodes.

editing yes

Conceptual Overview

In comparison to calculating quorum among cluster nodes, the &qdevice;-and-&qnet; approach has the following benefits:

* It provides better sustainability in case of node failures.
* You can write your own heuristics scripts to affect votes. This is especially useful for complex setups, such as SAP applications.
* It enables you to configure a &qnet; server to provide votes for multiple clusters.
* It allows using diskless SBD for two-node clusters.
* It helps with quorum decisions for clusters with an even number of nodes under split-brain situations, especially for two-node clusters.

A setup with &qdevice;/&qnet; consists of the following components and mechanisms:

&qdevice;/&qnet; Components and Mechanisms

* &qnet; (corosync-qnetd): A systemd service (a daemon, the &qnet; server) which is not part of the cluster. The systemd service provides a vote to the corosync-qdevice daemon. To improve security, corosync-qnetd can work with TLS for client certificate checking.

* &qdevice; (corosync-qdevice): A systemd service (a daemon) on each cluster node, running together with &corosync;. This is the client of corosync-qnetd. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules allow. &qdevice; is designed to work with different arbitrators. However, currently only &qnet; is supported.

* Algorithms: &qdevice; supports different algorithms which determine how votes are assigned. Currently, the following exist:
  * FFSplit (fifty-fifty split) is the default. It is used for clusters with an even number of nodes. If the cluster splits into two similar partitions, this algorithm provides one vote to one of the partitions, based on the results of heuristics checks and other factors.
  * LMS (last man standing) allows the only remaining node that can see the &qnet; server to get the votes. This algorithm is useful when a cluster with only one active node should remain quorate.

* Heuristics: &qdevice; supports a set of commands (heuristics). The commands are executed locally on startup of cluster services, on cluster membership changes, on successful connection to corosync-qnetd, or, optionally, at regular intervals. The heuristics can be set with the quorum.device.heuristics key (in the corosync.conf file) or with the --qdevice-heuristics-mode option. Both accept the values off (default), sync, and on. The difference between sync and on is that with on, the commands are additionally executed at regular intervals. Only if all commands are executed successfully are the heuristics considered to have passed; otherwise, they fail. The heuristics' result is sent to corosync-qnetd, where it is used in calculations to determine which partition should be quorate. A minimal example script is sketched after this list.

* Tiebreaker: This is used as a fallback if the cluster partitions are completely equal, even with the same heuristics results. It can be configured to be the lowest node ID, the highest node ID, or a specific node ID.
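To make the heuristics contract concrete, here is a minimal sketch of such a check, written as a shell script. Everything in it is hypothetical (the path /usr/sbin/my-script.sh and the gateway address are placeholders); the only interface corosync-qdevice relies on is the exit code, where 0 means the check passed and any other value means it failed:

#!/bin/sh
# Hypothetical heuristics check, saved for example as /usr/sbin/my-script.sh.
# It passes only if the (assumed) default gateway 192.168.1.254 answers a
# single ping within one second. corosync-qdevice only evaluates the exit
# code: 0 = heuristics passed, non-zero = heuristics failed.
if ping -c 1 -W 1 192.168.1.254 >/dev/null 2>&1; then
    exit 0
fi
exit 1

A node that has lost its network uplink then fails the check, which corosync-qnetd can take into account when deciding which partition to make quorate.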
Requirements and Prerequisites

Before setting up &qdevice; and &qnet;, you need to prepare the environment as follows:

* In addition to the cluster nodes, you have a separate machine which will become the &qnet; server. See the section about setting up the &qnet; server below. (toms 2020-05-14: See also bsc#1171681)
* A different physical network than the one that &corosync; uses, so that &qdevice; can reach the &qnet; server over it. Ideally, the &qnet; server should be in a separate rack from the main cluster, or at least on a separate PSU, and not in the same network segment as the &corosync; ring or rings.

Setting Up the &qnet; Server

The &qnet; server is not part of the cluster stack, and it is not a real member of your cluster. As such, you cannot move resources to this server.

The &qnet; server is almost state-free. Usually, you do not need to change anything in the configuration file /etc/sysconfig/corosync-qnetd. By default, the corosync-qnetd service runs the daemon as user coroqnetd in the group coroqnetd. This avoids running the daemon as &rootuser;.

To create a &qnet; server, proceed as follows:

1. On the machine that will become the &qnet; server, install &sls; &productnumber;. (toms 2020-05-15: See also bsc#1171681)
2. Log in to the &qnet; server and install the following package:
   &prompt.root;zypper install corosync-qnetd
   You do not need to manually start the corosync-qnetd service. The bootstrap scripts will take care of the startup process during the qdevice stage.

Your &qnet; server is ready to accept connections from a &qdevice; client (corosync-qdevice). No further configuration is needed.

Connecting &qdevice; Clients to the &qnet; Server

After you have set up your &qnet; server, you can set up and run the clients. You can connect the clients to the &qnet; server during the installation of your cluster, or you can add them later. In the following procedure, we use the latter approach. We assume a cluster with two cluster nodes (&node1; and &node2;) and the &qnet; server (&node3;).

toms 2020-05-11: Is step 1 and step 2 really needed? Can I just jump to step 3?

1. On &node1;, initialize your cluster:
   &prompt.root;crm cluster init -y
2. On &node2;, join the cluster:
   &prompt.root;crm cluster join -c &node1; -y
3. On &node1; and &node2;, bootstrap the qdevice stage. In most cases, the default settings are fine. Provide at least the --qnetd-hostname option and the hostname or IP address of the &qnet; server (&node3; in our case):
   &prompt.root;crm cluster init qdevice --qnetd-hostname=&node3;
   If you want to change the default settings, get a list of all possible options with the command crm cluster init qdevice --help. All options related to &qdevice; start with --qdevice-.

If you have used the default settings, the command above creates a &qdevice; that has TLS enabled and uses the FFSplit algorithm.
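For orientation, the following sketch shows roughly what such a setup ends up with in the quorum section of /etc/corosync/corosync.conf on the cluster nodes. This is an illustrative excerpt under stated assumptions, not a verbatim template: the host value stands for your &qnet; server, and the authoritative list of keys and defaults is in corosync-qdevice(8).

quorum {
    # provided by corosync itself
    provider: corosync_votequorum
    device {
        # QNetd is addressed via the "net" model
        model: net
        net {
            host: &node3;        # the QNetd server
            port: 5403           # default port of corosync-qnetd
            algorithm: ffsplit   # fifty-fifty split, the default
            tls: on
        }
    }
}

The bootstrap script writes this for you; you normally do not need to edit it by hand.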
Setting Up a &qdevice; with Heuristics

If you need additional control over how votes are determined, use heuristics. Heuristics are a set of commands which are executed in parallel.

For this purpose, the command crm cluster init qdevice provides the option --qdevice-heuristics. You can pass one or more commands (separated by semicolons) with absolute paths.

For example, if your own command for heuristic checks is located at /usr/sbin/my-script.sh, you can run it on one of your cluster nodes as follows:

&prompt.root;crm cluster init qdevice --qnetd-hostname=&node3; \
 --qdevice-heuristics=/usr/sbin/my-script.sh \
 --qdevice-heuristics-mode=on

The command or commands can be written in any language, such as shell, Python, or Ruby. If they succeed, they return 0 (zero); otherwise, they return an error code.

You can also pass a set of commands. Only when all commands finish successfully (return code zero) have the heuristics passed.

The --qdevice-heuristics-mode=on option lets the heuristics commands run regularly.

Checking and Showing Quorum Status

You can query the quorum status on one of your cluster nodes, as shown in the following example. It shows the status of your &qdevice; nodes.

Status of &qdevice;
&prompt.root;corosync-quorumtool
Quorum information
------------------
Date: ...
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 3232235777
Ring ID: 3232235777/8
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice

Membership information
----------------------
    Nodeid       Votes  Qdevice  Name
    3232235777   1      A,V,NMW  &subnetI;.1 (local)
    3232235778   1      A,V,NMW  &subnetI;.2
    0            1               Qdevice

As an alternative with an identical result, you can also use the crm corosync status quorum command.

* Nodes: The number of nodes we are expecting. In this example, it is a two-node cluster.
* Node ID: As the node ID is not explicitly specified in corosync.conf, this ID is a 32-bit integer representation of the IP address. In this example, the value 3232235777 stands for the IP address &subnetI;.1.
* Quorate: The quorum status. In this case, the cluster has quorum.

The status shown for each cluster node (the Qdevice column) means:

* A (alive) or NA (not alive): Shows the connectivity status between &qdevice; and &corosync;. If there is a heartbeat between &qdevice; and &corosync;, it is shown as alive (A).
* V (vote) or NV (no vote): Shows if the quorum device has given a vote (letter V) to the node. A letter V means that both nodes can communicate with each other. In a split-brain situation, one node would be set to V and the other node would be set to NV.
* MW (master wins) or NMW (not master wins): Shows if the quorum device flag master_wins is set. By default, the flag is not set, so you see NMW (not master wins). See the man page votequorum_qdevice_master_wins(3) for more information.
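When testing failover behavior, it can be useful to watch the quorum status continuously instead of querying it once. corosync-quorumtool offers a monitor mode for this; treat the exact flag as an assumption to verify against corosync-quorumtool(8) on your version:

&prompt.root;corosync-quorumtool -m

The output is the same as above and is reprinted whenever the membership or vote state changes. Running this in one terminal while you stop cluster services on a node is a quick way to watch &qdevice; keep the surviving partition quorate. Press Ctrl+C to stop.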
If you want to query the status of the &qnet; server, you get a similar output, as shown in the following example:

Status of &qnet; Server
&prompt.root;corosync-qnetd-tool -lv -c hacluster
Cluster "hacluster":
    Algorithm:          Fifty-Fifty split
    Tie-breaker:        Node with lowest node ID
    Node ID 3232235777:
        Client address:         ::ffff:&subnetI;.1:54732
        HB interval:            8000ms
        Configured node list:   3232235777, 3232235778
        Ring ID:                aa10ab0.8
        Membership node list:   3232235777, 3232235778
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   ACK (ACK)
    Node ID 3232235778:
        Client address:         ::ffff:&subnetI;.2:43016
        HB interval:            8000ms
        Configured node list:   3232235777, 3232235778
        Ring ID:                aa10ab0.8
        Membership node list:   3232235777, 3232235778
        Heuristics:             Undefined (membership: Undefined, regular: Undefined)
        TLS active:             Yes (client certificate verified)
        Vote:                   No change (ACK)

As an alternative with an identical result, you can also use the crm corosync status qnetd command.

* Cluster "hacluster": The name of your cluster, as set in the configuration file /etc/corosync/corosync.conf in the totem.cluster_name section.
* Algorithm: The algorithm currently used. In this example, it is FFSplit.
* Node ID 3232235777: This is the entry for the node with the IP address &subnetI;.1. TLS is active.

For More Information

For additional information about &qdevice; and &qnet;, see the man pages of corosync-qdevice(8) and corosync-qnetd(8).

diff --git a/xml/ha_storage_protection.xml b/xml/ha_storage_protection.xml
index 7d6e0492a..94c3774d0 100644
--- a/xml/ha_storage_protection.xml
+++ b/xml/ha_storage_protection.xml
@@ -852,11 +852,15 @@ Timeout (msgwait) : 10
 Number of Cluster Nodes
+toms 2020-05-14: yan: there are still some self-contradictions
+here, but I don't know how to make it better :-)
 Do not use diskless SBD as a fencing mechanism
-for two-node clusters. Use it only in clusters with three or more
-nodes. SBD in diskless mode cannot handle split brain
-scenarios for two-node clusters.
+for two-node clusters.
+Use diskless SBD only for clusters with three or more nodes.
+SBD in diskless mode cannot handle split-brain scenarios for two-node clusters.
+However, if you want to use diskless SBD for two-node clusters, you must use it
+together with &qdevice;, as described in <xref linkend="cha-ha-qdevice"/>.
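Since the hunk above points two-node diskless-SBD users at &qdevice;, a minimal bootstrap for that combination might look as sketched below. This is a sketch under assumptions, not part of the patch: in crmsh, -S is documented as enabling SBD even without an SBD device (diskless mode), but verify the flag with crm cluster init --help on your version.

&prompt.root;crm cluster init -S -y
&prompt.root;crm cluster init qdevice --qnetd-hostname=&node3;

With &qdevice; providing the extra vote, the surviving node of a two-node cluster can stay quorate, which is what diskless SBD needs to fence safely.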