[reopen of #125 as long discussion] Node not joining the cluster #359

Open
chrisduong opened this issue Apr 26, 2016 · 3 comments

Comments


chrisduong commented Apr 26, 2016

Hi,

I'm using the rabbitmq cookbook v4.7.0 and installed the latest RabbitMQ version, 3.6.1. I noticed that the rabbitmq_cluster LWRP joins the node to the cluster only_if "the node is not running in any cluster".

However, whenever the RabbitMQ server starts, it runs as a single-node cluster whose cluster name is the node's own name.

This is the cluster status when node2 first starts up:

[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node2]}]},
{running_nodes,[rabbit@node2]},
{cluster_name,<<"rabbit@node2">>},
{partitions,[]},
{alarms,[{rabbit@node2,[]}]}]

This means the check joined_cluster?(var_node_name, var_cluster_status) always returns true, so Chef complains and does not join the cluster:

Chef::Log.warn("[rabbitmq_cluster] Node is already member of #{current_cluster_name(var_cluster_status)}. Joining cluster will be skipped.")
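
For context, here is a minimal sketch of why that guard always passes (a simplified assumption about the helper, not the cookbook's exact code): a freshly started broker's running_nodes always contains its own name, so checking the joining node's own status can never distinguish "already clustered" from "running alone".

# A minimal sketch, assuming joined_cluster? simply looks for the node
# name in the running_nodes section of the local `rabbitmqctl
# cluster_status` output (illustrative, not the cookbook's exact code).
def running_nodes(cluster_status)
  match = cluster_status.match(/\{running_nodes,\[(.*?)\]\}/m)
  match ? match[1].split(',').map(&:strip) : []
end

def joined_cluster?(node_name, cluster_status)
  running_nodes(cluster_status).include?(node_name)
end

status = <<~STATUS
  [{nodes,[{disc,[rabbit@node2]}]},
   {running_nodes,[rabbit@node2]},
   {cluster_name,<<"rabbit@node2">>}]
STATUS

joined_cluster?('rabbit@node2', status) # => true even for a lone node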

The LWRP only joins with the first node in the array, so it makes more sense to check the cluster status from that node (to prevent failures when joining) than to check the running_nodes of the "joining node" itself.
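
One way to express that idea (a sketch under the assumption that the recipe can shell out to rabbitmqctl; rabbitmqctl's -n flag queries a remote node's view of the cluster, and the helper names here are hypothetical):

require 'mixlib/shellout'

# Hypothetical helper: ask the target node (the first node in the array)
# for its view of the cluster instead of asking the joining node.
def remote_cluster_status(target_node)
  cmd = Mixlib::ShellOut.new("rabbitmqctl -n #{target_node} cluster_status")
  cmd.run_command
  cmd.error!   # raise if the remote node is unreachable
  cmd.stdout
end

# The joining node is clustered only if the target node already lists it.
def already_joined?(joining_node, target_node)
  remote_cluster_status(target_node).include?(joining_node)
end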

@sadowskik
Copy link

I'm just dealing with exactly the same issue.

If this check were omitted for every broker besides the first one, stop_app would be invoked unnecessarily on each run, making the other brokers temporarily unavailable.

When the node is already a member of the cluster, var_node_name_to_join is always present in the running_nodes list. The most straightforward solution would be to change this line to: joined_cluster?(var_node_name_to_join, var_cluster_status).

Unfortunately, it's not that simple :) The cluster_status result is different in the case of a partition, and I'm not really sure whether it's desirable to try rejoining the cluster whenever the first broker is partitioned from the rest of the cluster.

To briefly sum up, we have two paths to follow from here:

  1. Simple: check whether the first broker is in the running_nodes list. If it is, the current broker has already been clustered (see the sketch after this list).
  2. More comprehensive: consider the various partition scenarios.
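
A sketch of option 1 under those assumptions (the helper name and the first-broker argument are illustrative, not from the cookbook):

# Illustrative sketch of option 1: the current broker counts as clustered
# only when the FIRST broker of the configured cluster appears in its own
# running_nodes; a lone, never-joined broker lists only itself.
def clustered_with_first_broker?(first_broker, cluster_status)
  running = cluster_status[/\{running_nodes,\[(.*?)\]\}/m, 1].to_s
  running.split(',').map(&:strip).include?(first_broker)
end

# On node2 before joining: running_nodes is [rabbit@node2], so this
# returns false for 'rabbit@node1' and the join proceeds.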


DavidKaz commented Jul 6, 2016

@chrisduong @sadowskik consider this solution #387


akadoya commented Jul 6, 2016

I have created a pull request for this issue: #380. At least it works for me.
