[reopen of #125 as long discussion] Node not joining the cluster #359

Open
chrisduong opened this issue Apr 26, 2016 · 3 comments

Comments


chrisduong commented Apr 26, 2016

Hi,

I'm using the rabbitmq cookbook v4.7.0 and installed the latest RabbitMQ version, 3.6.1. I noticed that the rabbitmq_cluster LWRP joins the node to the cluster only_if "the node is not running in any cluster".

However, whenever the RabbitMQ server starts, it runs as a single-node cluster whose cluster name is the node's own name.

This is the cluster status when node2 first starts up:

[root@node2 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@node2 ...
[{nodes,[{disc,[rabbit@node2]}]},
{running_nodes,[rabbit@node2]},
{cluster_name,<<"rabbit@node2">>},
{partitions,[]},
{alarms,[{rabbit@node2,[]}]}]

This means the check joined_cluster?(var_node_name, var_cluster_status) always returns true, so Chef complains and does not join the cluster:

Chef::Log.warn("[rabbitmq_cluster] Node is already member of #{current_cluster_name(var_cluster_status)}. Joining cluster will be skipped.")
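
For context, here is a minimal sketch of why that guard always passes (a simplified assumption about the helper, not the cookbook's exact code): a freshly started broker's running_nodes always contains its own name, so checking the joining node's own status can never distinguish "already clustered" from "running alone".

# A minimal sketch, assuming joined_cluster? simply looks for the node
# name in the running_nodes section of the local `rabbitmqctl
# cluster_status` output (illustrative, not the cookbook's exact code).
def running_nodes(cluster_status)
  match = cluster_status.match(/\{running_nodes,\[(.*?)\]\}/m)
  match ? match[1].split(',').map(&:strip) : []
end

def joined_cluster?(node_name, cluster_status)
  running_nodes(cluster_status).include?(node_name)
end

status = <<~STATUS
  [{nodes,[{disc,[rabbit@node2]}]},
   {running_nodes,[rabbit@node2]},
   {cluster_name,<<"rabbit@node2">>}]
STATUS

joined_cluster?('rabbit@node2', status) # => true even for a lone node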

The LWRP only joins with the first node in the array, so it makes more sense to check the cluster status from that node (to prevent failures when joining) than to check the running_nodes of the "joining node" itself.
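
One way to express that idea (a sketch under the assumption that the recipe can shell out to rabbitmqctl; rabbitmqctl's -n flag queries a remote node's view of the cluster, and the helper names here are hypothetical):

require 'mixlib/shellout'

# Hypothetical helper: ask the target node (the first node in the array)
# for its view of the cluster instead of asking the joining node.
def remote_cluster_status(target_node)
  cmd = Mixlib::ShellOut.new("rabbitmqctl -n #{target_node} cluster_status")
  cmd.run_command
  cmd.error!   # raise if the remote node is unreachable
  cmd.stdout
end

# The joining node is clustered only if the target node already lists it.
def already_joined?(joining_node, target_node)
  remote_cluster_status(target_node).include?(joining_node)
end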

@sadowskik
Copy link

I'm just dealing with exactly the same issue.

If this check were omitted for every broker besides the first one, stop_app would be invoked unnecessarily on each run, making the other brokers temporarily unavailable.

When the node is already a member of the cluster, var_node_name_to_join is always present in the running_nodes list. The most straightforward solution would be to change this line to: joined_cluster?(var_node_name_to_join, var_cluster_status).

Unfortunately, it's not that simple :) The cluster_status result is different in the case of a partition, and I'm not really sure whether it's desirable to try rejoining the cluster whenever the first broker is partitioned from the rest of the cluster.

To briefly sum up, we have two paths to follow from here:

  1. Simple: check whether the first broker is in the running_nodes list. If it is, the current broker has already been clustered (see the sketch after this list).
  2. More comprehensive: consider the various partition scenarios.
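
A sketch of option 1 under those assumptions (the helper name and the first-broker argument are illustrative, not from the cookbook):

# Illustrative sketch of option 1: the current broker counts as clustered
# only when the FIRST broker of the configured cluster appears in its own
# running_nodes; a lone, never-joined broker lists only itself.
def clustered_with_first_broker?(first_broker, cluster_status)
  running = cluster_status[/\{running_nodes,\[(.*?)\]\}/m, 1].to_s
  running.split(',').map(&:strip).include?(first_broker)
end

# On node2 before joining: running_nodes is [rabbit@node2], so this
# returns false for 'rabbit@node1' and the join proceeds.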


DavidKaz commented Jul 6, 2016

@chrisduong @sadowskik consider this solution #387


akadoya commented Jul 6, 2016

I have created a pull request for this issue: #380. At least it works for me.
