By @frederikbosch: Peer discovery: add an option to opt out of registration #13201
to match that used with Mnesia. In the case of Mnesia, there are 10 retries with a 30 second delay each. For Khepri, a single timeout is used, so it must be ten times as long.
As described in section 7.1 of filtex-v1.0-wd09:

> Impose a limit on the complexity of each filter expression.

Here, we hard-code the maximum number of properties within a filter expression to 16. There should never be a use case that requires filtering on more than 16 different properties.
for the backends that support it in the first place.

When forming a cluster, registration of the node joining the cluster might be left to (container) orchestration tools like Nomad or Kubernetes. This PR adds a new configuration option, 'cluster_formation.registration.enable', which defaults to true. When set to false, node registration will be skipped.

There is at least one important advantage to letting a tool such as Nomad (plus Consul) do the registration instead of the application (RabbitMQ). When the application is not stopped gracefully for any reason, e.g. it is OOM-killed, it cannot deregister the service/node. This leaves behind an unlinked service entry in the registry. This problem is fundamentally avoided by allowing Nomad (or similar tools) to register the node's service.

See #11233 #11045 for prior discussions.

Co-authored-by: Frederik Bosch <[email protected]>
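
As a rough illustration, a `rabbitmq.conf` fragment that opts out of registration might look like the sketch below. Only `cluster_formation.registration.enable` comes from this PR; the choice of the Consul backend and the host value are assumptions made for the example.

```ini
# Assumption for illustration: Consul is the peer discovery backend
cluster_formation.peer_discovery_backend = consul
# Assumption for illustration: a local Consul agent
cluster_formation.consul.host = localhost

# Option added by this PR (defaults to true).
# When false, the node skips registering itself and leaves that
# to an orchestration tool such as Nomad.
cluster_formation.registration.enable = false
```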
Will have to cherry-pick the relevant commits for
@michaelklishin That is fantastic. Is there an automatically built container image for this PR that I can use to test in my cluster?
@frederikbosch there is an OCI build job but I'd need to investigate where it publishes to. There will be development builds once this PR is merged, too.
Found it, will test tomorrow.
@frederikbosch that's the right account/image but I am less sure about the tag. Sounds reasonable since it matches the branch name.
Backported manually to
@frederikbosch just to confirm, does this PR work as expected for you with Nomad?
@michaelklishin I will need to postpone my test until tomorrow, sorry for that.
Sure, no rush at all.
I did some initial tests, but I am not sure what is going on. I removed my previous comments. I need to do some more testing to understand what's happening. I will come back to it.
@michaelklishin I did some extensive testing this morning.
So I started building RabbitMQ from source and disabled the timer inside the helper when registration is disabled. This worked perfectly and I can come up with a PR for this. But then I wondered: what if a node becomes unhealthy? Suppose one of the nodes has a networking problem and Consul detects this with its health check (also registered with Nomad), and therefore the node does not turn up when querying the list of nodes. But how do the other nodes know this? With or without registration, at the moment a node never queries the list of nodes again after formation. Right? The helper only queries whether its own node is healthy, and when it is not, it executes Is it even necessary that a node knows that peers are temporarily not in the service registry?
After reading the documentation on Node Health Checks and Forced Removal I think I can answer my own question. Node cleanup is a separate module ( Therefore, I think disabling the health check when registration is disabled is safe. It's really only responsible for executing the
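
The behaviour proposed in the comment above (no health check timer when registration is disabled) can be sketched roughly as follows. This is a language-neutral illustration in Python, not RabbitMQ's actual Erlang helper; `ClusterFormationConfig`, `plan_startup`, and the interval field are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterFormationConfig:
    # Mirrors the default of cluster_formation.registration.enable
    registration_enabled: bool = True
    # Hypothetical setting: None means no periodic health check is configured
    health_check_interval_s: Optional[int] = 30


def plan_startup(config: ClusterFormationConfig) -> List[str]:
    """Return the registration-related actions a node would take at boot."""
    if not config.registration_enabled:
        # An external tool (e.g. Nomad) owns the registry entry and its
        # health check, so the node neither registers nor starts a timer.
        return []
    actions = ["register"]
    if config.health_check_interval_s is not None:
        actions.append("start_health_check_timer")
    return actions
```

With the defaults, `plan_startup(ClusterFormationConfig())` yields both actions; with `registration_enabled=False` it yields none, which matches the idea that the timer only exists to maintain an entry the node itself created.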
@frederikbosch thank you for contributing, once again!
This is #13194 by @frederikbosch with some final touches by me.