Peer discovery: disable consul health check helper when registration is disabled #13210
Conversation
Good catch. I have tested #13201 with the …
Peer discovery: disable consul health check helper when registration is disabled (backport #13210)
@michaelklishin If I may ask this question. I am finalizing my Nomad file and added a health check to the service like this. It is the stage 2 check from the docs.

service {
  address_mode = "driver"
  name         = "rabbitmq"
  port         = 5672
  tags         = ["amqp", "${attr.unique.hostname}"]

  meta {
    erlang-node-name = "rabbit@${attr.unique.hostname}.rabbitmq.service.consul"
  }

  check {
    type           = "script"
    command        = "rabbitmq-diagnostics"
    args           = ["-q", "status"]
    interval       = "60s"
    timeout        = "5s"
    initial_status = "passing"
  }
}

Now, a problem arises when I add the check.
So my guess is that the script check runs before the Erlang cookie file exists and interferes with the booting node. Do you have any idea what might be the cause here? PS. After I have this all running successfully, I will update the Nomad RabbitMQ integration so others can benefit from our efforts here.
Your understanding may be correct. If the cookie file does not exist, then CLI tools or RabbitMQ itself will generate a random value to seed it. This can create a race condition. I would simply switch to the HTTP API for health checks: for every health check command there is an HTTP API endpoint.
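Such a check boils down to a single HTTP request. A minimal sketch, assuming the management plugin listens on localhost:15672 and a monitoring user monitor/monitor exists (host, port, and credentials are placeholders, not taken from this thread):

#!/usr/bin/env bash
# Query one of the management API's documented health endpoints; curl's -f
# flag turns an HTTP error status into a non-zero exit code, so this script
# can serve directly as a health check.
curl -fsS -u monitor:monitor \
  http://localhost:15672/api/health/checks/local-alarms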
Thanks, I was not aware of that yet.
Plus, CLI tools support providing the cookie on the command line as an argument or via an environment variable. I don't think we document, or at least promote, those options.
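A sketch of the environment variable form, which keeps the check from having to read (or accidentally seed) the on-disk cookie file; the cookie value is a placeholder:

# RABBITMQ_ERLANG_COOKIE is honored by the CLI tools; "ping" is the
# cheapest diagnostics command and only verifies the node is running.
RABBITMQ_ERLANG_COOKIE="s3cr3t-placeholder" rabbitmq-diagnostics -q ping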
@frederikbosch go to the HTTP API reference. The closest endpoints there to rabbitmq-diagnostics status are the health check endpoints. If you need the simplest health check possible, one that tells you that the node is up and nothing else, a single GET to one of them will do.
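One documented endpoint in that family (named here for illustration; it is not necessarily the one referred to above) simply verifies that a listener is up on a given port:

# "Node is up" probe via the management API: succeeds only if something
# is listening on port 5672 (AMQP). Credentials remain placeholders.
curl -fsS -u monitor:monitor \
  http://localhost:15672/api/health/checks/port-listener/5672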
@michaelklishin That seems like the best option then. However, then we run into a new problem: when the username and password come from Vault, as in my case, we cannot use Vault variables (yet) inside a health check. So the health checks always return 401. With that problem in place, I considered going back to the script check. Another option was to add a monitoring user on boot, or directly thereafter, via a definitions file using a fixed username and password not coming from Vault. I decided not to do that because it would make the setup very complicated. In the end I came up with a custom health check script that checks that the Erlang cookie is there, and if it is not, considers the node as passing the health check. My end result is the following Nomad file. Once v4.0.6 is released, I will suggest changing the Nomad RabbitMQ integration to this setup.

job "rabbitmq" {
datacenters = ["dc"]
vault {
policies = ["rabbitmq"]
}
group "cluster" {
count = 3
constraint {
operator = "distinct_hosts"
value = "true"
}
update {
max_parallel = 1
}
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "5s"
healthy_deadline = "60s"
}
task "rabbit" {
driver = "docker"
config {
image = "pivotalrabbitmq/rabbitmq:sha-49eeb5e435167c18a2abb3832ccc6b6051f4a283-otp27"
hostname = "${attr.unique.hostname}.rabbitmq.service.consul"
volumes = [
"local/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf",
"local/enabled_plugins:/etc/rabbitmq/enabled_plugins",
"local/healthcheck.sh:/usr/bin/healthcheck.sh",
"secrets/cert.key:/etc/ssl/cert.key",
"secrets/cert.crt:/etc/ssl/cert.crt",
"secrets/chain.crt:/etc/ssl/chain.crt",
]
}
env {
RABBITMQ_USE_LONGNAME="true"
RABBITMQ_MANAGEMENT_SSL_CACERTFILE=""
RABBITMQ_MANAGEMENT_SSL_CERTFILE=""
RABBITMQ_MANAGEMENT_SSL_KEYFILE=""
}
template {
data = <<EOH
[rabbitmq_management,
rabbitmq_federation,
rabbitmq_federation_management,
rabbitmq_shovel,
rabbitmq_shovel_management,
rabbitmq_mqtt,
rabbitmq_stomp,
rabbitmq_peer_discovery_consul].
EOH
destination = "local/enabled_plugins"
}
template {
data = <<EOH
log.console.level = debug
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_consul
cluster_formation.node_cleanup.only_log_warning = true
cluster_formation.registration.enabled = false
cluster_formation.consul.svc = rabbitmq
cluster_formation.consul.host = {{env "attr.unique.network.ip-address"}}
cluster_formation.consul.acl_token = {{with secret "secret/consul/creds/rabbitmq"}}{{.Data.token}}{{end}}
EOH
destination = "local/rabbitmq.conf"
}
template {
data = <<EOH
#!/usr/bin/env bash
set -e
if [ ! -f /var/lib/rabbitmq/.erlang.cookie ]; then
echo "Waiting for system to boot"
exit 0
fi
rabbitmq-diagnostics -q status
EOH
destination = "local/healthcheck.sh"
}
template {
data = <<EOH
# Empty lines are also ignored
{{with secret "secret/kv1/infrastructure/v1/rabbitmq/v1"}}
RABBITMQ_DEFAULT_USER = "{{.Data.USER}}"
RABBITMQ_DEFAULT_PASS = "{{.Data.PASS}}"
RABBITMQ_ERLANG_COOKIE = "{{.Data.ERLANG_COOKIE}}"
{{end}}
EOH
destination = "secrets/file.env"
env = true
}
service {
address_mode = "driver"
name = "rabbitmq"
port = 5672
tags = ["amqp", "${attr.unique.hostname}"]
meta {
erlang-node-name = "rabbit@${attr.unique.hostname}.rabbitmq.service.consul"
}
check {
type = "script"
command = "bash"
args = ["/usr/bin/healthcheck.sh"]
interval = "60s"
timeout = "5s"
}
}
service {
address_mode = "driver"
name = "rabbitmq-nomad-management"
port = 15672
tags = [
"http",
"traefik.enable=true",
"traefik.http.routers.rabbitmq-http.rule=Host(`mq.myhost`)",
"traefik.http.routers.rabbitmq-http.entrypoints=web",
"traefik.http.routers.rabbitmq-http.middlewares=redirect-https@file",
"traefik.http.routers.rabbitmq-https.rule=Host(`mq.myhost`)",
"traefik.http.routers.rabbitmq-https.entrypoints=websecure",
"traefik.http.routers.rabbitmq-https.tls=true",
]
check {
address_mode = "driver"
type = "http"
path = "/"
port = 15672
interval = "10s"
timeout = "2s"
}
}
resources {
memory = 1000
}
}
}
} |
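For completeness, a job spec like this would be submitted and inspected with the standard Nomad CLI (the file name is a placeholder):

# Run the job and check that its allocations become healthy.
nomad job run rabbitmq.nomad
nomad job status rabbitmq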
Proposed Changes
This is a follow-up PR for #13201. In that PR a new configuration option was added:
cluster_formation.registration.enabled
. For Consul, this works during boot, but the
rabbitmq_peer_discovery_consul_health_check_helper
keeps querying Consul periodically to see whether the node is healthy. If it is not, the helper re-registers the node even though registration was disabled. Since that periodic health check has no function other than re-registering an unhealthy node, it needs to be disabled when registration is disabled. I have tested this PR with Consul and Nomad.
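A quick way to observe the behavior described above is to ask the local Consul agent what it has registered; with registration disabled, the rabbitmq service (the name set by cluster_formation.consul.svc above) should not reappear after the helper's check interval. A sketch, assuming a default local agent on port 8500:

# List services registered with the local Consul agent; /v1/agent/services
# is part of Consul's HTTP API. jq is used here only for readability.
curl -s http://localhost:8500/v1/agent/services | jq 'keys'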