[bitnami/etcd] etcd pods are unable to join existing cluster on node drain #16069
Comments
I've been experiencing the same problem on my end and would appreciate any updates or solutions. |
Hi @abhayycs, could you provide the values you are using and a set of commands to reproduce the issue? |
values.yaml:
etcd:
  fullnameOverride: 'voltha-etcd-cluster-client'
  global:
    storageClass: "manual"
  persistence:
    enabled: false
  auth:
    rbac:
      create: false
      enabled: false
  replicaCount: 3
  resources:
    limits:
      cpu: 1900m
      memory: 1800Mi
    requests:
      cpu: 950m
      memory: 1Gi |
To reproduce the issue I tried the following:
apiVersion: v1
kind: PersistentVolume
metadata:
name: data-myetcd-0
spec:
capacity:
storage: 8Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/home/tester/data/etcd/data-myetcd-0"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: data-myetcd-1
spec:
capacity:
storage: 8Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/home/tester/data/etcd/data-myetcd-1"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: data-myetcd-2
spec:
capacity:
storage: 8Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/home/tester/data/etcd/data-myetcd-2"
helm command:
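A command along these lines matches this setup; the release name is inferred from the PV names above, and the chart reference and flags are assumptions, not the exact command used:
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install myetcd bitnami/etcd \
    --set replicaCount=3 \
    --set persistence.enabled=true \
    --set global.storageClass=manual \
    --set auth.rbac.create=false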
|
Seems it could be related to the libetcd.sh logic:
else
    info "Detected data from previous deployments"
    if [[ $(stat -c "%a" "$ETCD_DATA_DIR") != *700 ]]; then
        debug "Setting data directory permissions to 700 in a recursive way (required in etcd >=3.4.10)"
        debug_execute chmod -R 700 "$ETCD_DATA_DIR" || true
    fi
    if [[ ${#initial_members[@]} -gt 1 ]]; then
        if is_boolean_yes "$ETCD_DISABLE_PRESTOP"; then
            info "The member will try to join the cluster by it's own"
            export ETCD_INITIAL_CLUSTER_STATE=existing
        fi
        member_id="$(get_member_id)"
        if ! is_healthy_etcd_cluster; then
            warn "Cluster not responding!"
            if is_boolean_yes "$ETCD_DISASTER_RECOVERY"; then
                latest_snapshot_file="$(find /snapshots/ -maxdepth 1 -type f -name 'db-*' | sort | tail -n 1)"
                if [[ "${latest_snapshot_file}" != "" ]]; then
                    info "Restoring etcd cluster from snapshot"
                    rm -rf "$ETCD_DATA_DIR"
                    ETCD_INITIAL_CLUSTER="$(recalculate_initial_cluster)"
                    export ETCD_INITIAL_CLUSTER
                    [[ -f "$ETCD_CONF_FILE" ]] && etcd_conf_write "initial-cluster" "$ETCD_INITIAL_CLUSTER"
                    debug_execute etcdctl snapshot restore "${latest_snapshot_file}" \
                        --name "$ETCD_NAME" \
                        --data-dir "$ETCD_DATA_DIR" \
                        --initial-cluster "$ETCD_INITIAL_CLUSTER" \
                        --initial-cluster-token "$ETCD_INITIAL_CLUSTER_TOKEN" \
                        --initial-advertise-peer-urls "$ETCD_INITIAL_ADVERTISE_PEER_URLS"
                    etcd_store_member_id
                else
                    error "There was no snapshot to restore!"
                    exit 1
                fi
            else
                warn "Disaster recovery is disabled, the cluster will try to recover on it's own"
            fi
        elif was_etcd_member_removed; then
            info "Adding new member to existing cluster"
            read -r -a extra_flags <<<"$(etcdctl_auth_flags)"
            is_boolean_yes "$ETCD_ON_K8S" && extra_flags+=("--endpoints=$(etcdctl_get_endpoints)")
            extra_flags+=("--peer-urls=$ETCD_INITIAL_ADVERTISE_PEER_URLS")
            etcdctl member add "$ETCD_NAME" "${extra_flags[@]}" | grep "^ETCD_" >"$ETCD_NEW_MEMBERS_ENV_FILE"
            replace_in_file "$ETCD_NEW_MEMBERS_ENV_FILE" "^" "export "
            # The value of ETCD_INITIAL_CLUSTER_STATE must be changed for it to be correctly added to the existing cluster
            # https://etcd.io/docs/v3.3/op-guide/configuration/#--initial-cluster-state
            export ETCD_INITIAL_CLUSTER_STATE=existing
            etcd_store_member_id
        elif ! is_empty_value "$member_id"; then
            info "Updating member in existing cluster"
            export ETCD_INITIAL_CLUSTER_STATE=existing
            [[ -f "$ETCD_CONF_FILE" ]] && etcd_conf_write "initial-cluster-state" "$ETCD_INITIAL_CLUSTER_STATE"
            read -r -a extra_flags <<<"$(etcdctl_auth_flags)"
            extra_flags+=("--peer-urls=$ETCD_INITIAL_ADVERTISE_PEER_URLS")
            if is_boolean_yes "$ETCD_ON_K8S"; then
                extra_flags+=("--endpoints=$(etcdctl_get_endpoints)")
                etcdctl member update "$member_id" "${extra_flags[@]}"
            else
                etcd_start_bg
                etcdctl member update "$member_id" "${extra_flags[@]}"
                etcd_stop
            fi
        else
            info "Member ID wasn't properly stored, the member will try to join the cluster by it's own"
            export ETCD_INITIAL_CLUSTER_STATE=existing
            [[ -f "$ETCD_CONF_FILE" ]] && etcd_conf_write "initial-cluster-state" "$ETCD_INITIAL_CLUSTER_STATE"
        fi
    fi
fi

Could you show the log output in your specific scenario? |
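For capturing those logs, standard kubectl commands are usually enough; the pod name here is taken from the report below:
$ kubectl logs myetcd-1 --previous          # logs from the last crashed container
$ kubectl describe pod myetcd-1             # restart count, last state, events
$ kubectl get events --sort-by=.lastTimestamp | grep myetcd-1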
myetcd-1 pod is stuck in CrashLoopBackOff state after draining the node.
When a node is drained, the pod gets scheduled to another node, starts with the logs below, and goes into CrashLoopBackOff.
After the CrashLoopBackOff state it restarts with the logs below:
|
I'm not sure if it helps, but please also find the logs from deleting one of the pods:
|
Seems indeed an issue with the ETCD_INITIAL_CLUSTER_STATE logic. I will create an internal task for the team to address it. We will get back to you here as soon as we can (keep in mind we can't give an exact ETA, since it depends on the team workload). Meanwhile, I will mark the issue as on-hold. |
Hi @aoterolorenzo ! |
not fixed |
Is a fix for this on the roadmap? |
The bitnami etcd cluster seems very fragile when it comes to upgrades or even pod restarts. There is no clue why it is failing!
|
Here is a workaround to recover the pod from CrashLoopBackOff and make it join the cluster:

1. Find the etcd client pod that is down. In this case, it was etcd-client-0:
$ kubectl get pods | grep etcd
etcd-client-0 0/1 CrashLoopBackOff 67 (76s ago) 5h17m knode01.c1.net <none> <none>
etcd-client-1 1/1 Running 0 18d knode02.c1.net <none> <none>
etcd-client-2 1/1 Running 0 29d knode05.c1.net <none> <none>

2. Find its member ID. Go to an etcd client that is running OK (etcd-client-1 in this case) and list the members:
$ kubectl exec -it etcd-client-1 -- etcdctl member list
64d19d2f86bef81, started, etcd-client-0, http://etcd-client-0.etcd-client-headless.svc.c1.net:2380, http://etcd-client-0.etcd-client-headless.svc.c1.net:2379,http://etcd-client.svc.c1.net:2379, false
4050c251f9d2dd49, started, etcd-client-2, http://etcd-client-2.etcd-client-headless.svc.c1.net:2380, http://etcd-client-2.etcd-client-headless.svc.c1.net:2379,http://etcd-client.svc.c1.net:2379, false
814a84a90f431141, started, etcd-client-1, http://etcd-client-1.etcd-client-headless.svc.c1.net:2380, http://etcd-client-1.etcd-client-headless.svc.c1.net:2379,http://etcd-client.svc.c1.net:2379, false
The member to delete is etcd-client-0 with ID 64d19d2f86bef81.

3. Delete that member:
$ kubectl exec -it etcd-client-1 -- etcdctl member remove 64d19d2f86bef81
Member 64d19d2f86bef81 removed from cluster 2929d47079397a18

4. Wait until a new member joins the etcd cluster:
$ kubectl exec -it etcd-client-1 -- etcdctl member list
4050c251f9d2dd49, started, etcd-client-2, http://etcd-client-2.etcd-client-headless.svc.c1.net:2380, http://etcd-client-2.etcd-client-headless.svc.c1.net:2379,http://etcd-client.svc.c1.net:2379, false
814a84a90f431141, started, etcd-client-1, http://etcd-client-1.etcd-client-headless.svc.c1.net:2380, http://etcd-client-1.etcd-client-headless.svc.c1.net:2379,http://etcd-client.svc.c1.net:2379, false
cb476fe6302e4350, started, etcd-client-0, http://etcd-client-0.etcd-client-headless.svc.c1.net:2380, http://etcd-client-0.etcd-client-headless.svc.c1.net:2379,http://etcd-client.svc.c1.net:2379, false

5. Check that the etcd pods are all running:
$ kubectl get pods | grep etcd
etcd-client-0 1/1 Running 101 (37m ago) 8h
etcd-client-1 1/1 Running 0 19d
etcd-client-2 1/1 Running 0 30d |
@rmuddana-ns That doesn't seem to be a real workaround, as after doing that it just moves around which pod is stuck in a crash loop. After the deleted member joins the cluster again, another node gets kicked off starting the loop again. |
@6ixfalls That seems to be a different problem. You probably have a rolling upgrade pending in your deployment due to the crash loop, and once the pod recovers, the upgrade moves on to the next pod. |
Looks like you're right, thanks. The next step would be to get this fixed in the chart itself. Have you tried whether this works with 2 or all 3 pods in a crash loop? |
It should work as long as you have at least one working pod. If all 3 of them are in a crash loop, there is nothing left to answer the etcdctl commands. Yes, the root cause has to be identified and fixed. It appears etcd is somehow holding on to the old member ID; I do not know at this point whether this is coming from the chart or from somewhere else. |
The issue seems to be that when the pod is going down it removes itself from the cluster and then tries adding itself back, but keeps failing. We can see in the preStop hook script (https://github.com/bitnami/containers/blob/main/bitnami/etcd/3.5/debian-12/rootfs/opt/bitnami/scripts/etcd/prestop.sh) that the container removes itself from the etcd cluster but is unable to join back after restarting.

I looked more closely into why this could be happening. The first time the pod tries to join back after the restart it still has the member ID saved and uses this part of the script (https://github.com/bitnami/containers/blob/main/bitnami/etcd/3.5/debian-12/rootfs/opt/bitnami/scripts/libetcd.sh#L711-L721). The second time around the member ID is already lost and it starts throwing errors about not being able to find the member ID. Unfortunately I cannot see what exactly is wrong there, as I am not that well versed in etcd.

Currently the only workaround that I have found to actually help is to set removeMemberOnContainerTermination: false. However, this means that if you are scaling down the etcd cluster, the pod that is scaled down will not be removed from the etcd member list, but you can do that manually by exec'ing into one of the remaining etcd pods and removing the member. |
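For reference, a minimal sketch of applying that workaround with Helm; the release name and chart reference are assumptions, while removeMemberOnContainerTermination is the chart flag also shown in the values further down:
$ helm upgrade myetcd bitnami/etcd \
    --reuse-values \
    --set removeMemberOnContainerTermination=false
With this flag set, a member left behind by a scale-down has to be cleaned up by hand, e.g. kubectl exec -it <healthy-etcd-pod> -- etcdctl member remove <member-id> (placeholders, as in the workaround earlier in the thread).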
Thank you for bringing this issue to our attention. We appreciate your involvement! If you're interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here. Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance. |
Hi, it seems that this workaround configuration is not working as expected. According to what is written here, moving the pod to another Kubernetes node should work, but in practice it does not work anymore |
Any news about this problem? using |
@pietrogoddibit2win that assumes you're using a PersistentVolume that can be reattached to a different node (e.g. a GCE persistent disk, an Azure disk, an AWS EBS, etc.). However, if you're using a local store that won't work. |
Yes, we're on GCE using PVC |
The only way I have found to deploy a stable etcd cluster using this Helm chart on GKE is with removeMemberOnContainerTermination: false. If the root cause is not on track to be fixed in the foreseeable future, consider making that the default.

Stable values.yaml for me:

replicaCount: 5
autoCompactionMode: periodic
autoCompactionRetention: 10m
removeMemberOnContainerTermination: false
auth:
  rbac:
    enabled: true
    allowNoneAuthentication: false
pdb:
  create: true
  minAvailable: 4
extraEnvVars:
  - name: ETCD_SELF_SIGNED_CERT_VALIDITY
    value: "100" # 100 years |
I have the same issue running etcd in a mayastor deployment with etcd binding to local PVs (etcd replicas and PVs are distributed across multiple nodes). With replicaCount > 1 and initialClusterState = new, this leads to the same problem. The workaround that works for me is a two-step rollout (sketched below):

First install with a single member:
replicaCount: 1
initialClusterState: "new"
persistence:
  enabled: true
  storageClass: "mayastor-etcd-localpv"

Then upgrade to three members:
replicaCount: 3
initialClusterState: "existing"
persistence:
  enabled: true
  storageClass: "mayastor-etcd-localpv" |
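A sketch of that two-step rollout with Helm. The release name and chart reference are assumptions; in a real mayastor deployment these values would typically be set through the parent chart instead:
# Step 1: bootstrap a single-member cluster
$ helm install etcd bitnami/etcd \
    --set replicaCount=1 \
    --set initialClusterState=new \
    --set persistence.enabled=true \
    --set persistence.storageClass=mayastor-etcd-localpv

# Step 2: once the first member is healthy, scale out to three
$ helm upgrade etcd bitnami/etcd \
    --reuse-values \
    --set replicaCount=3 \
    --set initialClusterState=existing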
In recent days, I also encountered this error. I deployed 3 etcd instances on a GKE Autopilot cluster. When a node drain happened, one etcd pod was rescheduled on a new node. It failed to restart, reported the error "etcdserver: member not found.", and then kept restarting until the final status of the pod was CrashLoopBackOff.

I checked the documentation and implementation of bitnami/etcd. I checked the logs and found that the "member remove" command was sent, but the member_removal.log file in the PVC may not have been saved successfully. I don't know the exact reason, because this bug does not occur every time and it may be related to the specific Kubernetes implementation. In short, relying on the local file is not reliable!

Suggestions for users:
Suggestions for bitnami/etcd:
|
The documentation says that "member update" should only be used to update peer URLs, not for a member to rejoin. The fix seems straightforward to me: just replace "member update" with a pair of "member remove" and "member add". Am I missing something here? |
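A rough sketch of that change in the "Updating member in existing cluster" branch of the libetcd.sh excerpt above, reusing its helpers. This is only an illustration of the suggestion, not the actual Bitnami code:
info "Re-registering member in existing cluster"
read -r -a extra_flags <<<"$(etcdctl_auth_flags)"
is_boolean_yes "$ETCD_ON_K8S" && extra_flags+=("--endpoints=$(etcdctl_get_endpoints)")
# Drop the stale registration instead of trying to update it
etcdctl member remove "$member_id" "${extra_flags[@]}" || true
# A removed member must come back with an empty data dir
rm -rf "${ETCD_DATA_DIR:?}"/*
# Rejoin exactly like a brand-new member, capturing the env etcd must start with
etcdctl member add "$ETCD_NAME" "${extra_flags[@]}" \
    --peer-urls="$ETCD_INITIAL_ADVERTISE_PEER_URLS" | grep "^ETCD_" >"$ETCD_NEW_MEMBERS_ENV_FILE"
replace_in_file "$ETCD_NEW_MEMBERS_ENV_FILE" "^" "export "
export ETCD_INITIAL_CLUSTER_STATE=existing
etcd_store_member_id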
@abhayycs so I looked into your original logs and it seems that things happened in this order:
I guess if there's a permanent fix to this issue, that would be for the image to ignore |
Now moving on to my problem. Let me know if I should create a separate issue for this, but there are several problems with the etcd image:

Overall, I think the initialization logic should look something like this:

stateDiagram-v2
state "etcd_initialize" as ei
state "get_number_of_healthy_nodes" as hn
state "start_new_cluster" as nc
state "echo 'manual recovery required'" as em
state "is_data_dir_empty" as ed
state "start_etcd" as se
state "stop_etcd" as st
state "remove_data_dir" as rd
state "remove_old_member_if_exist" as ro
state "join_as_new_member" as jc
[*] --> ei
ei --> hn
hn --> nc: equal 0
nc --> [*]
hn --> em: less than majority
em --> [*]
hn --> ed: equal or more than majority
ed --> se: no
se --> st: if succeeds
st --> [*]
se --> rd: if member is permanently removed
rd --> ro
ro --> jc
jc --> [*]
ed --> ro: yes
|
Name and Version
bitnami/etcd-3.5.8
What architecture are you using?
None
What steps will reproduce the bug?
I'm using a 3-node Kubernetes cluster and 3 instances of etcd.
When I delete a pod, the pod is able to restart.
When I only drain a node, the pod is not able to re-join the cluster and is unable to start.
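For reference, the two cases can be reproduced roughly like this; the node name is a placeholder and myetcd-1 is taken from the logs mentioned earlier:
# CASE-1: deleting a pod (it restarts and rejoins fine)
$ kubectl delete pod myetcd-1

# CASE-2: draining the node that hosts an etcd pod (the rescheduled pod crash-loops)
$ kubectl drain <node-hosting-myetcd-1> --ignore-daemonsets --delete-emptydir-data
$ kubectl get pods -o wide -w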
Observations:
ETCD_INITIAL_CLUSTER_STATE is 'new' when the cluster is starting from zero (first time).
CASE-1: When deleting a pod, ETCD_INITIAL_CLUSTER_STATE changes from 'new' to 'existing', and the pod is able to start.
CASE-2: When draining a node, ETCD_INITIAL_CLUSTER_STATE stays 'new', and the newly created pod is unable to join the cluster and unable to restart.
Are you using any custom parameters or values?
I tried with and without persistence.
What is the expected behavior?
The pod should start on node drain. As per my understanding, 'ETCD_INITIAL_CLUSTER_STATE' should change to 'existing' on node drain as well.
What do you see instead?
The etcd pod does not start on node drain.
Additional information
Please let me know whether this behavior is expected or not, and how I can prevent the pod restart failure on node drain.
I'm not sure if it will help: