Summary
After a horizontal scale-out (3 → 4 replicas), the KubeBlocks component controller enters an infinite loop reporting that the new member has not joined, even though `etcdctl member list` shows the new pod is fully started and part of the cluster.
Environment
- KubeBlocks: v1.0.2
- etcd addon: v1.0.2
- Kubernetes: EKS (ap-southeast-1)
Steps to Reproduce
- Create a 3-replica etcd cluster
- Apply a HorizontalScaling OpsRequest with `scaleOut.replicaChanges: 1`:
  ```yaml
  apiVersion: operations.kubeblocks.io/v1alpha1
  kind: OpsRequest
  metadata:
    name: etcd-scale-out
    namespace: demo
  spec:
    clusterName: etcd-cluster
    type: HorizontalScaling
    horizontalScaling:
    - componentName: etcd
      scaleOut:
        replicaChanges: 1
  ```
- OpsRequest completes with `Succeed 1/1` (~78 seconds)
- Observe the component controller logs
Observed Behavior
The OpsRequest reports `Succeed`, but the cluster remains in the `Updating` phase indefinitely. The component controller loops on the `memberJoin` lifecycle action:

```
action failed some replicas have not joined: [etcd-cluster-etcd-3]
```

This message repeats continuously. Meanwhile, `etcdctl member list` from inside the pod shows etcd-3 as a healthy, started member:

```
f3e7454e8da39e57, started, etcd-cluster-etcd-3, http://etcd-cluster-etcd-3...:2380, ...
```

The cluster never transitions back to `Running`.
Root Cause (confirmed by code analysis)
Two bugs interact to produce the infinite loop:
Bug 1 — Annotation not persisted on transient failure (KubeBlocks core)
File: `controllers/apps/component/transformer_component_workload.go`, `handleUpdate()`
When `joinMember4ScaleOut()` returns a `RequeueError` (e.g., because kbagent is not yet ready — `dial tcp …:3501: connection refused`), `handleUpdate()` returns early, before `cli.Update()` is called:
```go
if err := t.handleWorkloadUpdate(...); err != nil {
    return err // ← RequeueError exits here
}
objCopy := copyAndMergeITS(runningITS, protoITS, ...)
if objCopy != nil {
    cli.Update(dag, nil, objCopy, ...) // ← NEVER REACHED on error
}
```
Because `cli.Update()` is never called, the `MemberJoined=true` annotation is never written to the running InstanceSet on the API server. On the next reconcile, `BuildReplicasStatus(runningITS, protoITS)` copies `MemberJoined=false` from the unchanged running InstanceSet, resetting the state. This creates a retry loop.
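To make the reset concrete, here is a minimal, self-contained sketch (hypothetical types and names, not the actual KubeBlocks code) of how an un-persisted join flag is lost between reconciles:

```go
// Illustrative sketch only: models how BuildReplicasStatus re-seeds the
// in-memory proto object from the stale, persisted running InstanceSet.
package main

import "fmt"

// instanceSet keeps just the piece that matters here: the per-replica
// member-joined flags.
type instanceSet struct {
	memberJoined map[string]bool
}

// buildReplicasStatus mimics BuildReplicasStatus(runningITS, protoITS):
// the proto's flags are copied from the running (persisted) object.
func buildReplicasStatus(running, proto *instanceSet) {
	for name, joined := range running.memberJoined {
		proto.memberJoined[name] = joined
	}
}

func main() {
	// What the API server has persisted: the join flag was never written.
	running := &instanceSet{memberJoined: map[string]bool{"etcd-cluster-etcd-3": false}}

	// Reconcile RN: the join succeeds and the proto is updated in memory,
	// but cli.Update() fails, so `running` on the API server is unchanged.
	proto := &instanceSet{memberJoined: map[string]bool{"etcd-cluster-etcd-3": true}}

	// Reconcile RN+1: the proto is rebuilt from the stale running object,
	// so member add is retried against an already-registered member.
	proto = &instanceSet{memberJoined: map[string]bool{}}
	buildReplicasStatus(running, proto)
	fmt.Println(proto.memberJoined["etcd-cluster-etcd-3"]) // false
}
```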
Bug 2 — `member-join.sh` is not idempotent (etcd addon)
File: `addons/etcd/scripts/member-join.sh`, `add_member()`
exec_etcdctl "$leader_endpoint:3379" member add "$KB_JOIN_MEMBER_POD_NAME" \
--peer-urls="$peer_protocol://$join_member_endpoint:2380" || error_exit "Failed to join member"
There is no guard against calling `member add` when the member already exists. The exact sequence that creates the infinite loop:
1. Reconcile R1: kbagent not yet ready → `connection refused` → `RequeueError` → `cli.Update()` skipped → `MemberJoined=false` stays in the running InstanceSet.
2. ...repeated N times until kbagent starts...
3. Reconcile RN: kbagent ready → `etcdctl member add etcd-3` SUCCEEDS → etcd-3 is now in the member list → `joinMemberForPod()` returns nil → the function returns nil → `cli.Update()` is called with `MemberJoined=true`.
4. If `cli.Update()` succeeds → done, the cluster recovers.
5. If `cli.Update()` fails (API server conflict, transient error) → the reconcile is retried.
6. Reconcile RN+1: `BuildReplicasStatus` copies `MemberJoined=false` from the unchanged running InstanceSet → `member add etcd-3` → "member already exists" → `error_exit` → non-zero exit → `joinMemberForPod()` returns an error → `RequeueError` → `cli.Update()` skipped → infinite loop.
Confirmed with live test
```
17:51:49 INFO ... connection refused (attempt 1)
17:51:50 INFO ... connection refused (attempt 2)
17:51:51 INFO ... connection refused (attempts 3-5)
17:51:52 INFO ... connection refused (attempt 6)
17:51:53 INFO succeed to join member for pod: etcd-cluster-etcd-3
```
In most cases the cluster recovers (step 4 above). In the reporter's case, steps 5 and 6 triggered the infinite loop.
Fix
Addon fix (recommended, immediate): Make `member-join.sh` idempotent
Check whether the member is already registered before calling `member add`. If it is, return success immediately:
```sh
add_member() {
    ...
    # Idempotency: skip if member already exists
    if exec_etcdctl "$leader_endpoint:2379" member list | grep -qw "$KB_JOIN_MEMBER_POD_NAME"; then
        log "Member $KB_JOIN_MEMBER_POD_NAME already exists in cluster, skipping"
        return 0
    fi
    exec_etcdctl "$leader_endpoint:2379" member add "$KB_JOIN_MEMBER_POD_NAME" \
        --peer-urls="$peer_protocol://$join_member_endpoint:2380" || error_exit "Failed to join member"
    log "Member $KB_JOIN_MEMBER_POD_NAME joined cluster via leader $leader_endpoint"
}
```
This ensures that even when `MemberJoined=true` fails to persist, subsequent reconciles call `member add` (now a no-op for already-registered members), `joinMemberForPod()` returns nil, and `cli.Update()` eventually writes `MemberJoined=true`.
Core fix (long-term): Persist `MemberJoined=true` independently
In `handleUpdate()`, the `MemberJoined=true` annotation should be written to the API server immediately after `joinMemberForPod()` succeeds, regardless of whether the InstanceSet spec update succeeds. This decouples cluster-membership tracking from the workload spec update.
There is already a `// TODO: should wait for the data to be loaded before joining the member?` comment in `joinMember4ScaleOut()` (transformer_component_workload_ops.go:351), indicating awareness of sequencing issues in this code path.
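A minimal sketch of that ordering, written against controller-runtime; `joinMemberForPod` and `replicaJoinedKey` are hypothetical stand-ins for the real KubeBlocks lifecycle-action call and annotation key, not the actual implementation:

```go
// Sketch only: persist the joined flag with its own Patch call as soon
// as each join succeeds, decoupled from the InstanceSet spec update.
package component

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// replicaJoinedKey is a hypothetical per-replica annotation key.
func replicaJoinedKey(pod string) string {
	return fmt.Sprintf("workloads.kubeblocks.io/%s.member-joined", pod)
}

// joinMemberForPod stands in for the kbagent memberJoin lifecycle action.
func joinMemberForPod(ctx context.Context, pod string) error { return nil }

// joinAndPersist patches MemberJoined=true onto the running InstanceSet
// immediately after each successful join, so a later failure of the
// spec update can no longer reset the flag on the next reconcile.
func joinAndPersist(ctx context.Context, cli client.Client,
	runningITS client.Object, pods []string) error {
	for _, pod := range pods {
		if err := joinMemberForPod(ctx, pod); err != nil {
			return err // transient kbagent errors still requeue as before
		}
		patch := client.MergeFrom(runningITS.DeepCopyObject().(client.Object))
		anns := runningITS.GetAnnotations()
		if anns == nil {
			anns = map[string]string{}
		}
		anns[replicaJoinedKey(pod)] = "true"
		runningITS.SetAnnotations(anns)
		if err := cli.Patch(ctx, runningITS, patch); err != nil {
			return err
		}
	}
	return nil
}
```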
Impact
- Cluster stuck in `Updating` indefinitely
- All subsequent OpsRequests (including scale-in) are rejected
- Workaround: delete and recreate the cluster
Expected Behavior
After scale-out completes and the new member is fully started, the `memberJoin` action should succeed (idempotently), allowing the cluster to transition to `Running`.