etcd: HorizontalScaling scale-out leaves cluster stuck in Updating (memberJoin false positive) #2541

@weicao

Summary

After a horizontal scale-out (3 → 4 replicas), the KubeBlocks component controller enters an infinite loop reporting that the new member has not joined, even though etcdctl member list shows the new pod is fully started and part of the cluster.

Environment

  • KubeBlocks: v1.0.2
  • etcd addon: v1.0.2
  • Kubernetes: EKS (ap-southeast-1)

Steps to Reproduce

  1. Create a 3-replica etcd cluster
  2. Apply a HorizontalScaling OpsRequest with scaleOut.replicaChanges: 1:

     apiVersion: operations.kubeblocks.io/v1alpha1
     kind: OpsRequest
     metadata:
       name: etcd-scale-out
       namespace: demo
     spec:
       clusterName: etcd-cluster
       type: HorizontalScaling
       horizontalScaling:
       - componentName: etcd
         scaleOut:
           replicaChanges: 1

  3. OpsRequest completes with Succeed 1/1 (~78 seconds)
  4. Observe the component controller logs

Observed Behavior

The OpsRequest reports Succeed, but the cluster remains in Updating phase indefinitely. The component controller loops on the memberJoin lifecycle action:

action failed some replicas have not joined: [etcd-cluster-etcd-3]

This message repeats continuously. Meanwhile, etcdctl member list from inside the pod shows etcd-3 is a healthy, started member:

f3e7454e8da39e57, started, etcd-cluster-etcd-3, http://etcd-cluster-etcd-3...:2380, ...

The cluster never transitions back to Running.
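
To check the mismatch without reading the member list by eye, a small filter along these lines may help. It is only a sketch: member_is_started is a hypothetical helper name, and it assumes the default "simple" output of etcdctl member list (ID, status, name, peer URLs, client URLs, learner flag, separated by ", ").

```shell
#!/bin/sh
# member_is_started NAME
# Reads `etcdctl member list` output (default "simple" format) on stdin and
# succeeds only if a member named NAME is listed with status "started".
# Hypothetical helper, not part of KubeBlocks or the etcd addon.
member_is_started() {
  awk -F', ' -v name="$1" '
    $2 == "started" && $3 == name { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

It could be fed from something like kubectl exec -n demo etcd-cluster-etcd-0 -- etcdctl member list | member_is_started etcd-cluster-etcd-3, where the pod name etcd-cluster-etcd-0 is an assumption and any endpoint or TLS flags the cluster needs are omitted.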

Root Cause (confirmed by code analysis)

Two bugs interact to produce the infinite loop:

Bug 1 — Annotation not persisted on transient failure (KubeBlocks core)

File: controllers/apps/component/transformer_component_workload.go, handleUpdate()

When joinMember4ScaleOut() returns a RequeueError (e.g., because kbagent is not yet ready — dial tcp …:3501: connection refused), handleUpdate() returns early before cli.Update() is called:

if err := t.handleWorkloadUpdate(...); err != nil {
    return err          // ← RequeueError exits here
}
objCopy := copyAndMergeITS(runningITS, protoITS, ...)
if objCopy != nil {
    cli.Update(dag, nil, objCopy, ...)   // ← NEVER REACHED on error
}

Because cli.Update() is never called, the MemberJoined=true annotation is never written to the running InstanceSet on the API server. On the next reconcile, BuildReplicasStatus(runningITS, protoITS) copies MemberJoined=false from the unchanged running InstanceSet, resetting the state. This creates a retry loop.

Bug 2 — member-join.sh is not idempotent (etcd addon)

File: addons/etcd/scripts/member-join.sh, add_member()

exec_etcdctl "$leader_endpoint:2379" member add "$KB_JOIN_MEMBER_POD_NAME" \
  --peer-urls="$peer_protocol://$join_member_endpoint:2380" || error_exit "Failed to join member"

There is no guard against calling member add when the member already exists. The exact sequence that creates the infinite loop:

  1. Reconcile R1: kbagent not yet ready → connection refused → RequeueError → cli.Update() skipped → MemberJoined=false stays in running InstanceSet.
  2. ...repeated N times until kbagent starts...
  3. Reconcile RN: kbagent ready → etcdctl member add etcd-3 SUCCEEDS → etcd-3 is now in the member list → joinMemberForPod() returns nil → function returns nil → cli.Update() called with MemberJoined=true.
  4. If cli.Update() succeeds → done, cluster recovers.
  5. If cli.Update() fails (API server conflict, transient error) → reconcile retried.
  6. Reconcile RN+1: BuildReplicasStatus copies MemberJoined=false from unchanged running InstanceSet → member add etcd-3 → "member already exists" → error_exit → non-zero exit → joinMemberForPod() returns error → RequeueError → cli.Update() skipped → infinite loop.
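
The sequence above can be reproduced with a toy model. This is a sketch under stated assumptions, not KubeBlocks code: simulate and its arguments are hypothetical, the kbagent warm-up reconciles (steps 1 and 2) are left out, and a failed cli.Update() is modeled as simply not storing the flag.

```shell
#!/bin/sh
# Toy model of the reconcile loop. Nothing here talks to KubeBlocks or etcd.
#
# simulate MODE UPDATE_FAILS MAX
#   MODE          "plain": member add fails once the member exists
#                 "idempotent": an existing member counts as success
#   UPDATE_FAILS  number of times the simulated cli.Update() fails first
#   MAX           reconcile budget before the run is declared stuck
simulate() {
  mode=$1 update_fails=$2 max=$3
  in_etcd=0     # 1 once etcd has registered the member
  persisted=0   # MemberJoined as stored on the running InstanceSet
  n=0
  while [ "$n" -lt "$max" ]; do
    n=$((n + 1))
    joined=$persisted            # BuildReplicasStatus copies the stored value
    if [ "$joined" -eq 0 ]; then
      if [ "$in_etcd" -eq 1 ] && [ "$mode" = "plain" ]; then
        continue                 # "member already exists": requeue, Update skipped
      fi
      in_etcd=1                  # member add succeeded, or the guard skipped it
      joined=1
    fi
    if [ "$update_fails" -gt 0 ]; then
      update_fails=$((update_fails - 1))
      continue                   # Update failed: MemberJoined=true is not stored
    fi
    persisted=$joined
    echo "recovered after $n reconciles"
    return 0
  done
  echo "stuck after $n reconciles"
  return 1
}

simulate plain 0 10          # prints: recovered after 1 reconciles
simulate plain 1 10 || true  # prints: stuck after 10 reconciles
simulate idempotent 1 10     # prints: recovered after 2 reconciles
```

The second call is the reporter's case: one failed update after a successful member add is enough to exhaust any reconcile budget, while the idempotent variant recovers on the next pass.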

Confirmed with live test

17:51:49 INFO  ... connection refused (attempt 1)
17:51:50 INFO  ... connection refused (attempt 2)
17:51:51 INFO  ... connection refused (attempts 3-5)
17:51:52 INFO  ... connection refused (attempt 6)
17:51:53 INFO  succeed to join member for pod: etcd-cluster-etcd-3

In most cases the cluster recovers (step 4 above). In the reporter's case, steps 5 and 6 triggered the infinite loop.

Fix

Addon fix (recommended, immediate): Make member-join.sh idempotent

Check whether the member is already registered before calling member add. If it is, return success immediately:

add_member() {
  ...
  # Idempotency: skip if member already exists
  if exec_etcdctl "$leader_endpoint:2379" member list | grep -qw "$KB_JOIN_MEMBER_POD_NAME"; then
    log "Member $KB_JOIN_MEMBER_POD_NAME already exists in cluster, skipping"
    return 0
  fi

  exec_etcdctl "$leader_endpoint:2379" member add "$KB_JOIN_MEMBER_POD_NAME" \
    --peer-urls="$peer_protocol://$join_member_endpoint:2380" || error_exit "Failed to join member"
  log "Member $KB_JOIN_MEMBER_POD_NAME joined cluster via leader $leader_endpoint"
}

This ensures that even if MemberJoined=true fails to persist, subsequent reconciles can safely re-run member-join.sh: the script now skips member add for an already-registered member and exits 0, so joinMemberForPod() returns nil and cli.Update() eventually writes MemberJoined=true.
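
One way to check the guard without a live cluster is to run it against stubs. The sketch below is only an illustration: MEMBERS, log and this exec_etcdctl are stand-ins written for the test, not the addon's real helpers, the endpoint value is made up, and the fake member add rejects an existing name because that is the behavior described in this issue.

```shell
#!/bin/sh
# Fake cluster state and helpers (hypothetical stand-ins for the addon's own).
MEMBERS="etcd-cluster-etcd-0 etcd-cluster-etcd-1 etcd-cluster-etcd-2"
log() { echo "$*" >&2; }

# Fake etcdctl wrapper: "member list" prints one name per line;
# "member add NAME" fails when NAME is already registered.
exec_etcdctl() {
  shift                                    # drop the endpoint argument
  case "$1 $2" in
    "member list") printf '%s\n' $MEMBERS ;;
    "member add")
      for m in $MEMBERS; do
        if [ "$m" = "$3" ]; then
          log "member already exists"
          return 1
        fi
      done
      MEMBERS="$MEMBERS $3" ;;
  esac
}

# Guarded join, reduced to the part under test.
add_member() {
  if exec_etcdctl "$leader_endpoint:2379" member list | grep -qw "$KB_JOIN_MEMBER_POD_NAME"; then
    log "Member $KB_JOIN_MEMBER_POD_NAME already exists in cluster, skipping"
    return 0
  fi
  exec_etcdctl "$leader_endpoint:2379" member add "$KB_JOIN_MEMBER_POD_NAME" || return 1
}

leader_endpoint=leader.example             # made-up endpoint
KB_JOIN_MEMBER_POD_NAME=etcd-cluster-etcd-3
add_member && echo "first call ok"         # registers the member
add_member && echo "second call ok"        # guard hits, still exits 0
```

Both calls succeed, which is the property the controller relies on when it retries memberJoin after a lost update.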

Core fix (long-term): Persist MemberJoined=true independently

In handleUpdate(), the MemberJoined=true annotation should be written to the API server immediately after joinMemberForPod() succeeds, regardless of whether the InstanceSet spec update succeeds. This decouples cluster membership tracking from the workload spec update.

There is already a // TODO: should wait for the data to be loaded before joining the member? comment in joinMember4ScaleOut() (transformer_component_workload_ops.go:351) indicating awareness of sequencing issues in this code path.

Impact

  • Cluster stuck in Updating indefinitely
  • All subsequent OpsRequests (including scale-in) are rejected
  • Workaround: delete and recreate the cluster

Expected Behavior

After scale-out completes and the new member is fully started, the memberJoin action should succeed (idempotently), allowing the cluster to transition to Running.

Labels: bug