fix(valkey): switchover scripts iterate stale POD_FQDN_LIST after scale-out #2608

@weicao

Problem

When a Valkey cluster is scaled out (e.g. 3 -> 4 replicas) and a targeted switchover is then issued to the freshly added replica, the OpsRequest fails with:

WARNING: could not confirm new primary within 300s

even though Sentinel has already promoted the fresh candidate, the post-switchover topology has settled correctly, and replica-priority has been restored.

Root cause

addons/valkey/scripts/switchover.sh iterates a member list sourced from the container env variable VALKEY_POD_FQDN_LIST, which is rendered into pod environment at pod creation time via componentVarRef.podFQDNs. The container env of an existing pod is not refreshed by KubeBlocks after scale-out.

So when scale-out grows replicas from N to N+1, the old primary's action container still sees the old N-entry list. All iteration points in switchover.sh then miss the freshly added candidate:

  • set_priorities_with_candidate_bias() — does not set replica-priority=1 on the fresh candidate
  • restore_priorities() — does not restore on the fresh candidate
  • wait_for_new_master() — never probes the fresh candidate, so it cannot observe role:master even after Sentinel promotion
  • check_* helpers using the same list
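The failure mode above can be reproduced in isolation. The following is a minimal sketch, not code from switchover.sh: it assumes the env list is comma-separated, and the pod names and the `list_contains` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only. Apart from VALKEY_POD_FQDN_LIST, every name here is hypothetical.

# Env list as rendered at pod creation time (3 replicas).
# Comma separation is an assumption about the rendered format.
VALKEY_POD_FQDN_LIST="valkey-0.valkey-headless,valkey-1.valkey-headless,valkey-2.valkey-headless"

# Replica added by the later scale-out; the old pod's env never learns about it.
candidate="valkey-3.valkey-headless"

# list_contains LIST FQDN: succeed if FQDN is an entry of the comma-separated LIST.
list_contains() {
  local list="$1" target="$2" fqdn
  local IFS=','
  for fqdn in $list; do
    [ "$fqdn" = "$target" ] && return 0
  done
  return 1
}

# Any loop driven by the stale list behaves like this membership check.
if list_contains "$VALKEY_POD_FQDN_LIST" "$candidate"; then
  echo "candidate is probed"
else
  echo "candidate is never probed"
fi
```

With the stale 3-entry list, the script prints "candidate is never probed", which mirrors why wait_for_new_master() cannot observe role:master on the promoted pod.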

Fix

Introduce pod_fqdns_with_candidate() that unions KB_SWITCHOVER_CANDIDATE_FQDN (passed at action time as expected_fqdn / candidate_fqdn) into the env list. All iteration points are switched to consume the union list.
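A minimal sketch of what such a union helper could look like is shown below. It is not the implementation in this fix: it assumes VALKEY_POD_FQDN_LIST is comma-separated and that the candidate is a single FQDN, and the pod names in the usage lines are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the actual pod_fqdns_with_candidate() in switchover.sh may differ.
# Assumption: VALKEY_POD_FQDN_LIST is a comma-separated list of FQDNs.

pod_fqdns_with_candidate() {
  local list="${VALKEY_POD_FQDN_LIST:-}"
  local candidate="${KB_SWITCHOVER_CANDIDATE_FQDN:-}"

  # No candidate given (untargeted switchover): return the env list unchanged.
  if [ -z "$candidate" ]; then
    printf '%s\n' "$list"
    return 0
  fi

  # Append the candidate only if it is not already an entry of the list.
  case ",$list," in
    *",$candidate,"*) printf '%s\n' "$list" ;;
    *) printf '%s\n' "${list:+$list,}$candidate" ;;
  esac
}

# Usage with a stale 3-entry list and a freshly added candidate (hypothetical names).
VALKEY_POD_FQDN_LIST="valkey-0.svc,valkey-1.svc,valkey-2.svc"
KB_SWITCHOVER_CANDIDATE_FQDN="valkey-3.svc"
pod_fqdns_with_candidate
```

Wrapping the list in commas before matching avoids treating one FQDN as present merely because it is a substring of another, and the `${list:+$list,}` expansion avoids a leading comma when the env list is empty.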

Validation

  • ShellSpec: 55 examples, 0 failures (scripts-ut-spec/valkey_switchover_spec.sh), with new cases covering stale-list scenarios.
  • Live broader smoke test: 143 PASS / 4 FAIL / 2 SKIP; the 4 failures are environment/capability gaps, not product defects. T09 (fresh scale-out targeted switchover) passes in one shot, T14 (targeted switchover) ends with Ops Succeed and the candidate becoming primary, and T15 (Sentinel failover) behaves normally.
  • Live chaos suite: 143 PASS / 0 FAIL / 0 SKIP, covering master kill, all-sentinel kill, kill of all 6 pods, rapid master kill, restart, scale-out/in during writes, and vscale during writes. The fix holds under concurrent writes and chaos.

Same-pattern risk in other addons

Redis (addons/redis/scripts/redis-switchover.sh) follows the identical pattern, with REDIS_POD_FQDN_LIST and SENTINEL_POD_FQDN_LIST injected via componentVarRef.podFQDNs. The same iteration points (set_redis_priorities, recover_redis_priorities, check_redis_kernel_status, check_switchover_result) carry the same architectural risk. This PR does not modify Redis; that is left for a follow-up evaluation.
