
[elasticsearch] HorizontalScaling In: member-leave stuck when shard awareness blocks relocation #2563

@nayutah


Summary

HorizontalScaling In for the elasticsearch data component hangs indefinitely when ES shard awareness (k8s_node_name) blocks shard relocation.

Trigger Path

  1. Deploy elasticsearch m-d-i-t topology with 3 data nodes
  2. Scale data from 3 → 2 replicas via HorizontalScaling In OpsRequest
  3. member-leave.sh sets cluster.routing.allocation.exclude._name: data-2
  4. ES refuses to relocate the shards off data-2: data-1 already holds a copy of each shard, and data-0 runs on the same K8s node as data-2, so the awareness constraint blocks relocation to data-0
  5. member-leave.sh waits up to MAX_WAIT_TIME=1800s for the shards to drain; they never do, and the OpsRequest stays Running indefinitely

Root Cause

The member-leave.sh script drains shards by setting the exclude routing setting, but does not account for the case where ES shard awareness (cluster.routing.allocation.awareness.attributes: k8s_node_name) prevents relocation. When only 2 K8s nodes are available and data nodes are co-located, the awareness constraint makes it impossible to drain shards from the excluded node.
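One way to confirm the co-location described above is to compare the k8s_node_name attribute across ES nodes. The following is a minimal sketch, not part of member-leave.sh: the helper name colocated_nodes is hypothetical, and it assumes its input is the output of GET _cat/nodeattrs?h=node,attr,value.

```shell
#!/bin/sh
# Hypothetical helper: reads "node attr value" rows (as printed by
# GET _cat/nodeattrs?h=node,attr,value) on stdin and prints every
# k8s_node_name value that is shared by more than one ES node.
colocated_nodes() {
  awk '$2 == "k8s_node_name" {
         count[$3]++
         names[$3] = (names[$3] == "" ? $1 : names[$3] "," $1)
       }
       END {
         for (v in count)
           if (count[v] > 1) print v ": " names[v]
       }'
}

# Example call (assumes ES is reachable at localhost:9200, as in the workaround):
# curl -s 'http://localhost:9200/_cat/nodeattrs?h=node,attr,value' | colocated_nodes
```

Any line printed names a K8s node that hosts more than one ES node, which is the precondition for the hang reported here.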

ES allocation explain output:

node: elasticsea-m-d-i-t-data-0 decision: no
  NO because: awareness there are [2] copies of this shard and [2] values for attribute [k8s_node_name]
node: elasticsea-m-d-i-t-data-1 decision: no
  NO because: same_shard a copy of this shard is already allocated to this node
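Output like the above can be obtained from the allocation explain API. This is a rough sketch: the helper names and the ES_URL default are assumptions (the workaround in this issue uses localhost:9200), and the index name passed in is whatever index holds the stuck shard.

```shell
#!/bin/sh
# Assumed endpoint; override ES_URL if ES listens elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: build the request body for the allocation explain API.
# $1 = index name, $2 = shard number, $3 = true for the primary, false for a replica
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}' "$1" "$2" "$3"
}

# Hypothetical helper: ask ES why the given shard copy cannot be moved.
explain_shard() {
  curl -s -X POST "$ES_URL/_cluster/allocation/explain" \
    -H 'Content-Type: application/json' \
    -d "$(explain_body "$1" "$2" "$3")"
}
```

For example, `explain_shard <index> 0 false` asks about a replica of shard 0. The per-node decisions appear under node_allocation_decisions in the response.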

Fix

member-leave.sh should detect when shard relocation is blocked (by checking _cluster/allocation/explain) and either:

  1. Temporarily disable awareness constraints during scale-in
  2. Log a warning and exit 0 after checking that the shard is replicated elsewhere
  3. Raise an error to let the user know scale-in is blocked by cluster topology
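As an illustration of the detection step, the sketch below inspects an allocation explain response for the can_move_to_other_node field, which ES reports for an assigned shard. The function names are hypothetical and this is not the actual member-leave.sh code; it shows option 3 only (fail with a clear message).

```shell
#!/bin/sh
# Hypothetical check: reads an allocation explain response on stdin and
# succeeds (exit 0) when ES reports that the shard cannot move to any
# other node.
relocation_blocked() {
  grep -Eq '"can_move_to_other_node"[[:space:]]*:[[:space:]]*"no"'
}

# Hypothetical handling for option 3: fail fast with a clear message
# instead of waiting for MAX_WAIT_TIME to expire.
fail_if_blocked() {
  if relocation_blocked; then
    echo "scale-in blocked: shards cannot be relocated off the leaving node" >&2
    return 1
  fi
  return 0
}
```

Inside the wait loop this could be called as `curl -s .../_cluster/allocation/explain ... | fail_if_blocked || exit 1`. A grep-based match keeps the sketch free of extra dependencies; a JSON parser would be more robust if one is available in the image.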

Workaround

Manually clear the ES exclusion and delete the stuck OpsRequest:

curl -X PUT http://localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"persistent":{"cluster.routing.allocation.exclude._name":null}}'
kubectl delete opsrequest <hscale-in-op-name>
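To verify that the exclusion is gone, the cluster settings can be checked afterwards. This is a sketch with a hypothetical helper name. It matches the key in both the persistent and transient sections, since this issue does not say which one member-leave.sh writes; the curl command above clears only the persistent one.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
# GET _cluster/settings?flat_settings=true on stdin and succeeds (exit 0)
# if an exclude._name filter is still set in any section.
exclusion_active() {
  grep -Fq '"cluster.routing.allocation.exclude._name"'
}

# Example call (assumes ES is reachable at localhost:9200):
# curl -s 'http://localhost:9200/_cluster/settings?flat_settings=true' \
#   | exclusion_active && echo "exclusion still set"
```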

Labels: bug
