operator making network requests to unmanaged resource cluster #8325

sjiekak · 2024-12-12T13:23:04Z

When we exclude an Elastic resource from being managed by the operator using the eck.k8s.elastic.co/managed=false annotation (documentation), the expectation is that the different controllers will ignore the cluster entirely.

What I have observed is that the operator still sends network requests to unmanaged clusters:

it seems the observer is only stopped on cluster deletion
the license controller reconciles unmanaged clusters
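
As a rough sketch of the behaviour I would expect (this is not the operator's actual code; `IsUnmanaged`, `ObserverRegistry` and the cluster name are hypothetical names made up for illustration, only the annotation key comes from this issue), a controller could check the annotation first and stop any running observer instead of waiting for deletion:

```go
package main

import "fmt"

// ManagedAnnotation is the annotation key mentioned in this issue.
const ManagedAnnotation = "eck.k8s.elastic.co/managed"

// IsUnmanaged is a hypothetical helper: it reports true only when the
// annotation is explicitly set to "false".
func IsUnmanaged(annotations map[string]string) bool {
	return annotations[ManagedAnnotation] == "false"
}

// ObserverRegistry is a hypothetical stand-in for whatever component
// tracks the per-cluster health observers.
type ObserverRegistry struct {
	running map[string]bool
}

// NewObserverRegistry returns an empty registry.
func NewObserverRegistry() *ObserverRegistry {
	return &ObserverRegistry{running: make(map[string]bool)}
}

// Reconcile stops the observer for an unmanaged cluster and starts (or
// keeps) it otherwise. It returns whether the cluster is observed afterwards.
func (r *ObserverRegistry) Reconcile(cluster string, annotations map[string]string) bool {
	if IsUnmanaged(annotations) {
		// Stop polling as soon as the resource is marked unmanaged,
		// not only when it is deleted.
		delete(r.running, cluster)
		return false
	}
	r.running[cluster] = true
	return true
}

// IsObserved reports whether an observer is currently registered.
func (r *ObserverRegistry) IsObserved(cluster string) bool {
	return r.running[cluster]
}

func main() {
	reg := NewObserverRegistry()

	// Managed cluster: the observer is started.
	reg.Reconcile("es-prod", map[string]string{})
	fmt.Println(reg.IsObserved("es-prod")) // true

	// Same cluster marked unmanaged: the observer is stopped.
	reg.Reconcile("es-prod", map[string]string{ManagedAnnotation: "false"})
	fmt.Println(reg.IsObserved("es-prod")) // false
}
```

The same guard would apply to any other controller that talks to the cluster, such as the license controller.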

Why is it an issue in our use case?

We mark clusters as unmanaged when we want to scale down pods without data loss. Requests made by the operator to the cluster's Elasticsearch services generate a lot of ICMP denials.

GET /_cluster/health HTTP/1.1
  Host: elasticsearch-es-master-0.xxxxx:9200
  User-Agent: Go-http-client/1.1
  Authorization: Basic `elastic-internal:hidden`
  Content-Type: application/json; charset=utf-8
  X-Elastic-Product-Origin: cloud
  Accept-Encoding: gzip
> Response 424

at the network layer

Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 3 (Port unreachable)
    Checksum: 0x8a81 [correct]
    [Checksum status good]
....


barkbay · 2024-12-16T13:30:38Z

While I tend to agree with your expectation, I'm a bit curious about your use case:

Could you explain why you need to pause reconciliations in order to scale down "without data loss"?
Why is the destination host (Elasticsearch?) unreachable during this period?

sjiekak · 2024-12-16T16:48:20Z

Hi @barkbay.

Could you explain why you need to pause reconciliations in order to scale down "without data loss"?

We need to scale down "without data loss". This is possible without pausing reconciliation: we can scale the ES StatefulSet created by the elasticsearch.k8s.elastic.co/v1 Elasticsearch resource down to zero. This is why Elasticsearch is unreachable.

We need to pause reconciliation because otherwise the cloud-on-k8s controllers will perform network requests against the cluster.
We have also noticed the operator using more CPU, as it still considered the resource to be under its management (it was probably reconciling much more often, but this is a guess).

botelastic bot added the triage label Dec 12, 2024
