operator making network requests to unmanaged resource cluster #8325

Open
sjiekak opened this issue Dec 12, 2024 · 2 comments

sjiekak commented Dec 12, 2024

When we exclude an Elastic resource from operator management with the `eck.k8s.elastic.co/managed=false` annotation (see the documentation), we expect the controllers to ignore the cluster entirely.
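To illustrate that expectation, the Go sketch below shows a controller that checks the annotation and returns before making any Elasticsearch request. It assumes that only the exact value `false` disables management. `isUnmanaged` and `reconcile` are hypothetical names used for illustration and are not taken from the ECK code base. The only detail drawn from this issue is the annotation key and its value.

```go
package main

import "fmt"

// managedAnnotation is the annotation key discussed in this issue.
const managedAnnotation = "eck.k8s.elastic.co/managed"

// isUnmanaged is a hypothetical helper. It assumes that only the exact
// value "false" marks a resource as unmanaged. A nil map reads as "",
// so the resource counts as managed.
func isUnmanaged(annotations map[string]string) bool {
	return annotations[managedAnnotation] == "false"
}

// reconcile is a hypothetical reconcile step. It reports whether it ran,
// and it returns early, without calling callES (i.e. without any network
// request), when the resource is unmanaged.
func reconcile(annotations map[string]string, callES func() error) (bool, error) {
	if isUnmanaged(annotations) {
		return false, nil // skipped: Elasticsearch is never contacted
	}
	return true, callES()
}

func main() {
	calls := 0
	callES := func() error { calls++; return nil }

	reconcile(map[string]string{managedAnnotation: "false"}, callES) // skipped
	reconcile(map[string]string{}, callES)                           // runs
	fmt.Println(calls) // prints 1
}
```

The behaviour reported in this issue would correspond to some code path that reaches `callES` even though `isUnmanaged` returns true.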

What I have observed is that the operator still sends network requests to unmanaged clusters.

Why is this an issue in our use case?

We mark clusters as unmanaged when we want to scale down pods without data loss. Requests made by the operator to the cluster's Elasticsearch services generate a lot of ICMP denials:

GET /_cluster/health HTTP/1.1
  Host: elasticsearch-es-master-0.xxxxx:9200
  User-Agent: Go-http-client/1.1
  Authorization: Basic `elastic-internal:hidden`
  Content-Type: application/json; charset=utf-8
  X-Elastic-Product-Origin: cloud
  Accept-Encoding: gzip
> Response 424

At the network layer:

Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 3 (Port unreachable)
    Checksum: 0x8a81 [correct]
    [Checksum status good]
....
botelastic bot added the triage label Dec 12, 2024
barkbay (Contributor) commented Dec 16, 2024

While I tend to agree with your expectation, I'm a bit curious about your use case:

  • Could you explain why you need to pause reconciliations in order to scale down "without data loss"?
  • Why is the destination host (Elasticsearch?) unreachable during this period?

sjiekak (Author) commented Dec 16, 2024

Hi @barkbay.

> Could you explain why you need to pause reconciliations in order to scale down "without data loss"?

We need to scale down without data loss, and that part is possible without pausing reconciliation: we scale the Elasticsearch StatefulSet created by the `elasticsearch.k8s.elastic.co/v1` `Elasticsearch` resource down to zero. This is why Elasticsearch is unreachable.

We need to pause reconciliation because otherwise the cloud-on-k8s controllers keep performing network requests against the cluster.
We have also noticed the operator using more CPU, presumably because it still considers the resource to be under its management (our guess is that it was reconciling much more often).
