maxlepikhin
Description

Reduce poll interval from 120s to 20s.

Issues Resolved

Slow cluster start-up time.

Check List

  • [x] Commits are signed per the DCO using --signoff
  • Unittest added for the new/changed functionality and all unit tests are successful
  • Customer-visible features documented
  • No linter warnings (make lint)

If CRDs are changed:

  • CRD YAMLs updated (make manifests) and also copied into the helm chart
  • Changes to CRDs documented

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@synhershko
LGTM - tested, and it is indeed a necessary fix.

  until curl -k --silent https://%s:%v;
  do
- echo 'Waiting to connect to the cluster'; sleep 120;
+ echo 'Waiting to connect to the cluster'; sleep 20;
@prudhvigodithi Sep 8, 2025

This was added as part of https://github.com/opensearch-project/opensearch-k8s-operator/pull/198/files#diff-3f3f25087560ff69bb8867115997c9c8a5764ce6eebc95577072ad615051db3bR750. When I initially tested on EKS with EBS, the security config pod failed because the cluster took a long time to start. This was around the time OpenSearch 2.0.0 was released.

Is there a better way to do this, such as polling a cluster health API until the cluster is fully ready and all nodes have joined, and only then running the security config? Or should we leave it as it is?
@maxlepikhin @rursprung @rootxrishabh @synhershko
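
For illustration, a minimal sketch of what polling the cluster health API could look like. This is not the operator's code: `cluster_status` and `cluster_is_ready` are hypothetical helper names, and `HOST`, `PORT`, `ADMIN_USER` and `ADMIN_PASSWORD` are placeholders. It assumes `_cluster/health` is reachable with those credentials, which may not hold before the security configuration has been applied.

```shell
#!/bin/sh
# Sketch only: decide readiness from a _cluster/health JSON response.
# cluster_status and cluster_is_ready are hypothetical helper names.

# Extract the "status" field (green, yellow or red) from a JSON body on stdin.
cluster_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeed when the reported status is green or yellow.
cluster_is_ready() {
  case "$(cluster_status)" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}

# Example wiring (not executed here); all variables are placeholders:
# until curl -k --silent -u "$ADMIN_USER:$ADMIN_PASSWORD" \
#     "https://$HOST:$PORT/_cluster/health" | cluster_is_ready; do
#   echo 'Waiting for cluster health'; sleep 20
# done
```

Treating yellow as ready is a design choice for the sketch; a stricter check could accept only green.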

I'm not using this operator, but if your cluster has the security plugin installed, you could poll the health endpoint at /_plugins/_security/health. It does not need authentication and returns HTTP 200 if it's alive.
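
A rough sketch of that suggestion, assuming the endpoint behaves as described in the comment above (no authentication, HTTP 200 when alive). The function name `wait_for_http_200`, the `CURL` override, and the attempt and interval defaults are illustrative assumptions, not operator settings.

```shell
#!/bin/sh
# Sketch only: poll a URL until it returns HTTP 200 or the attempts run out.
# wait_for_http_200 is a hypothetical helper; CURL can be overridden for testing.
wait_for_http_200() {
  url=$1
  max_attempts=${2:-30}
  interval=${3:-5}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -k skips TLS verification, as the existing loop does.
    # --write-out '%{http_code}' prints only the status code.
    status=$(${CURL:-curl} -k --silent --output /dev/null \
      --write-out '%{http_code}' "$url" || true)
    if [ "$status" = "200" ]; then
      return 0
    fi
    echo "Waiting for $url (attempt $attempt/$max_attempts, last status: ${status:-none})"
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 1
}

# Example call (not executed here); HOST and PORT are placeholders:
# wait_for_http_200 "https://$HOST:$PORT/_plugins/_security/health" 30 5
```

Bounding the number of attempts lets the job fail visibly instead of looping forever if the cluster never comes up.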

Agreed, this is not ideal, but the 120-second wait doesn't make sense. @prudhvigodithi Let's get this merged. Can you open an issue to discuss the right way to perform readiness checks for all possible scenarios?

Signed-off-by: Max Lepikhin <[email protected]>