increase machine health check node unready timeout to 15m #3133

Merged: 3 commits, Aug 31, 2023

@s-amann s-amann commented Aug 31, 2023

Which issue this PR addresses:

Fixes https://issues.redhat.com/browse/ARO-4040

What this PR does / why we need it:

Test plan for issue:

Is there any documentation that needs to be updated for this PR?

@bennerv bennerv left a comment

lgtm

The 15m timeout addresses issues customers hit during upgrades: nodes go NotReady while they reboot and run rpm-ostree commands to upgrade to the latest RHCOS version.

MHC was killing worker nodes in that window, causing disruption during upgrades. This change should prevent that. The trade-off is that genuinely NotReady nodes will take longer to replace.

We should confirm that the Node Not Ready monitor does not fire within this 15m window, only some time after it (30m or so).
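
To make the timing reasoning concrete, here is a minimal sketch of the relationship between the timeouts. It is a simplified model, not the MHC controller's actual logic: the function names are hypothetical, the 15m and 25m values come from this PR's commit messages, and the 30m monitor threshold is only the rough figure mentioned above, not a confirmed setting.

```python
from datetime import timedelta

# Values taken from this PR's commit messages.
UNREADY_TIMEOUT = timedelta(minutes=15)  # node unready timeout
STARTUP_TIMEOUT = timedelta(minutes=25)  # node startup timeout

# Rough figure from the review comment ("30m or so"); an assumption,
# not a confirmed monitor setting.
MONITOR_THRESHOLD = timedelta(minutes=30)


def mhc_would_remediate(not_ready_for, timeout=UNREADY_TIMEOUT):
    """Simplified model: remediate once a node has been NotReady
    for at least the configured timeout."""
    return not_ready_for >= timeout


def monitor_fires_after_mhc(monitor_threshold, timeout=UNREADY_TIMEOUT):
    """True if the monitor threshold is strictly longer than the MHC
    timeout, so MHC gets a chance to act before the monitor fires."""
    return monitor_threshold > timeout


# Hypothetical durations, chosen only for illustration.
upgrade_reboot = timedelta(minutes=12)  # node rebooting during an upgrade
stuck_node = timedelta(minutes=20)      # node that never comes back

print(mhc_would_remediate(upgrade_reboot))         # reboot is left alone
print(mhc_would_remediate(stuck_node))             # stuck node is replaced
print(monitor_fires_after_mhc(MONITOR_THRESHOLD))  # monitor fires later
```

Under these assumptions, a node that is NotReady for 12 minutes while rebooting is left alone, a node stuck for 20 minutes is still replaced, and a 30m monitor would only fire after MHC has had its chance to remediate.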

@s-amann s-amann added the ready-for-review, next-release (To be included in the next RP release rollout), and chainsaw (Pull requests or issues owned by Team Chainsaw) labels Aug 31, 2023

s-amann commented Aug 31, 2023

/azp run ci

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

s-amann commented Aug 31, 2023

/azp run e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bennerv bennerv merged commit 9f7ef79 into Azure:master Aug 31, 2023
18 checks passed
@s-amann s-amann deleted the increase-mhc branch September 1, 2023 12:43
SrinivasAtmakuri pushed a commit to SrinivasAtmakuri/ARO-RP that referenced this pull request Sep 18, 2023
* increase machine health check node unready timeout to 15m

* update mhc docs

* increase machine health check node startup timeout to 25m

3 participants