increase machine health check node unready timeout to 15m #3133

s-amann · 2023-08-31T14:38:32Z

Which issue this PR addresses:

Fixes https://issues.redhat.com/browse/ARO-4040

What this PR does / why we need it:

Test plan for issue:

Is there any documentation that needs to be updated for this PR?

bennerv

lgtm

The change to 15m is due to customers hitting issues during upgrades, where nodes would be NotReady while rebooting and running rpm-ostree commands to upgrade to the latest RHCOS version.

MHC was killing worker nodes, causing some disruption during upgrades. This should prevent that. It will "negatively" impact the removal of NotReady nodes, in the sense that they will take longer to be replaced.

We should confirm that the Node Not Ready monitor does not fire within this 15m window, but only somewhat later (30m or so).
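
To make the timing relationship in this comment concrete, here is a minimal Python sketch. It assumes the values named in this PR's commits (15m unready timeout, 25m node startup timeout) and the roughly 30m monitor threshold suggested above. The dictionary only loosely mirrors MachineHealthCheck spec fields and is not the actual manifest, and the helper names `parse_minutes` and `alert_fires_after_remediation` are hypothetical, not part of ARO-RP.

```python
from datetime import timedelta

# Illustrative values taken from this PR's commit titles. The layout mirrors
# MachineHealthCheck spec fields, but this is NOT the real ARO manifest.
# Assumption: a single Ready=False condition stands in for "node unready".
mhc_spec = {
    "nodeStartupTimeout": "25m",
    "unhealthyConditions": [
        {"type": "Ready", "status": "False", "timeout": "15m"},
    ],
}

# Assumption: monitor threshold as suggested in the review ("30m or so").
NODE_NOT_READY_ALERT_AFTER = timedelta(minutes=30)


def parse_minutes(value: str) -> timedelta:
    """Parse a duration such as '15m' (minutes only, enough for this sketch)."""
    if not value.endswith("m"):
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(minutes=int(value[:-1]))


def alert_fires_after_remediation(spec: dict, alert_after: timedelta) -> bool:
    """Return True if the alert threshold is longer than every MHC unready timeout."""
    timeouts = [parse_minutes(c["timeout"]) for c in spec["unhealthyConditions"]]
    return all(alert_after > t for t in timeouts)


if __name__ == "__main__":
    ok = alert_fires_after_remediation(mhc_spec, NODE_NOT_READY_ALERT_AFTER)
    print("monitor fires only after MHC has had a chance to act:", ok)
```

With these assumed values the check passes, since the monitor threshold is longer than the 15m unready timeout; a threshold of 15m or less would fail it.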

s-amann · 2023-08-31T20:40:04Z

/azp run ci

azure-pipelines · 2023-08-31T20:40:14Z

Azure Pipelines successfully started running 1 pipeline(s).

s-amann · 2023-08-31T20:40:31Z

/azp run e2e

azure-pipelines · 2023-08-31T20:40:40Z

Azure Pipelines successfully started running 1 pipeline(s).

* increase machine health check node unready timeout to 15m
* update mhc docs
* increase machine health check node startup timeout to 25m

increase machine health check node unready timeout to 15m

6a03e55

s-amann requested review from jewzaam, bennerv, hawkowl, rogbas, petrkotas, jharrington22, cblecker, facchettos, cadenmarchese, UlrichSchlueter, SudoBrendan, Shivkumar13, yjst2012, anshulvermapatel and hlipsig as code owners August 31, 2023 14:38

s-amann added 2 commits August 31, 2023 10:40

update mhc docs

81a290b

increase machine health check node startup timeout to 25m

43b50d0

bennerv approved these changes Aug 31, 2023

cadenmarchese approved these changes Aug 31, 2023

s-amann added the labels ready-for-review, next-release (to be included in the next RP release rollout), and chainsaw (pull requests or issues owned by Team Chainsaw) Aug 31, 2023

bennerv merged commit 9f7ef79 into Azure:master Aug 31, 2023
18 checks passed

s-amann deleted the increase-mhc branch September 1, 2023 12:43
