Improve shutdown logic: Wait until no requests are made #12397

Open. motoki317 wants to merge 8 commits into base: main


@motoki317 motoki317 commented Nov 21, 2024

What this PR does / why we need it:

Hello!
This PR mainly introduces 3 changes:

  • Improves shutdown logic of the controller.
    • The controller now waits until the nginx process stops receiving requests before exiting. It does this by polling the /nginx_status (stub_status) page and checking whether the handled-requests counter is still increasing.
    • resolves Zero downtime upgrade #6928.
  • Changes the grace-shutdown-period in chart values to a more reasonable default.
    • Kubernetes sends SIGTERM to a pod and removes it from Service endpoints in parallel, so a pod should keep accepting new connections from upstream for a short while after SIGTERM rather than exiting immediately. This is the very reason this PR exists. I have changed the default wait time from 0s to 10s.
    • resolves Zero downtime upgrade #6928 along with the above change.
  • Removes the unnecessary and verbose wait-shutdown binary and preStop hook.
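
As a rough illustration of the polling approach described above (not the code in this PR), the sketch below parses the cumulative requests counter out of a stub_status response and waits until it stops growing. The names `parseRequests` and `waitUntilIdle`, and the `quietPolls`, `selfCost`, and `maxPolls` parameters, are hypothetical. `selfCost` assumes that each poll of /nginx_status is itself counted as a request.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseRequests extracts the cumulative "requests" counter from a
// stub_status body. The three counters sit on the line that follows the
// "server accepts handled requests" header, in that order.
func parseRequests(body string) (int64, error) {
	lines := strings.Split(body, "\n")
	for i, line := range lines {
		if strings.Contains(line, "accepts handled requests") && i+1 < len(lines) {
			fields := strings.Fields(lines[i+1])
			if len(fields) != 3 {
				return 0, fmt.Errorf("unexpected counter line: %q", lines[i+1])
			}
			return strconv.ParseInt(fields[2], 10, 64)
		}
	}
	return 0, errors.New("stub_status counters not found")
}

// waitUntilIdle calls fetch every interval and returns nil once the counter
// has grown by at most selfCost for quietPolls consecutive polls.
// It gives up with an error after maxPolls polls (hypothetical safety limit).
func waitUntilIdle(fetch func() (int64, error), interval time.Duration,
	quietPolls int, selfCost int64, maxPolls int) error {
	prev, err := fetch()
	if err != nil {
		return err
	}
	quiet := 0
	for i := 0; i < maxPolls; i++ {
		time.Sleep(interval)
		cur, err := fetch()
		if err != nil {
			return err
		}
		if cur-prev <= selfCost {
			quiet++ // only the poll itself (at most) was counted
		} else {
			quiet = 0 // real traffic arrived, start over
		}
		if quiet >= quietPolls {
			return nil
		}
		prev = cur
	}
	return errors.New("still receiving requests after maxPolls polls")
}
```

In practice `fetch` would issue an HTTP GET against /nginx_status and pass the body to `parseRequests`. Requiring several consecutive quiet polls reduces the chance of exiting during a brief lull in traffic.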

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • CVE Report (Scanner found CVE and adding report)
  • Breaking change (fix or feature that would cause existing functionality to change)
    • Note: this PR technically introduces a breaking change to the default configuration values, but it should not cause much of a problem for users.
  • Documentation only

Which issue/s this PR fixes

fixes #6928
fixes #6287

How Has This Been Tested?

  • I built a Docker image from this branch and manually tested the improved shutdown logic.
  • I added one E2E test spec to verify that the improved shutdown logic works, ran all E2E tests locally, and verified that they pass.

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have added unit and/or e2e tests to cover my changes.
  • All new and existing tests passed.


linux-foundation-easycla bot commented Nov 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. area/helm Issues or PRs related to helm charts labels Nov 21, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and providing further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Welcome @motoki317!

It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-nginx has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Nov 21, 2024
@k8s-ci-robot
Contributor

Hi @motoki317. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority labels Nov 21, 2024

netlify bot commented Nov 21, 2024

Deploy Preview for kubernetes-ingress-nginx canceled.

Name Link
🔨 Latest commit c60b984
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-ingress-nginx/deploys/6743df46cc68d900080a304a

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 21, 2024
@motoki317 motoki317 force-pushed the feat/improve-pod-shutdown branch 2 times, most recently from 668e84e to 60f8c04 Compare November 21, 2024 13:21
@adrianmoisey
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 22, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: motoki317
Once this PR has been reviewed and has the lgtm label, please ask for approval from rikatz. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@Gacko Gacko left a comment


Please do not change the manifests in deploy/static/. This is part of the release process.

Pods behind Kubernetes endpoints are expected to shut down gracefully after receiving SIGTERM:
they should keep accepting new connections for a while. This is because Kubernetes updates Service endpoints
and sends SIGTERM to pods *in parallel*.

See kubernetes/kubernetes#106476 for more detail.
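
A minimal sketch of that behavior, under the assumption that the process handles SIGTERM itself: keep serving for a grace period after the signal so that endpoint removal can propagate, then stop. The names `onSIGTERM` and `shutdownAfterGrace` are hypothetical and are not taken from this PR.

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
	"time"
)

// onSIGTERM returns a channel that is closed once the process receives SIGTERM.
func onSIGTERM() <-chan struct{} {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	done := make(chan struct{})
	go func() {
		<-sigs
		close(done)
	}()
	return done
}

// shutdownAfterGrace blocks until term is closed, keeps the process serving
// for grace so that Service endpoint updates can propagate, then calls stop.
func shutdownAfterGrace(term <-chan struct{}, grace time.Duration, stop func()) {
	<-term
	time.Sleep(grace)
	stop()
}
```

A caller might run `shutdownAfterGrace(onSIGTERM(), 10*time.Second, stopServer)`, where `stopServer` is a placeholder for whatever actually stops accepting connections.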
Note that post-shutdown-grace-period doesn't seem to contribute to graceful shutdown;
see kubernetes#8095 for discussion.

The /wait-shutdown preStop script's only job is to send SIGTERM to nginx-ingress-controller,
which is PID 1, so behavior in Kubernetes environments is the same with or without it.

See kubernetes#6287 for discussion.
@motoki317
Author

@Gacko I see, I've reverted the changes there.

Labels
  • area/helm (Issues or PRs related to helm charts)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • needs-kind (Indicates a PR lacks a `kind/foo` label and requires one.)
  • needs-priority
  • needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
Development

Successfully merging this pull request may close these issues.

  • Zero downtime upgrade
  • Distinguish wait-shutdown command from standard k8s SIGTERM
5 participants