Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement health checks and graceful termination #1

Merged
merged 27 commits into from
Mar 8, 2024
Merged

Implement health checks and graceful termination #1

merged 27 commits into from
Mar 8, 2024

Conversation

omus
Copy link
Member

@omus omus commented Mar 5, 2024

Initial implementation of K8sDeputy.jl

@omus omus requested a review from kleinschmidt March 5, 2024 20:46
@omus
Copy link
Member Author

omus commented Mar 5, 2024

I need to add a license yet

Update: done

@omus omus marked this pull request as ready for review March 5, 2024 21:19
@omus omus requested a review from glennmoy March 6, 2024 15:52
docs/src/graceful_termination.md Show resolved Hide resolved
docs/src/graceful_termination.md Outdated Show resolved Hide resolved
docs/src/graceful_termination.md Outdated Show resolved Hide resolved
docs/src/graceful_termination.md Outdated Show resolved Hide resolved
docs/src/health_checks.md Outdated Show resolved Hide resolved
src/graceful_termination.jl Outdated Show resolved Hide resolved
src/graceful_termination.jl Outdated Show resolved Hide resolved
src/graceful_termination.jl Outdated Show resolved Hide resolved
src/graceful_termination.jl Outdated Show resolved Hide resolved
test/health.jl Show resolved Hide resolved
src/graceful_termination.jl Outdated Show resolved Hide resolved
@omus omus requested a review from glennmoy March 8, 2024 16:25
Copy link
Contributor

@glennmoy glennmoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Just for my own understanding, the TERM actually triggerd by K8s isnt' actually propagated by the Deputy? it just knows it has to shut down gracefully?

src/health.jl Show resolved Hide resolved
src/graceful_termination.jl Show resolved Hide resolved
Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly reviewed the docs, they look good and address my concerns from before!

docs/src/health_checks.md Outdated Show resolved Hide resolved

!!!note

K8s probes require that applications must respond to the probe requests in under `timeoutSeconds` (defaults to 1 second). Since Julia's HTTP.jl server can be unresponsive we recommend using a `timeoutSeconds` of at least 5 seconds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would the server take so long to respond?! compilation or something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's just delays with Julia running multiple threads. I didn't dig into this too much but I have witnessed the problem on a multi-node cluster.

- name: app
ports:
- name: health-check
containerPort: 8081 # The default K8sDeputy.jl heath check port
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this configurable, using an ENV?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is configurable via the HEALTH_CHECK_PORT environmental variable. For the documentation this was a bit cleaner as I didn't have to also introduce the env block

docs/src/health_checks.md Show resolved Hide resolved
src/graceful_termination.jl Show resolved Hide resolved
@omus
Copy link
Member Author

omus commented Mar 8, 2024

Just for my own understanding, the TERM actually triggerd by K8s isnt' actually propagated by the Deputy? it just knows it has to shut down gracefully?

We initiate a graceful shutdown via the preStop and our recommended setup results in the Julia process terminating before the TERM signal is sent to the container.

@omus omus merged commit e0f8b13 into main Mar 8, 2024
4 checks passed
@omus omus deleted the cv/initial branch March 8, 2024 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants