Feat: Worker Health Checks #289

Merged
merged 9 commits into from
Dec 30, 2024

Contributor

@hatchet-temporary hatchet-temporary commented Dec 20, 2024

Adds a simple aiohttp server that handles worker health checks. It runs on a configurable port (set via HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT) and falls back to a default when none is set.
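
For readers unfamiliar with aiohttp, here is a minimal sketch of what a health check server of this kind could look like. It is not the code from this PR: the function names (`health`, `build_app`, `start_healthcheck_server`) and the `host` parameter are hypothetical, and only the `/health` route and its JSON body are taken from the examples in this description.

```python
from aiohttp import web


async def health(request: web.Request) -> web.Response:
    # Mirrors the JSON body shown in the /health example of this PR.
    return web.json_response({"status": "HEALTHY"})


def build_app() -> web.Application:
    # One route only; the real server also exposes /metrics.
    app = web.Application()
    app.router.add_get("/health", health)
    return app


async def start_healthcheck_server(port: int, host: str = "0.0.0.0") -> web.AppRunner:
    # AppRunner + TCPSite start the server without blocking the event loop,
    # so it can run alongside other tasks in the same loop.
    runner = web.AppRunner(build_app())
    await runner.setup()
    site = web.TCPSite(runner, host, port)
    await site.start()
    return runner  # the caller awaits runner.cleanup() on shutdown
```

Returning the runner leaves shutdown to the caller, which keeps the sketch independent of how the worker manages its own lifecycle.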

Example requests:

/health (basic usage)

hatchet-sdk-py3.10matt@Mac hatchet-python % curl localhost:8001/health
{"status": "HEALTHY"}
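
The same check can be scripted, for example from a container liveness probe. The sketch below uses only the standard library. The helper names `is_healthy` and `probe` are hypothetical, the URL is taken from the curl example above, and since "HEALTHY" is the only status value shown in this PR, anything else is treated as not healthy.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if a /health response body reports HEALTHY."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Only "HEALTHY" appears in the PR; other values are treated as unhealthy.
    return isinstance(payload, dict) and payload.get("status") == "HEALTHY"


def probe(url: str = "http://localhost:8001/health", timeout: float = 2.0) -> bool:
    """Fetch the health endpoint; connection errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False
```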

/metrics (Prometheus)

hatchet-sdk-py3.10matt@Mac hatchet-python % curl localhost:8001/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 18641.0
python_gc_objects_collected_total{generation="1"} 4788.0
python_gc_objects_collected_total{generation="2"} 215.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 308.0
python_gc_collections_total{generation="1"} 27.0
python_gc_collections_total{generation="2"} 2.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="15",version="3.10.15"} 1.0
# HELP hatchet_worker_status Current status of the Hatchet worker
# TYPE hatchet_worker_status gauge
hatchet_worker_status 0.0
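
To read a single value such as hatchet_worker_status out of this output without a Prometheus server, a small parser is enough. This is a sketch only: `read_gauge` is a hypothetical helper, it handles label-free samples (like hatchet_worker_status above) and ignores labelled ones, and it does not interpret what a given status value means, since this PR does not document that.

```python
from typing import Optional


def read_gauge(exposition: str, name: str) -> Optional[float]:
    """Return the value of a label-free sample in Prometheus text format."""
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        # A label-free sample looks like "<name> <value>"; labelled samples
        # ("name{...} value") do not match the bare name and are skipped.
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None
```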

Custom port:

hatchet-sdk-py3.10matt@Mac hatchet-python % HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT=1234 pr simple
[DEBUG] 🪓 -- 2024-12-20 12:57:44,666 - creating new event loop
[INFO]  🪓 -- 2024-12-20 12:57:44,666 - ------------------------------------------
[INFO]  🪓 -- 2024-12-20 12:57:44,666 - STARTING HATCHET...
[DEBUG] 🪓 -- 2024-12-20 12:57:44,666 - worker runtime starting on PID: 15420
[INFO]  🪓 -- 2024-12-20 12:57:44,666 - healthcheck server running on port 1234
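
A minimal sketch of how a port override like this can be resolved from the environment. The variable name comes from the command above. Everything else is an assumption: `resolve_healthcheck_port` is a hypothetical helper, and the fallback of 8001 is inferred from the curl examples rather than confirmed as the SDK's default.

```python
import os
from typing import Mapping

ENV_VAR = "HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT"
# Assumption: 8001 is the port used in the curl examples, not a confirmed default.
ASSUMED_DEFAULT_PORT = 8001


def resolve_healthcheck_port(env: Mapping[str, str] = os.environ) -> int:
    """Return the port from the environment, or the assumed default."""
    raw = env.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        return ASSUMED_DEFAULT_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"{ENV_VAR} must be between 1 and 65535, got {port}")
    return port
```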

Port already in use:

hatchet-sdk-py3.10matt@Mac hatchet-python % HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT=8080 pr simple
[DEBUG] 🪓 -- 2024-12-20 12:58:36,081 - creating new event loop
[INFO]  🪓 -- 2024-12-20 12:58:36,081 - ------------------------------------------
[INFO]  🪓 -- 2024-12-20 12:58:36,081 - STARTING HATCHET...
[DEBUG] 🪓 -- 2024-12-20 12:58:36,081 - worker runtime starting on PID: 15639
[ERROR] 🪓 -- 2024-12-20 12:58:36,081 - failed to start healthcheck server
[ERROR] 🪓 -- 2024-12-20 12:58:36,081 - [Errno 48] error while attempting to bind on address ('0.0.0.0', 8080): address already in use
[DEBUG] 🪓 -- 2024-12-20 12:58:36,083 - action listener starting on PID: 15650
[INFO]  🪓 -- 2024-12-20 12:58:36,085 - starting runner...
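
The log above shows the worker carrying on after the bind fails: the error is logged and the action listener and runner still start. The sketch below illustrates that pattern, catching the bind error so a failed health check server does not take the worker down. It uses the standard library's `asyncio.start_server` in place of aiohttp so it runs without dependencies; `try_start_healthcheck` and the handler are hypothetical names, not the SDK's.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("healthcheck-sketch")


async def _handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Minimal HTTP response; a stand-in for the real aiohttp handlers.
    await reader.readline()
    body = b'{"status": "HEALTHY"}'
    head = (
        b"HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
        b"Content-Length: %d\r\nConnection: close\r\n\r\n" % len(body)
    )
    writer.write(head + body)
    await writer.drain()
    writer.close()


async def try_start_healthcheck(host: str, port: int) -> Optional[asyncio.AbstractServer]:
    """Start the server, or log and return None if the port cannot be bound."""
    try:
        return await asyncio.start_server(_handle, host, port)
    except OSError as exc:  # e.g. "address already in use"
        logger.error("failed to start healthcheck server")
        logger.error(str(exc))
        return None  # the caller continues starting the worker without it
```

Returning None instead of raising keeps the health check optional: the worker can still process work, at the cost of being invisible to anything that probes the endpoint.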

Example graph from a local Prometheus instance (starting up, shutting down, and starting up again):

[Screenshot, 2024-12-20, 3:45:13 PM]

hatchet_sdk/worker/worker.py (review thread, resolved)
hatchet_sdk/worker/worker.py (review thread, outdated, resolved)
@mrkaye97 mrkaye97 merged commit 56fde44 into main Dec 30, 2024
6 checks passed
@mrkaye97 mrkaye97 deleted the worker-health-check branch December 30, 2024 22:11
4 participants