Context and scope
Currently, the health-check endpoint fatals when the service is unhealthy. Since this is an external endpoint intended to be used by Kubernetes or another monitor, we should let the caller decide what to do, and how long to wait, when the service is unhealthy.
As part of this, we should make sure that every place where we set the state to unhealthy is actually recoverable. Where it is not, we can fatal through a different mechanism than the external health-check endpoint.
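To make the proposal concrete, here is a minimal sketch of an endpoint that reports an unhealthy state instead of fatalling. It is not the service's actual code: `HealthState`, `health_response`, `HealthHandler`, the `/healthz` path, and the JSON body shape are all hypothetical names chosen for illustration. The idea is only that an unhealthy state maps to an HTTP 503 and the process keeps running.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler


class HealthState:
    """Thread-safe health flag (hypothetical; not the service's real type)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._healthy = True
        self._reason = ""

    def set_unhealthy(self, reason):
        # Record the failure; do NOT exit the process here.
        with self._lock:
            self._healthy = False
            self._reason = reason

    def set_healthy(self):
        with self._lock:
            self._healthy = True
            self._reason = ""

    def snapshot(self):
        with self._lock:
            return self._healthy, self._reason


def health_response(state):
    """Map the current state to an (HTTP status code, JSON body) pair."""
    healthy, reason = state.snapshot()
    if healthy:
        return 200, json.dumps({"status": "ok"})
    # 503 tells the monitor the service is unhealthy; the caller decides what to do.
    return 503, json.dumps({"status": "unhealthy", "reason": reason})


STATE = HealthState()


class HealthHandler(BaseHTTPRequestHandler):
    """Serves the assumed /healthz path from the shared STATE."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        code, body = health_response(STATE)
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

With something like this, a Kubernetes liveness or readiness probe can apply its own failure threshold and timing, while unrecoverable errors would still fatal through a separate code path.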
Discussion and alternatives
#579 changed the handling of network exceptions to attempt reconnecting up to a maximum number of tries before the service marks itself unhealthy, because of the fatal behavior. If we go ahead with removing fatals, we should revert this: mark the service unhealthy as soon as it becomes unhealthy, and attempt to reconnect with backoff. The caller can then decide when to kill or restart the service.
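The behavior proposed above might look roughly like the following sketch. It assumes hypothetical callbacks (`connect`, `mark_unhealthy`, `mark_healthy`) and illustrative backoff parameters; none of these names or values come from #579 or the existing code. The point is the ordering: the state flips to unhealthy immediately, and reconnection continues in the background with exponentially growing, capped delays.

```python
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0):
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def reconnect_with_backoff(connect, mark_unhealthy, mark_healthy,
                           reason="connection lost",
                           sleep=time.sleep, max_attempts=None):
    """Mark unhealthy right away, then retry `connect` with backoff.

    `max_attempts=None` retries forever, leaving the kill/restart decision
    to the external monitor. Returns the connection, or None if a finite
    `max_attempts` is exhausted.
    """
    # Report the failure immediately instead of after N failed retries.
    mark_unhealthy(reason)
    delays = backoff_delays()
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            conn = connect()
        except OSError:
            # Network-level failure: wait, then try again.
            sleep(next(delays))
            continue
        # Recovered: the health endpoint can report healthy again.
        mark_healthy()
        return conn
    return None
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; in production the default `time.sleep` would apply.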
Open questions