This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
💡 Join our Slack Community to ask general questions, suggest ideas and get direct help from all the folks at Checkly.
Is your feature request related to a problem? Please describe.
Some services are impacted by intermittent/transient disruptions which last longer than the inbuilt retry policy, but are often resolved before notifications reach our team.
We would like a way to distinguish an ongoing outage from a transient error.
The current auto-retry policy makes a second attempt too soon after the first attempt, and is often impacted by the same transient error.
This feature would potentially also allow more accurate tracking of time-based metrics, as check frequency could be increased during outages to get more accurate duration information.
Describe the solution you'd like
I'd like to be able to have 2 distinct check rates - one for when everything is working as expected, and a separate one for when the check is in a fail state.
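To illustrate the idea, here is a minimal sketch of how two check rates could drive scheduling. It is not Checkly's implementation; `CheckRates`, `next_interval`, and `schedule` are hypothetical names, and the 300 s / 30 s intervals are example values only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckRates:
    # Hypothetical settings: seconds between runs in each state.
    passing_interval_s: int
    failing_interval_s: int


def next_interval(rates: CheckRates, last_result_ok: bool) -> int:
    """Pick the wait before the next run based on the most recent result."""
    return rates.passing_interval_s if last_result_ok else rates.failing_interval_s


def schedule(rates: CheckRates, results: List[bool], start_s: int = 0) -> List[int]:
    """Return the time (in seconds) at which each run happens, given the
    outcome of every run in order."""
    times = []
    t = start_s
    for ok in results:
        times.append(t)
        t += next_interval(rates, ok)
    return times


rates = CheckRates(passing_interval_s=300, failing_interval_s=30)
run_times = schedule(rates, [True, False, False, True])
```

With these example values, a failing check is re-run every 30 seconds rather than every 5 minutes, so a transient error that clears quickly is observed as short, and an ongoing outage keeps failing at the faster rate.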
Describe alternatives you've considered
We've considered using the API to manually adjust the check frequency during an outage; this seems heavy-handed at this stage.
We've considered writing retry logic into our tests; this would throw out performance metrics, as tests could take wildly different amounts of time.
One alternative would be a way to adjust the amount of time before the automatic retry occurs; this would be the MVP for this feature, but it would be far more powerful as a distinct rate for any check in a fail state.
Another alternative would be a way to adjust the number of failures that must occur before a check is considered failed; this is already partially implemented by the "Double check on failure" feature, but there's no way to configure it to wait X seconds before performing the second check.
Additional context
Ideally, any change in frequency should impact metrics (availability/p95/etc) in a time-based fashion, not a count-based fashion.
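A small sketch of why this matters, assuming each result is treated as holding until the next run. The function names and the sample data are hypothetical and only illustrate the difference between the two calculations.

```python
from typing import List, Tuple

Sample = Tuple[int, bool]  # (timestamp in seconds, check passed?)


def count_based_availability(samples: List[Sample]) -> float:
    """Fraction of runs that passed, ignoring how far apart they were."""
    return sum(1 for _, ok in samples if ok) / len(samples)


def time_based_availability(samples: List[Sample], end_s: int) -> float:
    """Fraction of elapsed time covered by passing results, assuming each
    result stays valid until the next run (or until end_s for the last one)."""
    total = end_s - samples[0][0]
    boundaries = [t for t, _ in samples[1:]] + [end_s]
    up = sum(t_next - t for (t, ok), t_next in zip(samples, boundaries) if ok)
    return up / total


# Runs at 0 s and 360 s pass; two faster runs at 300 s and 330 s fail.
samples = [(0, True), (300, False), (330, False), (360, True)]
by_count = count_based_availability(samples)
by_time = time_based_availability(samples, end_s=660)
```

In this example half of the runs failed, but they cover only 60 of 660 seconds. A count-based metric would overstate the outage as soon as the check rate increases during failures.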
This feature request also opens the pathway to "retry policies": the user could either choose from a defined set of policies (exponential / linear / constant) or define their own, adjusting the duration between consecutive retry attempts. This could even go so far as to allow the user to specify how many checks must fail before a test is marked "failed" vs "degraded".
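As a rough sketch of what such policies might compute (the function names, parameters, and thresholds here are hypothetical, not an existing Checkly API):

```python
from typing import List


def retry_delays(policy: str, base_s: int, attempts: int) -> List[int]:
    """Return the wait in seconds before each retry attempt."""
    if policy == "constant":
        return [base_s for _ in range(attempts)]
    if policy == "linear":
        return [base_s * (i + 1) for i in range(attempts)]
    if policy == "exponential":
        return [base_s * 2 ** i for i in range(attempts)]
    raise ValueError(f"unknown policy: {policy}")


def classify(consecutive_failures: int, degraded_after: int, failed_after: int) -> str:
    """Map a count of consecutive failures to a state, using two
    user-chosen thresholds (assumed: degraded_after <= failed_after)."""
    if consecutive_failures >= failed_after:
        return "failed"
    if consecutive_failures >= degraded_after:
        return "degraded"
    return "passing"


constant = retry_delays("constant", base_s=10, attempts=3)
linear = retry_delays("linear", base_s=10, attempts=3)
exponential = retry_delays("exponential", base_s=10, attempts=3)
```

A user-defined policy could then simply be an explicit list of delays in the same shape as the lists returned above.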
On the specific topics you mentioned: they all make sense to me. Please also give #208 an upvote, as I think it covers a lot of what you mentioned.