Adjustable check frequency upon failed requests #298

peter-dolkens · 2023-03-22T10:37:41Z


Is your feature request related to a problem? Please describe.
Some services are impacted by intermittent/transient disruptions which last longer than the inbuilt retry policy, but are often resolved before notifications reach our team.

We would like a way to distinguish an ongoing outage from a transient error.

The current auto-retry policy makes its second attempt too soon after the first, so the retry is often hit by the same transient error.

This feature would potentially also allow more accurate tracking of time-based metrics, as check frequency could be increased during outages to get more accurate duration information.

Describe the solution you'd like
I'd like to be able to have 2 distinct check rates - one for when everything is working as expected, and a separate one for when the check is in a fail state.
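To make the request concrete, here is a minimal sketch of how two check rates could drive scheduling. It is not Checkly's implementation or API; `CheckSchedule`, `passing_interval_s`, `failing_interval_s` and `next_run` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckSchedule:
    """Hypothetical per-check setting: one interval per state, in seconds."""

    passing_interval_s: int
    failing_interval_s: int

    def next_run(self, last_run_s: int, last_result_ok: bool) -> int:
        """Return the timestamp (in seconds) at which the check should run next."""
        if last_result_ok:
            interval = self.passing_interval_s
        else:
            interval = self.failing_interval_s
        return last_run_s + interval


# Example: check every 10 minutes normally, every 1 minute while failing.
schedule = CheckSchedule(passing_interval_s=600, failing_interval_s=60)
print(schedule.next_run(last_run_s=0, last_result_ok=True))     # 600
print(schedule.next_run(last_run_s=600, last_result_ok=False))  # 660
```

The only state the scheduler needs is the result of the most recent run, so the faster rate stops as soon as the check passes again.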

Describe alternatives you've considered

We've considered using the API to manually adjust the check frequency during an outage; this seems heavy-handed at this stage.
We've considered writing retry logic into our tests; this would throw out performance metrics, as tests could take wildly different amounts of time.
One alternative would be a way to adjust the amount of time before the automatic retry occurs; this would be the MVP for this feature, but it would be far more powerful as a distinct rate for any check in a fail state.
Another alternative would be a way to adjust the number of failures that must occur before a check is considered failed; this is already partially implemented by the "Double check on failure" feature, but there's no way to configure it to wait X seconds before performing the second check.
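The last two alternatives, a configurable delay before the retry and a configurable number of failures, could be sketched roughly as follows. This is an illustration only: `retry_delay_s`, `is_failed`, the 30-second base delay and the three strategy names (common backoff strategies) are assumptions, not existing Checkly settings.

```python
def retry_delay_s(policy: str, attempt: int, base_s: float = 30.0) -> float:
    """Delay before retry number `attempt` (1-based) under a hypothetical policy."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if policy == "constant":
        return base_s                       # 30, 30, 30, ...
    if policy == "linear":
        return base_s * attempt             # 30, 60, 90, ...
    if policy == "exponential":
        return base_s * 2 ** (attempt - 1)  # 30, 60, 120, ...
    raise ValueError(f"unknown policy: {policy}")


def is_failed(results: list[bool], failures_required: int = 2) -> bool:
    """Treat a check as failed only if the last N attempts all failed."""
    tail = results[-failures_required:]
    return len(tail) == failures_required and not any(tail)


print(retry_delay_s("exponential", attempt=3))  # 120.0
print(is_failed([True, False]))                 # False: only one failure so far
print(is_failed([False, False]))                # True: two consecutive failures
```

With `failures_required=2` and a constant delay this behaves like a "double check" that waits a configurable time before the second attempt.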

Additional context

Ideally, any change in frequency should impact metrics (availability/p95/etc) in a time-based fashion, not a count-based fashion.
This feature request also opens the pathway to "retry policies", which would let the user either choose from a defined set of policies (exponential / linear / constant) or define their own, adjusting the duration between consecutive retry attempts. This could even go so far as to allow the user to specify how many checks must fail before a test is marked "failed" vs "degraded".
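The difference between time-based and count-based metrics matters once the check rate changes during an outage. The sketch below uses made-up sample data (a one-hour window, checks every 600 s while passing and every 60 s during a 10-minute outage) and hypothetical function names. It assumes each result holds until the next sample.

```python
def count_based_availability(samples: list[tuple[float, bool]]) -> float:
    """Fraction of check runs that passed, ignoring how far apart they were."""
    return sum(ok for _, ok in samples) / len(samples)


def time_based_availability(samples: list[tuple[float, bool]], end_s: float) -> float:
    """Fraction of elapsed time covered by passing results.

    Each result is assumed to hold until the next sample (or until end_s).
    """
    up = 0.0
    following = samples[1:] + [(end_s, True)]  # sentinel closes the last interval
    for (t, ok), (t_next, _) in zip(samples, following):
        if ok:
            up += t_next - t
    return up / (end_s - samples[0][0])


# Passing checks every 600 s; failing checks every 60 s from t=1200 to t=1740.
samples = (
    [(0, True), (600, True)]
    + [(1200 + 60 * i, False) for i in range(10)]
    + [(1800, True), (2400, True), (3000, True)]
)

print(round(count_based_availability(samples), 2))       # 0.33
print(round(time_based_availability(samples, 3600), 2))  # 0.83
```

The service was down for 10 of 60 minutes, yet counting runs reports 33% availability because the faster failing rate contributes ten of the fifteen samples. Weighting by time gives 83%.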

peter-dolkens · 2023-03-22T10:38:13Z

CC @ebuna as this was your idea

tnolet · 2023-03-24T15:56:42Z

@peter-dolkens we are tackling this in a larger Alerting V2 project later this year. I'm adding this ticket to the overarching one https://github.com/orgs/checkly/projects/4/views/4?pane=issue&itemId=21238722

On the specific topics you mentioned: they all make sense to me. Please also give #208 an upvote, as I think it covers a lot of what you mentioned.

tnolet mentioned this issue Mar 24, 2023

Alerting V2 #288

Open

tnolet added the alerting (Triggering and sending alerts) label Mar 24, 2023

drakirnosslin added the canny label May 10, 2023
