This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Adjustable check frequency upon failed requests #298

Open
peter-dolkens opened this issue Mar 22, 2023 · 2 comments
Labels
alerting (Triggering and sending alerts), canny

Comments

@peter-dolkens

peter-dolkens commented Mar 22, 2023


Is your feature request related to a problem? Please describe.
Some services are affected by intermittent, transient disruptions that last longer than the built-in retry policy covers, but are often resolved before notifications reach our team.

We would like a way to distinguish an ongoing outage, vs a transient error.

The current auto-retry policy makes the second attempt too soon after the first, so the retry is often affected by the same transient error.

This feature could also allow more accurate tracking of time-based metrics, since check frequency could be increased during an outage to measure its duration more precisely.
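To illustrate why time-based metrics matter once the check frequency changes during an outage, here is a small worked sketch. It assumes each result holds until the next run starts; the function names and the sample run times (600 s while healthy, 60 s while failing) are hypothetical and chosen only for illustration:

```python
def time_weighted_availability(runs: list[tuple[float, bool]], end_s: float) -> float:
    """Availability where each result counts for the time until the next run.

    Assumption: the observed state holds until the next check starts.
    `runs` is a list of (start_time_s, passed) sorted by start time;
    `end_s` closes the final interval.
    """
    up = 0.0
    total = 0.0
    for i, (start, passed) in enumerate(runs):
        stop = runs[i + 1][0] if i + 1 < len(runs) else end_s
        span = stop - start
        total += span
        if passed:
            up += span
    return up / total


def count_based_availability(runs: list[tuple[float, bool]]) -> float:
    """Naive availability: fraction of runs that passed, ignoring timing."""
    return sum(passed for _, passed in runs) / len(runs)


# Hypothetical timeline: checks every 600 s while healthy, every 60 s while failing.
runs = [(0, True), (600, False), (660, False), (720, False), (780, True)]
print(round(time_weighted_availability(runs, 1380), 4))  # 0.8696
print(count_based_availability(runs))  # 0.4
```

The count-based figure overstates downtime because the faster failing-state checks contribute more samples; weighting by time keeps the metric independent of how often the check ran.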

Describe the solution you'd like
I'd like to be able to set two distinct check frequencies: one for when everything is working as expected, and a separate one for when the check is in a fail state.
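A minimal sketch of the requested behaviour, assuming the scheduler simply picks the next interval from the result of the most recent run. `CheckSchedule`, `healthy_interval_s`, `failing_interval_s` and `simulate` are hypothetical names for illustration, not part of any Checkly API:

```python
from dataclasses import dataclass


@dataclass
class CheckSchedule:
    """Hypothetical schedule with separate frequencies for passing and failing states."""
    healthy_interval_s: int  # used while the check is passing
    failing_interval_s: int  # used while the check is in a fail state

    def next_interval(self, last_run_passed: bool) -> int:
        # Choose the wait before the next run based only on the latest result.
        return self.healthy_interval_s if last_run_passed else self.failing_interval_s


def simulate(schedule: CheckSchedule, results: list[bool]) -> list[int]:
    """Return the start time (in seconds) of each run for a sequence of pass/fail results."""
    times = []
    t = 0
    for passed in results:
        times.append(t)
        t += schedule.next_interval(passed)
    return times


schedule = CheckSchedule(healthy_interval_s=600, failing_interval_s=60)
print(simulate(schedule, [True, False, False, True, True]))  # [0, 600, 660, 720, 1320]
```

Once a run fails, the check switches to the shorter interval, and it returns to the normal rate after the first passing run.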

Describe alternatives you've considered

  • We've considered using the API to manually adjust the check frequency during an outage; this seems heavy-handed at this stage.
  • We've considered writing retry logic into our tests; this would skew performance metrics, as test run times could vary widely.
  • One alternative would be a way to adjust the amount of time before the automatic retry occurs. This would be the MVP for this feature, but a distinct rate for any check in a fail state would be far more powerful.
  • Another alternative would be a way to adjust the number of failures that must occur before a check is considered failed. This is already partially implemented by the "Double check on failure" feature, but there's no way to configure it to wait X seconds before performing the second check.
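The last two alternatives amount to a configurable failure threshold plus a configurable wait between attempts. A minimal sketch under that assumption follows; `run_with_confirmation`, `failures_required` and `retry_delay_s` are hypothetical names, and `sleep` is injectable only so the example runs instantly:

```python
import time
from typing import Callable


def run_with_confirmation(
    check: Callable[[], bool],
    failures_required: int = 2,
    retry_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as the check passes; return False only after
    `failures_required` consecutive failures, waiting `retry_delay_s`
    between attempts so a transient error has time to clear."""
    for attempt in range(failures_required):
        if check():
            return True
        if attempt < failures_required - 1:
            sleep(retry_delay_s)  # no wait after the final failed attempt
    return False


# Simulated check: fails once, then recovers before the second attempt.
outcomes = iter([False, True])
waits = []
print(run_with_confirmation(lambda: next(outcomes), retry_delay_s=45, sleep=waits.append))  # True
print(waits)  # [45]
```

With `failures_required=2` this behaves like a double check on failure, except that the gap before the second attempt is configurable.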

Additional context

  • Ideally, any change in frequency should affect metrics (availability, p95, etc.) in a time-based fashion, not a count-based one.
  • This feature request also opens the path to "retry policies", which would let the user either choose from a defined set of policies (exponential / linear / constant) or define their own, adjusting the delay between consecutive retry attempts. This could even go as far as letting the user specify how many checks must fail before a test is marked "failed" vs "degraded".
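A rough sketch of what such retry policies and a "degraded" vs "failed" threshold could look like. The policy names mirror the ones listed above, but `retry_delays`, `classify`, `Status` and the threshold parameters are hypothetical, not an existing Checkly feature:

```python
from enum import Enum


def retry_delays(policy: str, base_s: float, attempts: int, factor: float = 2.0) -> list[float]:
    """Delay (seconds) before each retry under a hypothetical named policy."""
    if policy == "constant":
        return [base_s] * attempts
    if policy == "linear":
        return [base_s * (i + 1) for i in range(attempts)]
    if policy == "exponential":
        return [base_s * factor**i for i in range(attempts)]
    raise ValueError(f"unknown policy: {policy}")


class Status(Enum):
    PASSING = "passing"
    DEGRADED = "degraded"
    FAILED = "failed"


def classify(consecutive_failures: int, degraded_after: int, failed_after: int) -> Status:
    """Map a run of consecutive failures to a status using two user-set thresholds."""
    if consecutive_failures >= failed_after:
        return Status.FAILED
    if consecutive_failures >= degraded_after:
        return Status.DEGRADED
    return Status.PASSING


print(retry_delays("exponential", 10.0, 4))  # [10.0, 20.0, 40.0, 80.0]
print(retry_delays("linear", 10.0, 3))  # [10.0, 20.0, 30.0]
print(classify(2, degraded_after=1, failed_after=3).value)  # degraded
```

Keeping the delay schedule separate from the status thresholds means a user could pair any policy with any "degraded"/"failed" cut-offs.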
@peter-dolkens
Author

CC @ebuna, as this was your idea.

tnolet mentioned this issue Mar 24, 2023
@tnolet
Member

tnolet commented Mar 24, 2023

@peter-dolkens we are tackling this in a larger Alerting V2 project later this year. I'm adding this ticket to the overarching one https://github.com/orgs/checkly/projects/4/views/4?pane=issue&itemId=21238722

On the specific topics you mentioned: they all make sense to me. Please also give #208 an upvote, as I think it covers a lot of what you mentioned.

tnolet added the alerting (Triggering and sending alerts) label Mar 24, 2023

3 participants