-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cancel pushover notifications when the alert resolves itself #4194
base: main
Are you sure you want to change the base?
Cancel pushover notifications when the alert resolves itself #4194
Conversation
Signed-off-by: Joel Baranick <[email protected]>
…cable Signed-off-by: Joel Baranick <[email protected]>
a17cd5f
to
7b2743b
Compare
Hi, since you are sending a Though Pushover already has a concept of group keys for delivery groups which I'm assuming are different from what this group key means in Prometheus, so prefixing the tag with |
…oper Signed-off-by: Joel Baranick <[email protected]>
@jcs Thanks for the feedback. I've changed it to |
Ugh. I realize one of the issues here is that the |
Not only that but it would add more strain to the pushover API server, doesn't seem efficient. |
Agreed. Hopefully @jcs can shed some more light on that as Alertmanager should continue to be a good steward of Pushover's resources |
These requests would not count against the application's quota, though I agree that it would be best to avoid sending them unnecessarily. |
That is good news about the quota, but I don't see a way to not send these short of storing state for each alert. @SuperQ Can you chime in here? |
@jcs Pending feedback from Prometheus developers, what is your feeling of this feature moving forward if keeping state is not permitted? |
At least in the configs that I have seen, it is common that the priority is a template. One example is having a template which sets the priority to Alternatively we could introduce a config option on the receiver which would enable cancellation support. Users could configure alertmanager to have separate receiver which only handles those alerts which are critical. That receiver could still only include tags on messages with priority |
Signed-off-by: Joel Baranick <[email protected]>
I think I can add some clarification here. Alertmanager does not officially support alert prioritization. Instead, it just has opaque key/value pairs via annotations and labels. As such, there is not way to really determine the priority of an alert, if that's what you would like to do. For example, some users might use a "priority" annotation, but other users might use an annotation with a completely different name. But importantly, Alertmanager does not reserve names for such things. Priorities are only a feature of some integrations, such as Pushover. However, they are more of an output rather than an input. I.e. "given this set of alerts, what is the priority of this notification?". The nice thing about this is that it lets users use templates to implement their own logic to decide if a notification is a priority or not. |
I'm not sure I understand why you need to store state though? |
@grobinson-grafana Thanks for including that information to ensure that we are all on the same page in this discussion. As for storing state, it was mentioned in the context of the prior statements. Let me recap. Pushover supports messages having priority. Priority can be -2 to 2. Priority of 2 indicates an emergency and affects how Pushover handles the message, with one of the behaviors being it will keep alerting you unless you acknowledge the message. The current problem is that Pushover continues to notify the user and requires message acknowledgement after Alertmanager resolves an alert. Pushover developers added a way to ”cancel” priority 2 messages, the cancel_by_tag api. Pushover protects their system by enforcing message quotas. Often, each Alertmanager alert ends up sending two messages to Pushover: one for firing and one for resolution. Adding a cancel_by_tag call would increase the number of calls that the Pushover systems need to handle. In order to be a good citizen, Alertmanager shouldn’t overload the Pushover system. Ideally, we would know that an alert was fired with a priority of 2 and only call cancel_by_tag when those alerts are resolved. Because priority is a template we may not know in resolution what the priority was. So, we have a few choices:
I would like to determine a solution that is acceptable to both the Alertmanager and Pushover maintainers. |
But don't you know that from the |
@grobinson-grafana The priority of the resolution message itself is immaterial. If you choose to send the resolution message with priority=2, that is on you. But the initially fired message with priority=2 will continue to require acknowledgment unless cancel_by_tag is called. This is all about improving the common workflow for people that use Pushover for notification. |
Often resolution messages will have lower priority than firing messages. The priority dictates how the app responds. If the priority is 2, the app will require acknowledgement, ignores quiet hours (configurable), triggers a critical alert (iOS, configurable), etc. It would be a bad experience if resolution messages would alert without respect for quiet hous. |
Much of this was discussed in: #634 |
That's what I thought as well. But then I don't understand. Previously you mentioned you need to store state in order to "remember" if you previously sent a high priority alert notification to Pushover, but here you say the priority is immaterial. Did I completely misunderstand? Is the priority of the resolution notification not supposed to match the priority of the previous firing notifications? |
No, the priority of resolution message does not need to match the firing message, and likely you don't want it to. I wouldn't want my phone to wake me up in the middle of the night because an issue was resolved. |
As I mentioned before, this is all in the scope of reducing load again Pushover's server, so that Alertmanager is not abusing it. And as I also mentioned, I see only 4 ways of accomplishing this:
|
Options 2 and Option 3 seems like the most sensible option to me. I still don't understand why you would need or want to store state for this, as priority should not be "flapping" in such a way it becomes completely non deterministic between two consecutive notifications for the same aggregation group. However, if you feel you really really need to store state, you can probably find a way to add metadata to nflog entries. Option 4 could work if the priority of a notification can be determined from a single alert. If a user's template looks at the entire aggregation group (i.e. if more than 2 firing alerts, set priority to 2) then it won't be possible. |
Regarding Option 1, the nflog.Entry currently doesn't contain receiver specific information. I assumed, maybe incorrectly, that Alertmanager would not want to be in the business of storing receiver specific information. Option 2 seemed like a non-starter as it would always increase the load on the Pushover servers and they would prefer we not do that. Option 3, seemed like a hack, but it could reduce the load on the Pushover server and still enabled even better behavior via a specific receiver for priority 2 messages. What you explained about option 4 makes sense. I don't think the priority of the alert is "flapping" in the way you mean. Here is an example template for priority: Given all this, if you think Option 3 will get merged, I am happy to remove the new config option I added and simply do a |
Signed-off-by: Joel Baranick <[email protected]>
Signed-off-by: Joel Baranick <[email protected]>
If I understand correctly to use this feature it will require people to set Maybe there could be some docs written around that. It might not be clear to people who already have |
@onedr0p Option 3, which is what is in the PR now, checks to see if the value |
It should be here https://github.com/prometheus/alertmanager/blob/main/nflog/nflogpb/nflog.proto#L29.
Is it possible to send just one message? The code right now sends two messages for a resolved notification. If the notification is a resolved message, can you just send shouldRetry, err := n.sendMessage(ctx, key, u, parameters)
// Notifications sent for firing alerts could be sent with emergency priority, but the resolution notifications
// might be sent with a different priority. Because of this, and the desire to reduce unnecessary cancel_by_tag
// call again Pushover, we only call cancel_by_tag if the priority config value contains.
if err == nil && strings.Contains(n.conf.Priority, emergencyPriority) && alerts.Status() == model.AlertResolved {
u, err = url.Parse(fmt.Sprintf("%s/%s.json", n.apiReceiptsURL, groupKeyTag))
if err != nil {
return false, err
}
shouldRetry, err = n.sendMessage(ctx, key, u, parameters)
}
return shouldRetry, err
Why do you think it's a hack?
For this specific template, that would be correct. I hope no one would write a template like that though. It should be more like |
I looked there, but there is not any receiver implementation specific data at the moment and I’m not sure there should be.
No. If you send only I think about it like email as it operates similarly. The original firing notification and the resolved notification are simply messages that show up on the users phone. Priority controls how important the message is and that alters how the app acts, such as only incrementing the badge count or sounding an alarm even though it is quiet hours.
Because it is just looking for whether the template string contains the number 2, rather than being able to deterministically know that the firing alert was sent with priority 2. That said, I am fine with this compromise as it reduces load on the Pushover servers and doesn’t introduce new receiver implementation specific data to the nflog.
People are definitely writing templates like this and there is a need. You do not want the resolution notification to bypass the user’s quiet hours. So sending the resolve notification with priority 2 doesn’t make a ton of sense. It would result in someone getting potentially woken up when there is no need. |
You would add support for metadata (i.e. key/value pairs) and then each receiver could store whatever metadata it needs. No field would be added that is specific to the receiver, but a generic field for any and all metadata could be accepted.
Is it possible for Pushover to add a new server method that does both? Then we don't need to send two messages. The other consideration is that this will only happen when all alerts in an aggregation group are resolved, right? Have you benchmarked it see what the load would actually be on a typical Alertmanager deployment? If not, perhaps try it? You may find it's not that bad.
One issue I realized with tracking is it won't be enough to just store the priority for the last sent notification. Instead, you will need to store the priority for all previously un-cancelled priority 2 notifications, right? For example:
In this example, there is an earlier priority 2 notification that has not been canceled.
Perhaps cancellation is not a feature we need? The implementation seems to have a lot of problems? Is there a lot of support from the community for this feature? |
Sure, if you wanted to go that route, great. But, it seems to add a bunch of complexity for what should be a Pushover specific change and quite limited in scope.
I'm sure it is possible, but I'm not a dev on that project. They have already added one method based on this issue, so I'm not sure if they will want to expend more effort.
This is not about the load on Alertmanager. It is about additional load on the Pushover servers. See the following comment from Joshua Stein: "These requests would not count against the application's quota, though I agree that it would be best to avoid sending them unnecessarily."
This is where I think a simple configuration option would almost be best. If the receiver had a
I and others would be able to use emergency priority alerts with Pushover with this feature in place. Currently we don't use it because of all the manual interaction needed to handle emergency priority alerts upon resolution. |
No I understand. I wasn't talking about load on Alertmanager either. My point was the condition needed for this second message to be sent only happens when all alerts in an aggregation group are resolved. Unless a user has completely disabled grouping, my suspicion is real world load increase on Pushover servers is far less than 2x, because most group flushes have at least 1 firing alert. A group flush will all resolved alerts is far less common.
I think that's a fine solution that we would accept (I can approve).
👍 |
@grobinson-grafana Great! That config option would let us do something like:
|
This reverts commit 009466d. Signed-off-by: Joel Baranick <[email protected]>
This reverts commit 68aa21a. Signed-off-by: Joel Baranick <[email protected]>
41c5f50
to
2f2cded
Compare
Isn't |
Sure The above templates allow Alertmanager to send a message to Pushover with priority=2 when a severity=critical alert is firing and have cancel_on_resolve=true for only severity=critical alerts. The key thing this gets us, with regard to Pushover, is that resolve messages are not emergency and |
@grobinson-grafana The PR has been updated based on our discussion |
@grobinson-grafana Anything else I need to do for this one? |
@grobinson-grafana All checks are passing now. Anything else I need to do to enable this to be merged? |
Pushover emergency priority (2) alerts will continue to require acknowledgement even when the alert is resolved in Alertmanager. Currently a resolution message is sent to Pushover but this does not cancel the emergency alert. To support cancellation of emergency alerts Pushover added the ability to attach tags to an alert and a new route (
https://api.pushover.net/1/receipts/cancel_by_tag/(your tag).json
) which can be used to cancel all emergency priority alerts which have the specified tag.To support this, the Pushover notifier will now add a tag to outgoing messages which emergency priority:
groupKey=<group key hash>
. When an alert is resolved the notifier will send the resolution message. When that is complete and the priority is2
, thecancel_by_tag
request will be sent:https://api.pushover.net/1/receipts/cancel_by_tag/groupKey=<group key hash>.json
@jcs Can you confirm that my usage of the
cancel_by_tag
API is correct. There is a little ambiguity in the docs as to whether the request should be formatted likehttps://api.pushover.net/1/receipts/cancel_by_tag/groupKey=<group key hash>.json
or likehttps://api.pushover.net/1/receipts/cancel_by_tag/<group key hash>.json