Scheduler status visibility #478

sihil · 2018-01-18T14:58:42Z

A scheduled job (see #476) can fail to kick off for a few reasons, and it should be really obvious through the interface when this has happened. Similarly, if a scheduled job starts but then fails, someone probably wants to hear about it.

We might:

Add a status dashboard that shows all failed scheduled jobs.
Add a topic or other notification mechanism for letting people know about failed jobs.

alexduf · 2018-01-22T10:28:30Z

Could adding an email address field when scheduling the deploy be a simple way to get feedback in case of an issue?

sihil · 2018-02-09T19:24:57Z

@nicl Now that we've had some use of this, shall we pick it up again on Monday to figure out what we need?

nicl · 2018-02-12T12:21:19Z

Talking with @sihil, @adamnfish has also suggested sending emails on any failed deploy, which would also solve this issue (in a basic sense).

sihil · 2018-02-12T13:58:05Z

In order to implement this we need some way of getting an e-mail address to notify. I suggest that we use Prism Owners for this (https://github.com/guardian/prism/blob/master/app/data/Owners.scala).

To use Prism Owners we'd need to gather the set of SSAs being deployed, look them up in Prism, and then actually fire off the e-mails. This means finding a place in the code where we can detect failure and also have access to the set of SSAs (or data that allows us to derive it). I suspect that this place is the DeployGroupRunner, which has the DeployContext (containing the parameters and the task graph) and also sees the failure events. Having said that, the task graph has lost easy access to the app and stack data, so this will likely be non-trivial.
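To make the three steps concrete (gather SSAs, look up owners, send e-mails), here is a rough, language-agnostic sketch in Python rather than in the project's own code. Everything in it is hypothetical: the `SSA` fields (assuming an SSA is a stack/stage/app triple), the `FailedDeploy` shape, the `lookup` function standing in for a Prism Owners query, and the `send` function standing in for whatever mail mechanism is chosen. None of these are real Riff-Raff or Prism APIs.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Set


@dataclass(frozen=True)
class SSA:
    # Assumption: an SSA identifies a deployable by stack, stage and app.
    stack: str
    stage: str
    app: str


@dataclass(frozen=True)
class FailedDeploy:
    # Hypothetical view of what a failure handler would have to hand.
    project: str
    ssas: FrozenSet[SSA]
    error: str


# Hypothetical owner lookup: returns an e-mail address, or None if unowned.
OwnerLookup = Callable[[SSA], Optional[str]]
# Hypothetical mail sender: (recipient, subject, body).
SendEmail = Callable[[str, str, str], None]


def notify_owners(deploy: FailedDeploy, lookup: OwnerLookup, send: SendEmail) -> Set[str]:
    """Send one failure e-mail per distinct owner and return the recipients."""
    # Several SSAs may share an owner, so collect addresses into a set first.
    recipients = {addr for addr in map(lookup, deploy.ssas) if addr}
    subject = f"Deploy failed: {deploy.project}"
    body = f"Deploy of {deploy.project} failed: {deploy.error}"
    for recipient in sorted(recipients):
        send(recipient, subject, body)
    return recipients
```

The sketch takes the set of SSAs as given, which is exactly the part flagged above as non-trivial: the failure handler needs that set (or data to derive it) at the point where it sees the failure event. SSAs with no known owner are silently skipped here; a real implementation would probably want a fallback address.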
