Centralized errors / workflow runs reporting #505
Comments
To clarify, in your mind, this centralized list of failures consolidates all errors across all workflows in a workspace, correct? I think this makes sense.
Haven't seen option 3 in a SOAR UI, but it's the default UI/UX for data workflow orchestrators like Prefect and Airflow, and it makes a lot of sense, especially because Tracecat encourages smaller, composable workflows. A list of all workflow runs is the most elegant option. It also gives nice filtering and search capabilities.
Yes. As an example, since we moved our production instance from somewhere around 0.9 to 0.13, some of our workflows were broken due to renamed namespaces or changed configuration for UDFs. There was no way to know this unless each workflow was proactively inspected to look at its recent runs. This highlighted a bigger issue: a secret could expire and cause an integration to stop working, and we would not know.
I think a big piece is going to be whether we can also have the system notify on recurring errors in a workflow. I think we discussed something like this at one point: being able to see the "health" of a workflow. Notifying when the health of a workflow declines would be helpful.
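To make the "health" idea concrete, here is a rough sketch of how recurring failures could be turned into a single notification. Everything in it is an assumption for illustration: `WorkflowHealth`, its `window`, `threshold`, and `min_runs` parameters, and the rule itself (failure rate over the most recent runs) are hypothetical and not part of Tracecat.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkflowHealth:
    """Hypothetical tracker of recent run outcomes for one workflow."""

    window: int = 10        # number of most recent runs to consider
    threshold: float = 0.5  # failure rate at or above which the workflow is unhealthy
    min_runs: int = 3       # do not judge health on fewer runs than this
    outcomes: deque = field(default_factory=deque)

    @property
    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def is_unhealthy(self) -> bool:
        return len(self.outcomes) >= self.min_runs and self.failure_rate >= self.threshold

    def record(self, succeeded: bool) -> bool:
        """Record one run. Return True only when this run tips the
        workflow from healthy to unhealthy, so a caller notifies once."""
        was_unhealthy = self.is_unhealthy()
        self.outcomes.append(succeeded)
        # Keep only the most recent `window` outcomes.
        while len(self.outcomes) > self.window:
            self.outcomes.popleft()
        return self.is_unhealthy() and not was_unhealthy
```

Returning `True` only on the healthy-to-unhealthy transition is a deliberate choice in this sketch: a workflow that keeps failing would raise one alert rather than one per run.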
The namespaces issue should have been caught on our end. Moving forward, we will manage migrations for integration namespaces directly in Alembic. Another point: we're also revamping the workflow table to show: 1. last run date, 2. last run failure date, and 3. a general status flag.
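For discussion, a minimal sketch of how those three columns could be derived from a flat list of runs. The `Run` and `WorkflowRow` types, the `summarize` function, and the status values (`"never_run"`, `"ok"`, `"failing"`) are assumptions made up for this example, not the actual Tracecat schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one finished workflow run."""
    workflow_id: str
    finished_at: datetime
    succeeded: bool


@dataclass
class WorkflowRow:
    """Hypothetical row in the workflow table."""
    workflow_id: str
    last_run_at: Optional[datetime] = None
    last_failure_at: Optional[datetime] = None
    status: str = "never_run"


def summarize(runs: Iterable[Run]) -> dict[str, WorkflowRow]:
    """Build one table row per workflow from its runs."""
    rows: dict[str, WorkflowRow] = {}
    # Process runs oldest first so later runs overwrite earlier values.
    for run in sorted(runs, key=lambda r: r.finished_at):
        row = rows.setdefault(run.workflow_id, WorkflowRow(run.workflow_id))
        row.last_run_at = run.finished_at
        if run.succeeded:
            row.status = "ok"
        else:
            row.last_failure_at = run.finished_at
            row.status = "failing"
    return rows
```

In this sketch the status flag reflects only the most recent run, while `last_failure_at` still shows when a now-passing workflow last broke.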
Is your feature request related to a problem? Please describe.
Unhandled exceptions are not easily reported on currently. If a workflow errors out in an unexpected way, there is no centralized place to report or alert on the error unless you add error handling to the workflow itself.
Describe the solution you'd like
A few ideas:
1. If a workflow ends because of an exception, have an alerting configuration/workflow that runs. The idea here is that there is a system-defined "error" workflow that we could use to do whatever we need to with an error.
2. Logging and a report of these errors that can be viewed or scheduled.
3. A single "runs" view that shows runs for all workflows, not just the currently open one. The view can be filtered on whether the run was successful or not.
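To illustrate how these ideas could fit together, here is a small sketch, not a proposal for the actual implementation. `RunRegistry`, `RunRecord`, `execute`, `list_runs`, and the `on_error` hook are all hypothetical names. The registry records every run in one place, calls a system-level error handler when a workflow raises, and supports filtering by status.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunRecord:
    """Hypothetical record of one workflow run."""
    workflow: str
    status: str  # "success" or "error"
    error: Optional[str] = None


class RunRegistry:
    """Hypothetical central store of runs across all workflows."""

    def __init__(self, on_error: Optional[Callable[[RunRecord], None]] = None):
        self.records: list[RunRecord] = []
        self.on_error = on_error  # stands in for the system-defined "error" workflow

    def execute(self, workflow: str, fn: Callable[[], object]) -> RunRecord:
        """Run `fn`, record the outcome, and trigger the error hook on failure."""
        try:
            fn()
            record = RunRecord(workflow, "success")
        except Exception as exc:  # unhandled exception inside the workflow
            record = RunRecord(workflow, "error", f"{type(exc).__name__}: {exc}")
        self.records.append(record)
        if record.status == "error" and self.on_error is not None:
            self.on_error(record)
        return record

    def list_runs(self, status: Optional[str] = None) -> list[RunRecord]:
        """Return runs from all workflows, optionally filtered by status."""
        return [r for r in self.records if status is None or r.status == status]
```

For example, passing `on_error=alerts.append` would collect failed runs in a list, and `list_runs(status="error")` would return only the failures across every workflow.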
Describe alternatives you've considered
The only way currently to tell if an error occurred is to open each workflow and look at the previous runs or inspect logs on the worker container for errors.
Additional context