Operator Level 4 Capability Level - Deep Insights: Monitoring and Alerting #205

rm3l · 2024-02-16T15:33:03Z

See https://sdk.operatorframework.io/docs/overview/operator-capabilities/#level-4---deep-insights

Goal
Set up full monitoring and alerting for your operand. All resources, such as Prometheus rules (alerts) and Grafana dashboards, should be created by the operator when the operand CR is instantiated.
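As a rough illustration of the goal, here is a minimal sketch of how the operator might assemble a PrometheusRule manifest for a Backstage CR. It only builds the object and prints it as JSON. The function name `buildOperandDownRule`, the alert name, the `job` label value, the `5m` duration, and the `severity` label are all assumptions made for illustration, not something this issue specifies. A real implementation would presumably also set an owner reference to the CR and apply the object through the Kubernetes client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildOperandDownRule returns a PrometheusRule manifest as a generic map.
// All names and thresholds below are hypothetical placeholders.
func buildOperandDownRule(crName, namespace, job string) map[string]any {
	return map[string]any{
		"apiVersion": "monitoring.coreos.com/v1",
		"kind":       "PrometheusRule",
		"metadata": map[string]any{
			"name":      crName + "-alerts",
			"namespace": namespace,
		},
		"spec": map[string]any{
			"groups": []any{
				map[string]any{
					"name": crName + ".rules",
					"rules": []any{
						map[string]any{
							// Fires when no target of the given job reports up == 1.
							"alert":  "BackstageOperandDown",
							"expr":   fmt.Sprintf(`absent(up{job=%q} == 1)`, job),
							"for":    "5m",
							"labels": map[string]any{"severity": "critical"},
							"annotations": map[string]any{
								"summary": fmt.Sprintf("No healthy scrape target for Backstage instance %s/%s", namespace, crName),
							},
						},
					},
				},
			},
		},
	}
}

func main() {
	rule := buildOperandDownRule("my-backstage", "rhdh", "backstage-my-backstage")
	out, err := json.MarshalIndent(rule, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Building the object as a plain map keeps the sketch free of external dependencies. In the operator itself, the typed API from the Prometheus Operator project would likely be a better fit.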

TODO

- Add the ability in the CRD to create a ServiceMonitor resource (follow-up to Integration with OpenShift logging and Monitoring #180)
- Implement Prometheus metrics in the RHDH Operator for Backstage CR reconciliation failure/success
- Implement Grafana dashboards to monitor a) whether the operator is up and running, and how long it has been running, and b) memory and CPU consumption by the operator
- Implement alerts so that when the operator is down, certain actions get triggered, e.g., a notification is sent to the user's Slack channel, a Jira ticket is created, etc.

gazarenkov · 2024-02-19T12:36:41Z

Do we also consider upstream (vanilla K8s) here, or OpenShift only?

rm3l · 2024-02-26T22:12:18Z

> Do we also consider upstream (vanilla K8s) here, or OpenShift only?

Not only OpenShift, I think. It should work on both.

github-actions bot added the jira (Issue will be sync'ed to Red Hat JIRA) label Feb 16, 2024

rm3l changed the title ~~Add ability in CRD to create a ServiceMonitor resource.~~ [Epic] Monitoring Feb 16, 2024

jianrongzhang89 changed the title ~~[Epic] Monitoring~~ [Epic] RHDH Operator Monitoring Feb 16, 2024

rm3l changed the title ~~[Epic] RHDH Operator Monitoring~~ RHDH Operator Monitoring Feb 26, 2024

rm3l changed the title ~~RHDH Operator Monitoring~~ Operator Level 4 Capability Level - Deep Insights: Monitoring and Alerting Feb 27, 2024
