Add /rules/raw row to Observatorium API dashboard + refactor #337

Merged
jessicalins merged 14 commits into rhobs:main from add-rules-raw-panel
Oct 7, 2022

Conversation

jessicalins
@jessicalins jessicalins commented Sep 29, 2022

This PR refactors observatorium-api.libsonnet to reuse common parts, which should make it easier to add new rows and panels to the Observatorium API Grafana dashboard.

I've also added a new row for /rules/raw, which makes it easier to debug when alerts fire for our Rules path (for example, when following this runbook).
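
As an illustration of the kind of reuse described above, here is a minimal Python sketch (the PR itself does this in Jsonnet, inside observatorium-api.libsonnet). The function name `red_row`, the dict layout, the duration metric `http_request_duration_seconds_bucket`, and the handler values other than `rules-raw` are assumptions made for the example, not the dashboard's real definitions.

```python
def red_row(title, handler, job="observatorium-observatorium-api", window="5m"):
    """Build a hypothetical RED (rate, errors, duration) row for one handler."""
    selector = f'job="{job}",handler="{handler}"'
    requests = f"rate(http_requests_total{{{selector}}}[{window}])"
    errors = f'rate(http_requests_total{{{selector},code=~"5.."}}[{window}])'
    # Assumed histogram metric name; the real dashboard may use another one.
    buckets = f"rate(http_request_duration_seconds_bucket{{{selector}}}[{window}])"
    return {
        "title": title,
        "panels": [
            {"title": "Rate", "expr": f"sum by (code) ({requests})"},
            {"title": "Errors", "expr": f"sum({errors}) / sum({requests})"},
            {
                "title": "Duration",
                "expr": f"histogram_quantile(0.99, sum by (le) ({buckets}))",
            },
        ],
    }


# Adding a row for a new endpoint becomes a single call.
rows = [
    red_row("/query", "query"),              # handler value assumed
    red_row("/query_range", "query_range"),  # handler value assumed
    red_row("/rules/raw", "rules-raw"),
]
```

With a helper of this shape, every row shares the same panel layout and only the title and handler selector change.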

Testing dashboard here
EDIT: Updated testing dashboard with error codes here

Availability is low for /rules/raw reads because no traffic currently reads this endpoint. I think the next step would be to adapt obsctl-reloader to read this endpoint continuously, so that it generates constant traffic there.

Signed-off-by: Jéssica Lins <[email protected]>
@douglascamata

Looking at the test dashboard, the errors chart uses the caption "Errors" for every series returned by the query sum(rate(http_requests_total{job="observatorium-observatorium-api",handler="rules-raw",code=~"5.."}[5m])) / sum(rate(http_requests_total{job="observatorium-thanos-query",handler="rules-raw"}[5m])).

I think it would be very useful to include the code label in the caption to help with debugging. What do you think?

I have this feedback for many of our dashboards, and I plan to start similar discussions and changes wherever it applies.
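
To make the suggestion concrete, here is a minimal Python sketch of a Grafana Prometheus target whose legend shows the status code. The function name `errors_target` is hypothetical, and using the same job selector on both sides of the division is an assumption made for the example. `legendFormat` with `{{code}}` is the Grafana templating that turns each series' `code` label into its caption. The `scalar()` wrapper follows the "Fix errors query, use scalar" entry in this PR's commit list.

```python
def errors_target(job, handler, window="5m"):
    """Return a hypothetical Grafana target for the error ratio, split by code."""
    selector = f'job="{job}",handler="{handler}"'
    # Numerator keeps the code label so each 5xx code gets its own series.
    errors = (
        f'sum by (code) (rate(http_requests_total{{{selector},code=~"5.."}}[{window}]))'
    )
    # Denominator is collapsed to a scalar so it divides every series.
    total = f"scalar(sum(rate(http_requests_total{{{selector}}}[{window}])))"
    return {"expr": f"{errors} / {total}", "legendFormat": "{{code}}"}


target = errors_target("observatorium-observatorium-api", "rules-raw")
print(target["expr"])
print(target["legendFormat"])
```

With this target, a series for code 503 would be captioned "503" instead of the generic "Errors".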

@jessicalins

@douglascamata that's a good idea! Let me try adding this to this dashboard in this PR. If it applies to other dashboards, maybe we could open a ticket/issue so that it is not forgotten?

@douglascamata

@jessicalins when I get some time, I will figure out where this hardcoded "Error" label comes from in the other dashboards (HTTP and gRPC errors too) and create issue(s) in the repo(s) plus a ticket on our side. 👍

Signed-off-by: Jéssica Lins <[email protected]>
@douglascamata

FYI, I created an issue for the other dashboards' widgets: #348

@douglascamata douglascamata left a comment

I think the query isn't returning exactly what we expect now.
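
The thread does not say what exactly was wrong with the query, but the commit list includes "sum by code for errors panel" followed by "Fix errors query, use scalar". One plausible explanation, offered here as an assumption: in PromQL, a binary operation between two instant vectors only pairs samples whose label sets are identical, so a numerator grouped by `code` never matches a denominator that has no labels, and the result is empty. The toy Python model below imitates that matching rule; the sample values and function names are invented for illustration.

```python
def divide_vectors(left, right):
    """Imitate PromQL one-to-one matching: pair only identical label sets."""
    return {
        labels: value / right[labels]
        for labels, value in left.items()
        if labels in right
    }


def divide_by_scalar(left, scalar):
    """Imitate vector / scalar: every sample is divided, labels are kept."""
    return {labels: value / scalar for labels, value in left.items()}


# Invented samples: errors per second by code, and total requests per second.
errors_by_code = {
    frozenset({("code", "500")}): 2.0,
    frozenset({("code", "503")}): 1.0,
}
total = {frozenset(): 8.0}  # sum(...) without "by" yields one unlabelled sample

no_match = divide_vectors(errors_by_code, total)
per_code = divide_by_scalar(errors_by_code, 8.0)
print(no_match)
print(per_code)
```

Under this model the vector-by-vector division yields nothing, while dividing by a scalar keeps one ratio per status code.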

Signed-off-by: Jéssica Lins <[email protected]>
@douglascamata douglascamata left a comment

LGTM 👍

@jessicalins jessicalins merged commit 99defe9 into rhobs:main Oct 7, 2022
@jessicalins jessicalins deleted the add-rules-raw-panel branch October 7, 2022 13:11
philipgough pushed a commit to philipgough/configuration that referenced this pull request Oct 10, 2022
* Refactor titleRow
* Refactor query row
* Refactor query_range row
* Start refactor RED for all row
* Refactor all query err
* Refactor query duration
* Refactor titleRow
* Finish refactor
* Start to add /rules/raw row
* Fix positioning, finish rules row
* Remove unused var
* sum by code for errors panel
* Unify aliasColors
* Fix errors query, use scalar

Signed-off-by: Jéssica Lins <[email protected]>
philipgough added a commit that referenced this pull request Oct 10, 2022
* Add Rules and Alertmanager SLOs (#298)

* Add Rules & alerting SLOs panels
* Add Telemeter SLOs
* Remove unnecessary comment
* Use rate for SLO query about alerts being delivered to upstream targets
* Update rules sync SLO query

* Add alerts for Rules and Alerting SLOs (#300)

* Add alerts for Rules and Alerting SLOs
* Add container selector to APIRulesSyncAvailabilityErrorBudgetBurning alert
* Fix telemeter namespace, update sync SLO query in alerts
* Refactor instanceNamespace function

Signed-off-by: Jéssica Lins <[email protected]>

* SLOs: Prune unsupported labels (#325)

Signed-off-by: Jéssica Lins <[email protected]>

* [observatorium-logs] Increase the querier timeout from 3m to 6m (#330)

* Fix SLO alerting for metrics (#328)

* Fix SLO alerting for metrics

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add back code labels to exclude 4xx

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add comment about fork

Signed-off-by: Saswata Mukherjee <[email protected]>

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add OSD to rules-obsctl-reloader (#329)

Signed-off-by: Saswata Mukherjee <[email protected]>

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add Loki ruler and static rules for tenant OCM (#331)

* Fix loki ruler memory requests (#332)

* Fix ocm panic logs-based alert (#333)

* Remove recycle annotations for loki rules (#334)

* Add staging test alerts for rhobs logs (#335)

* Fix alertmanager discovery for logs ruler (#338)

* Use alertmanager v1 api for Loki ruler (#339)

* Update Telemeter rules (#340)

Signed-off-by: Douglas Camata <[email protected]>

* Fix ARM64 (M1 Pro) support (#344)

* Update Bingo

Signed-off-by: Douglas Camata <[email protected]>

* Update Bingo deps for aarm64 support

Signed-off-by: Douglas Camata <[email protected]>

Signed-off-by: Douglas Camata <[email protected]>

* Add support for Loki ruler to manage rules on object storage (#345)

* Add template for Loki ruler CRDs (#349)

* Add /rules/raw row to Observatorium API dashboard + refactor (#337)

* Refactor titleRow
* Refactor query row
* Refactor query_range row
* Start refactor RED for all row
* Refactor all query err
* Refactor query duration
* Refactor titleRow
* Finish refactor
* Start to add /rules/raw row
* Fix positioning, finish rules row
* Remove unused var
* sum by code for errors panel
* Unify aliasColors
* Fix errors query, use scalar

Signed-off-by: Jéssica Lins <[email protected]>

* Fix Updating dashboards README section (#351)

Signed-off-by: Jéssica Lins <[email protected]>

* Add obsctl-reloader support for Loki alerting- and recordingrules (#352)

* Add cluster-role observatorium-logs-edit for dedicated-admins (#353)

* Add dedicated-admin label for obs-logs-edit clusterrole (#354)

* Enable hashing algorithm for receive to be set via parameter

Note, this is a breaking change which requires the use
of Thanos >= v0.28.0

* Ketama sync to main (#347)

* Update README (#343)

* Update README

- Added instructions for macOS
- Fixed jsonnet deps command

Signed-off-by: Douglas Camata <[email protected]>

* Update OpenShift templates doc link

Signed-off-by: Douglas Camata <[email protected]>

Signed-off-by: Douglas Camata <[email protected]>

* Disable compression on receive (#346)

Signed-off-by: Matej Gera <[email protected]>

Signed-off-by: Matej Gera <[email protected]>

Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Matej Gera <[email protected]>
Co-authored-by: Douglas Camata <[email protected]>

Signed-off-by: Jéssica Lins <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Matej Gera <[email protected]>
Co-authored-by: Jéssica Lins <[email protected]>
Co-authored-by: Periklis Tsirakidis <[email protected]>
Co-authored-by: Saswata Mukherjee <[email protected]>
Co-authored-by: Douglas Camata <[email protected]>
Co-authored-by: Matej Gera <[email protected]>