Setup MCOA dashboards and scrape configs #1726

Open: wants to merge 35 commits into base: main

thibaultmg
Contributor

@thibaultmg thibaultmg commented Dec 16, 2024

This PR reorganises the Grafana dashboards to make them easier to maintain and adds configuration for MCOA metrics collection. More precisely:

  • It groups related dashboards into a common directory.
  • Each directory contains the list of metrics that must be collected for its dashboards to work, as well as the Prometheus rules to deploy on the spokes.
  • It adds CI scripts that check that the metrics the dashboards need match the metrics that are collected.
  • For MCOA, the whole list of collected metrics has been reviewed and optimised. This required changing some dashboard queries; in those cases, I duplicated the whole directory for MCOA.
  • MCOA-specific resources (scrapeConfigs and Prometheus rules) are deployed only when MCOA is activated.
  • Dashboards that MCOA does not support have their titles suffixed with the string "DEPRECATED" when MCOA is active. MCOA-specific dashboards are always deployed, in directories suffixed with the string "MCOA".
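
To illustrate the kind of alignment check described in the list above, here is a minimal Python sketch. It is not the CI script from this PR: the function names (`metric_names`, `dashboard_exprs`, `missing_metrics`) are hypothetical, and the metric extraction is a naive regex heuristic rather than a real PromQL parser. It only assumes that dashboard queries are stored under `expr` keys in the Grafana dashboard JSON.

```python
import re

# PromQL words that look like identifiers but are not metric names.
_KEYWORDS = {"by", "without", "on", "ignoring", "group_left", "group_right",
             "and", "or", "unless", "bool", "offset"}


def metric_names(expr: str) -> set[str]:
    """Naively extract metric names from a PromQL expression (heuristic only)."""
    # Drop string literals first so their content is never read as a name.
    expr = re.sub(r"\"[^\"]*\"|'[^']*'", " ", expr)
    # Drop grouping/matching clauses such as `by (cluster)`: they hold label names.
    expr = re.sub(
        r"\b(by|without|on|ignoring|group_left|group_right)\s*\([^)]*\)", " ", expr
    )
    # Drop label matchers `{...}` and range selectors `[...]`.
    expr = re.sub(r"\{[^}]*\}|\[[^\]]*\]", " ", expr)
    # A whole identifier that is not followed by "(" (which would be a function call).
    pattern = r"(?<![\w$.:])[a-zA-Z_:][a-zA-Z0-9_:]*(?![a-zA-Z0-9_:])(?!\s*\()"
    return {m.group(0) for m in re.finditer(pattern, expr)} - _KEYWORDS


def dashboard_exprs(node):
    """Yield every string stored under an "expr" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from dashboard_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from dashboard_exprs(item)


def missing_metrics(dashboard: dict, collected: set[str]) -> set[str]:
    """Return metrics used by the dashboard that are absent from the collected set."""
    needed: set[str] = set()
    for expr in dashboard_exprs(dashboard):
        needed |= metric_names(expr)
    return needed - collected


# Illustrative data only; these are not the dashboards or metrics from this PR.
dashboard = {
    "title": "Example",
    "panels": [{"targets": [
        {"expr": 'sum(rate(container_cpu_usage_seconds_total{cluster="$cluster"}[5m])) by (namespace)'},
        {"expr": "kube_pod_info"},
    ]}],
}
collected = {"container_cpu_usage_seconds_total"}
print(sorted(missing_metrics(dashboard, collected)))
```

A check like this can fail the CI job whenever the returned set is non-empty. The reverse difference (collected metrics that no dashboard uses) can be computed the same way to spot metrics that are collected unnecessarily.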


openshift-ci bot commented Dec 16, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Dec 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thibaultmg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thibaultmg thibaultmg force-pushed the nexus_dashboards_init branch from b84bfbc to 594f676 Compare January 10, 2025 09:06
@thibaultmg thibaultmg force-pushed the nexus_dashboards_init branch from 623af54 to 4117360 Compare January 10, 2025 10:03
@thibaultmg thibaultmg marked this pull request as ready for review January 10, 2025 10:06
@thibaultmg thibaultmg force-pushed the nexus_dashboards_init branch 2 times, most recently from 7676417 to ffed470 Compare January 14, 2025 16:01
@thibaultmg thibaultmg changed the title from "WIP: Nexus dashboards" to "Nexus dashboards" Jan 14, 2025
@thibaultmg thibaultmg changed the title from "Nexus dashboards" to "Setup MCOA dashboards and scrape configs" Jan 14, 2025
@thibaultmg
Contributor Author

/retest

@thibaultmg thibaultmg force-pushed the nexus_dashboards_init branch from ffed470 to bafbff8 Compare January 15, 2025 12:28
@berenss

berenss commented Jan 15, 2025

OCP 3.11 dashboards can be deprecated and removed. The documentation marked them as deprecated in ACM 2.9, more than three releases ago.

@jacobbaungard
Contributor

This looks pretty good to me.

> Each directory contains the list of metrics to be collected for making them work as well as the prometheus rules to be deployed on the spokes.

Where do I find this metrics list? I don't seem to see it, but might be I've just missed it somehow.

26 commits, each with: Signed-off-by: Thibault Mange <[email protected]>
@thibaultmg thibaultmg force-pushed the nexus_dashboards_init branch from 2efc330 to 7e47a3d Compare January 24, 2025 14:19
@thibaultmg
Contributor Author

thibaultmg commented Jan 24, 2025

> Where do I find this metrics list? I don't seem to see it, but might be I've just missed it somehow.

I mean that each directory encapsulates the dashboards, the scrapeConfigs (which contain the metrics list), and the prometheusRules. This encapsulation applies to MCOA only. For the current setup, nothing changes: all metrics to be collected are still defined in a single allowList configmap.
I am setting up a demo env for the review.
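
For readers wondering how a scrapeConfig can "contain the metrics list", here is a rough Python sketch of one possible shape. It is an assumption, not taken from this PR: it supposes the list is expressed as Prometheus federation `match[]` selectors in a prometheus-operator `ScrapeConfig` resource. The function name `scrape_config_for`, the group name, and the `<group>-metrics` naming scheme are hypothetical.

```python
import json


def scrape_config_for(group: str, metrics: list[str]) -> dict:
    """Build a ScrapeConfig-like manifest whose federation selectors list the metrics."""
    # One `{__name__="<metric>"}` selector per metric, deduplicated and sorted.
    selectors = [f'{{__name__="{name}"}}' for name in sorted(set(metrics))]
    return {
        "apiVersion": "monitoring.coreos.com/v1alpha1",
        "kind": "ScrapeConfig",
        "metadata": {"name": f"{group}-metrics"},  # hypothetical naming scheme
        "spec": {
            # Assumption: metrics are pulled through the federation endpoint.
            "metricsPath": "/federate",
            "params": {"match[]": selectors},
        },
    }


# Illustrative group and metrics only.
manifest = scrape_config_for("example-group", ["kube_pod_info", "up", "up"])
print(json.dumps(manifest, indent=2))
```

With this shape, the metrics list for a group of dashboards lives next to those dashboards, whereas the current setup keeps every metric in the single allowList configmap.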

Signed-off-by: Thibault Mange <[email protected]>

Quality Gate failed

Failed conditions
27.6% Coverage on New Code (required ≥ 70%)
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


openshift-ci bot commented Jan 24, 2025

@thibaultmg: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/sonarcloud | 7c5d62e | link | false | /test sonarcloud |
| ci/prow/test-e2e | 7c5d62e | link | true | /test test-e2e |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
