Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rule for validating dashboards #6

Closed
wants to merge 3 commits into from
Closed

Conversation

sdboyer
Copy link

@sdboyer sdboyer commented Nov 22, 2021

This adds rules for validating the dashboards using the official upstream schema.

I've marked this as a draft because, after having seen how the flow of this tool actually works, it seems like a sloppy way to integrate this kind of logic - the possibility of false positives, and the messiness of errors, make it maaaaaaybe a good addition (it's good enough that we no longer allow Grafana dashboards to enter grafana/grafana/devenv without passing validation), but still not completely canonical and reliable. OTOH, that isn't worse than e.g. false positives arising from expecting only Prom queries.

Either way, i figured it was worth putting the PR up at minimum to show how this kind of addition could be made. My guess is that a larger refactor of the application to treat validation as a prerequisite to linting would be a better architecture, and would also probably set us up for when we have Go structs that represent dashboard structures.

@sdboyer sdboyer requested review from rgeyer and tomwilkie November 22, 2021 20:40
var basesch, distsch schema.CueSchema

func getBaseSchema() (schema.CueSchema, error) {
var err error
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bleh, this means only the first call will actually error correctly, defeating the point of a Once.

If we actually want to merge this, i'll fix the problem.


"cuelang.org/go/cue/errors"
"github.com/grafana/grafana/pkg/schema"
"github.com/grafana/grafana/pkg/schema/load"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these packages apache licensed? This tool can't import AGPL code (its imported by mixtool, which is apache licensed).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this concern. Seems like the schema bits of grafana could/should/will-eventually be under a more forgiving license?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looks like we need to update the list. All the CUE-related schema stuff is supposed to be Apache 2 (which is why https://github.com/grafana/grafana/blob/main/LICENSING.md excludes cue), but looks like i didn't add a line for this subtree in particular when this package was added. Will make a PR, for this and a couple others.

(pkg/schema will become grafana/scuemata, which is A2)

basesch, err = load.BaseDashboardFamily(load.GetDefaultLoadPaths())
if err != nil {
panic(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we're panicing and returning an error here; can we do one or the other?

Copy link
Author

@sdboyer sdboyer Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sloppy lack of removal of my original approach to this, lol

@tomwilkie
Copy link
Contributor

Thanks Sam! Have you run this against the integrations for cloud? I wonder how many of the dashboards fail...

@rgeyer
Copy link
Collaborator

rgeyer commented Nov 22, 2021

This is great! Have been noodling on how to include validation into this.

A couple of things stand out as possible improvements to the tool which might make rules like this more palatable.

  1. I went back and forth on whether a rule should return only one result, or could return an array of results. I presume for validation, it would be useful to return several error results, one for each validation failure?
  2. It probably makes sense to introduce the idea of an "experimental" or optional rule. Such that it can be included in the tool but only executed optionally. This would be the converse of the .lint exclusion rules. The user could opt-in to use these experimental rules, rather than opt-out a given rule for every dashboard.

@sdboyer
Copy link
Author

sdboyer commented Nov 23, 2021

Have you run this against the integrations for cloud?

Haven't, just banged this out and put up a PR after trying on some basic JSON

I presume for validation, it would be useful to return several error results, one for each validation failure?

Frankly, right now, the output can be just vomit - see the link in the OP. It seems likely we're going to need our whole own framework for sane error management; not sure how much or when we can reasonably expect upstream CUE to do. (Apart from cue-lang/cue#602, which i linked to on the other issue). This is one of many areas i would love to be able to invest in :)

So, multiple error results? Yes, that would be helpful, as once it's not vomit, there will still plausibly be multiple. But...

It probably makes sense to introduce the idea of an "experimental" or optional rule.

That's one way to do it. Honestly, validation is a different class of check than linting, fully prerequisite to it. That is, it's usually not worth linting invalid stuff. That won't be true here until the schema are fully developed, of course, but it does suggest that it might be worth making validation into a special, non-extensible operation that - if turned on - is run prior to all other rules.

Also, just fyi - this exact same logic is available in grafana-cli cue validate-resource

@sdboyer
Copy link
Author

sdboyer commented Nov 23, 2021

The direction i imagine this could head, btw, is something like what i described here: https://www.youtube.com/watch?v=PpoS_ThntEM&t=1709s

So, you start with SearchAndValidate, and then decide based on the Grafana dashboard version that rules are written against, and the lacunae emitted on migrating, whether to run a lint rule or not, or if it's just all too risky and bail out

@tomwilkie
Copy link
Contributor

I went back and forth on whether a rule should return only one result, or could return an array of results. I presume for validation, it would be useful to return several error results, one for each validation failure?

In mixtool we used an error channel for the lint errors, allowing rules to "return" many: https://github.com/monitoring-mixins/mixtool/blob/master/pkg/mixer/lint.go#L67

Not 100% sure I want to recommend this approach, this is more just an FYI.

It probably makes sense to introduce the idea of an "experimental" or optional rule. Such that it can be included in the tool but only executed optionally. This would be the converse of the .lint exclusion rules. The user could opt-in to use these experimental rules, rather than opt-out a given rule for every dashboard.

Personally I think its fine for this tool to be super opinionated. I considered even arguing that lint rules shouldn't be optional ala golint https://github.com/golang/lint#purpose. Using this linter is completely optional, and part of the reason for is existence IMO is consistency: any dashboard that passes this linter should be constructed and act the same. WDYT?

@sdboyer
Copy link
Author

sdboyer commented Nov 24, 2021

Licensing fix: grafana/grafana#42234

@rgeyer
Copy link
Collaborator

rgeyer commented Nov 29, 2021

It probably makes sense to introduce the idea of an "experimental" or optional rule. Such that it can be included in the tool but only executed optionally. This would be the converse of the .lint exclusion rules. The user could opt-in to use these experimental rules, rather than opt-out a given rule for every dashboard.

Personally I think its fine for this tool to be super opinionated. I considered even arguing that lint rules shouldn't be optional ala golint https://github.com/golang/lint#purpose. Using this linter is completely optional, and part of the reason for is existence IMO is consistency: any dashboard that passes this linter should be constructed and act the same. WDYT?

I agree that this tool should be super opinionated. My concern is in defending the rules and their outcome.

For instance, when offering a PR to an upstream mixin or dashboard, and citing that it doesn't pass a given linting rule, I don't want to leave margin for a maintainer to look at required, opinionated rules in the linter and find one that we can't unapologetically defend.

At the same time, I think it's useful for there to be rules which we expect may not pass in all cases, or which we're experimenting with that may not be mature.

For this functionality in particular, I agree with @sdboyer that we ought not merge validation as a rule. Instead it should be a pre-req that would be run beforehand in a CI toolchain.

I expect at some point we'll encounter another type of rule that might fit in this "gray area" tho. 🤔

@sdboyer
Copy link
Author

sdboyer commented Dec 15, 2021

One major thought, here: opinionated isn't a problem, but exclusionary is. From my quick tests, this seems to just assume everything is a Prom datasource. That makes it impossible to use this for anything other than Prom, and by extension, to recommend it to users who are using anything other than Prom.

IMO, that goes beyond opinionated. Now, that still wouldn't be a problem, except for the namespace this lives in - grafana/dashboard-linter. That's a fully general name. Scope should be comparably general.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sdboyer
Copy link
Author

sdboyer commented Jun 15, 2022

i'm gonna close this one...but FWIW, trying to do the integration here led to many, many improvements. I'll be coming with another PR based on grok sooner rather than later :)

@sdboyer sdboyer closed this Jun 15, 2022
@rgeyer
Copy link
Collaborator

rgeyer commented Jun 15, 2022

One major thought, here: opinionated isn't a problem, but exclusionary is. From my quick tests, this seems to just assume everything is a Prom datasource. That makes it impossible to use this for anything other than Prom, and by extension, to recommend it to users who are using anything other than Prom.

IMO, that goes beyond opinionated. Now, that still wouldn't be a problem, except for the namespace this lives in - grafana/dashboard-linter. That's a fully general name. Scope should be comparably general.

FYI, this is finally getting addressed in the next couple weeks, and is captured in an issue here #61

I'll likely add you as a reviewer when I start making these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants