Add rule for validating dashboards #6

sdboyer · 2021-11-22T20:40:05Z

This adds rules for validating the dashboards using the official upstream schema.

I've marked this as a draft because, after having seen how the flow of this tool actually works, it seems like a sloppy way to integrate this kind of logic - the possibility of false positives, and the messiness of errors, make it maaaaaaybe a good addition (it's good enough that we no longer allow Grafana dashboards to enter grafana/grafana/devenv without passing validation), but still not completely canonical and reliable. OTOH, that isn't worse than e.g. false positives arising from expecting only Prom queries.

Either way, i figured it was worth putting the PR up at minimum to show how this kind of addition could be made. My guess is that a larger refactor of the application to treat validation as a prerequisite to linting would be a better architecture, and would also probably set us up for when we have Go structs that represent dashboard structures.

sdboyer · 2021-11-22T20:43:06Z

lint/rule_validate.go

+var basesch, distsch schema.CueSchema
+
+func getBaseSchema() (schema.CueSchema, error) {
+	var err error


Bleh, this means only the first call will actually error correctly, defeating the point of a Once.

If we actually want to merge this, i'll fix the problem.
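A minimal sketch of one way to address the `Once` problem described above, assuming the schema and its load error are both kept at package level so every caller sees the same pair. `Schema`, `loadBase`, and the `loads` counter are hypothetical stand-ins for illustration, not the PR's actual code or the real `load.BaseDashboardFamily` call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Schema is a hypothetical stand-in for schema.CueSchema.
type Schema struct{ Name string }

var (
	baseOnce sync.Once
	baseSch  *Schema
	baseErr  error
	loads    int // counts loader invocations, for demonstration only
)

// loadBase is a hypothetical loader that always fails, to show error caching.
func loadBase() (*Schema, error) {
	loads++
	return nil, errors.New("schema failed to load")
}

// getBaseSchema runs the loader at most once and stores both results at
// package level, so later callers get the same schema and the same error.
func getBaseSchema() (*Schema, error) {
	baseOnce.Do(func() {
		baseSch, baseErr = loadBase()
	})
	return baseSch, baseErr
}

func main() {
	_, err1 := getBaseSchema()
	_, err2 := getBaseSchema()
	fmt.Println(err1, err2, loads)
}
```

The key difference from a function-local `err` is that the error outlives the first call. On Go 1.21 and later, `sync.OnceValues` expresses the same pattern more compactly.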

tomwilkie · 2021-11-22T20:46:34Z

lint/rule_validate.go

+
+	"cuelang.org/go/cue/errors"
+	"github.com/grafana/grafana/pkg/schema"
+	"github.com/grafana/grafana/pkg/schema/load"


Are these packages Apache-licensed? This tool can't import AGPL code (it's imported by mixtool, which is Apache-licensed).

+1 this concern. Seems like the schema bits of grafana could/should/will-eventually be under a more forgiving license?

Hmm, looks like we need to update the list. All the CUE-related schema stuff is supposed to be Apache 2 (which is why https://github.com/grafana/grafana/blob/main/LICENSING.md excludes cue), but looks like i didn't add a line for this subtree in particular when this package was added. Will make a PR, for this and a couple others.

(pkg/schema will become grafana/scuemata, which is A2)

tomwilkie · 2021-11-22T20:47:26Z

lint/rule_validate.go

+		basesch, err = load.BaseDashboardFamily(load.GetDefaultLoadPaths())
+		if err != nil {
+			panic(err)
+		}


I don't understand why we're both panicking and returning an error here; can we do one or the other?

sloppy lack of removal of my original approach to this, lol

tomwilkie · 2021-11-22T20:48:12Z

Thanks Sam! Have you run this against the integrations for cloud? I wonder how many of the dashboards fail...

rgeyer · 2021-11-22T23:12:15Z

This is great! Have been noodling on how to include validation into this.

A couple of things stand out as possible improvements to the tool which might make rules like this more palatable.

- I went back and forth on whether a rule should return only one result, or could return an array of results. I presume for validation, it would be useful to return several error results, one for each validation failure?
- It probably makes sense to introduce the idea of an "experimental" or optional rule. Such that it can be included in the tool but only executed optionally. This would be the converse of the .lint exclusion rules. The user could opt-in to use these experimental rules, rather than opt-out a given rule for every dashboard.
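Both ideas above can be sketched together, under the assumption that a rule is a plain struct with a check function. `Result`, `Rule`, `runRules`, `exampleRules`, and the rule names are hypothetical and do not reflect the linter's actual types.

```go
package main

import "fmt"

// Result is a hypothetical single finding produced by a rule.
type Result struct {
	Rule    string
	Message string
}

// Rule is a hypothetical rule definition that may return many results.
type Rule struct {
	Name         string
	Experimental bool // experimental rules run only when opted in
	Check        func(dashboard map[string]interface{}) []Result
}

// runRules always runs stable rules, and runs experimental rules only
// when their name is present in optIn.
func runRules(rules []Rule, optIn map[string]bool, dashboard map[string]interface{}) []Result {
	var out []Result
	for _, r := range rules {
		if r.Experimental && !optIn[r.Name] {
			continue
		}
		out = append(out, r.Check(dashboard)...)
	}
	return out
}

// exampleRules returns one stable rule and one experimental,
// validation-style rule that emits one result per missing field.
func exampleRules() []Rule {
	return []Rule{
		{
			Name: "has-title",
			Check: func(d map[string]interface{}) []Result {
				if _, ok := d["title"]; !ok {
					return []Result{{Rule: "has-title", Message: "dashboard has no title"}}
				}
				return nil
			},
		},
		{
			Name:         "validate-schema",
			Experimental: true,
			Check: func(d map[string]interface{}) []Result {
				var out []Result
				for _, field := range []string{"panels", "schemaVersion"} {
					if _, ok := d[field]; !ok {
						out = append(out, Result{Rule: "validate-schema", Message: "missing field: " + field})
					}
				}
				return out
			},
		},
	}
}

func main() {
	empty := map[string]interface{}{}
	fmt.Println(len(runRules(exampleRules(), nil, empty)))
	fmt.Println(len(runRules(exampleRules(), map[string]bool{"validate-schema": true}, empty)))
}
```

Returning a slice keeps the single-result case trivial while letting a validation-style rule report each failure separately.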

sdboyer · 2021-11-23T00:02:02Z

> Have you run this against the integrations for cloud?

Haven't, just banged this out and put up a PR after trying on some basic JSON

> I presume for validation, it would be useful to return several error results, one for each validation failure?

Frankly, right now, the output can be just vomit - see the link in the OP. It seems likely we're going to need our whole own framework for sane error management; not sure how much we can reasonably expect upstream CUE to do, or when (apart from cue-lang/cue#602, which i linked to on the other issue). This is one of many areas i would love to be able to invest in :)

So, multiple error results? Yes, that would be helpful, as once it's not vomit, there will still plausibly be multiple. But...

> It probably makes sense to introduce the idea of an "experimental" or optional rule.

That's one way to do it. Honestly, validation is a different class of check than linting, fully prerequisite to it. That is, it's usually not worth linting invalid stuff. That won't be true here until the schema are fully developed, of course, but it does suggest that it might be worth making validation into a special, non-extensible operation that - if turned on - is run prior to all other rules.

Also, just fyi - this exact same logic is available in grafana-cli cue validate-resource
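The "validation as a prerequisite" idea could look roughly like the sketch below, assuming validation is an optional gate that runs before any lint rule. `validate`, `lintRule`, `titleRule`, and `run` are hypothetical names, and the validation logic is a placeholder rather than real schema validation.

```go
package main

import (
	"errors"
	"fmt"
)

// validate is a hypothetical placeholder for schema validation.
func validate(dashboard map[string]interface{}) error {
	if _, ok := dashboard["panels"]; !ok {
		return errors.New("missing required field: panels")
	}
	return nil
}

// lintRule is a hypothetical lint rule returning zero or more findings.
type lintRule func(dashboard map[string]interface{}) []string

// titleRule is a hypothetical rule that flags a missing title.
func titleRule(d map[string]interface{}) []string {
	if _, ok := d["title"]; !ok {
		return []string{"dashboard has no title"}
	}
	return nil
}

// run validates first when validateFirst is set, and lints only input
// that passed validation; with the gate off it lints unconditionally.
func run(d map[string]interface{}, validateFirst bool, rules []lintRule) ([]string, error) {
	if validateFirst {
		if err := validate(d); err != nil {
			return nil, fmt.Errorf("validation failed, skipping lint: %w", err)
		}
	}
	var findings []string
	for _, r := range rules {
		findings = append(findings, r(d)...)
	}
	return findings, nil
}

func main() {
	invalid := map[string]interface{}{}
	_, err := run(invalid, true, []lintRule{titleRule})
	fmt.Println(err)
}
```

Keeping validation out of the rule list means rules never see input that the gate rejected, which is the property argued for above.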

sdboyer · 2021-11-23T00:04:30Z

The direction i imagine this could head, btw, is something like what i described here: https://www.youtube.com/watch?v=PpoS_ThntEM&t=1709s

So, you start with SearchAndValidate, and then decide, based on the Grafana dashboard version that rules are written against and the lacunae emitted on migrating, whether to run a lint rule or not, or, if it's all just too risky, to bail out.

tomwilkie · 2021-11-23T12:14:27Z

> I went back and forth on whether a rule should return only one result, or could return an array of results. I presume for validation, it would be useful to return several error results, one for each validation failure?

In mixtool we used an error channel for the lint errors, allowing rules to "return" many: https://github.com/monitoring-mixins/mixtool/blob/master/pkg/mixer/lint.go#L67

Not 100% sure I want to recommend this approach, this is more just an FYI.
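For readers unfamiliar with the pattern, here is a generic sketch of a rule reporting several problems over an error channel. It is not mixtool's code; `checkPanels` and `collect` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// checkPanels is a hypothetical rule that sends one error per untitled panel.
func checkPanels(titles []string, errs chan<- error) {
	for i, t := range titles {
		if t == "" {
			errs <- fmt.Errorf("panel %d has no title", i)
		}
	}
}

// collect runs a rule in its own goroutine, closes the channel when the
// rule returns, and gathers everything the rule sent.
func collect(rule func(chan<- error)) []error {
	errs := make(chan error)
	go func() {
		defer close(errs)
		rule(errs)
	}()
	var out []error
	for err := range errs {
		out = append(out, err)
	}
	return out
}

func main() {
	found := collect(func(c chan<- error) {
		checkPanels([]string{"CPU", "", ""}, c)
	})
	for _, err := range found {
		fmt.Println(err)
	}
}
```

Compared with returning a slice, the channel lets rules stream results, at the cost of needing a clear owner responsible for closing it.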

> It probably makes sense to introduce the idea of an "experimental" or optional rule. Such that it can be included in the tool but only executed optionally. This would be the converse of the .lint exclusion rules. The user could opt-in to use these experimental rules, rather than opt-out a given rule for every dashboard.

Personally I think it's fine for this tool to be super opinionated. I even considered arguing that lint rules shouldn't be optional, à la golint https://github.com/golang/lint#purpose. Using this linter is completely optional, and part of the reason for its existence IMO is consistency: any dashboard that passes this linter should be constructed and act the same. WDYT?

sdboyer · 2021-11-24T14:16:50Z

Licensing fix: grafana/grafana#42234

rgeyer · 2021-11-29T20:16:18Z

> It probably makes sense to introduce the idea of an "experimental" or optional rule. Such that it can be included in the tool but only executed optionally. This would be the converse of the .lint exclusion rules. The user could opt-in to use these experimental rules, rather than opt-out a given rule for every dashboard.

> Personally I think it's fine for this tool to be super opinionated. I even considered arguing that lint rules shouldn't be optional, à la golint https://github.com/golang/lint#purpose. Using this linter is completely optional, and part of the reason for its existence IMO is consistency: any dashboard that passes this linter should be constructed and act the same. WDYT?

I agree that this tool should be super opinionated. My concern is in defending the rules and their outcome.

For instance, when offering a PR to an upstream mixin or dashboard, and citing that it doesn't pass a given linting rule, I don't want to leave margin for a maintainer to look at required, opinionated rules in the linter and find one that we can't unapologetically defend.

At the same time, I think it's useful for there to be rules which we expect may not pass in all cases, or which we're experimenting with that may not be mature.

For this functionality in particular, I agree with @sdboyer that we ought not merge validation as a rule. Instead it should be a pre-req that would be run beforehand in a CI toolchain.

I expect at some point we'll encounter another type of rule that might fit in this "gray area" tho. 🤔

sdboyer · 2021-12-15T03:42:59Z

One major thought, here: opinionated isn't a problem, but exclusionary is. From my quick tests, this seems to just assume everything is a Prom datasource. That makes it impossible to use this for anything other than Prom, and by extension, to recommend it to users who are using anything other than Prom.

IMO, that goes beyond opinionated. Now, that still wouldn't be a problem, except for the namespace this lives in - grafana/dashboard-linter. That's a fully general name. Scope should be comparably general.
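One way a Prometheus-specific rule could avoid being exclusionary is to skip targets it does not understand instead of flagging them. The sketch below assumes each target carries a datasource type string and that Prometheus targets use the value "prometheus"; `Target` and `lintPromTargets` are hypothetical names, not the linter's API.

```go
package main

import "fmt"

// Target is a hypothetical, simplified view of a panel query target.
type Target struct {
	DatasourceType string // assumed to be "prometheus" for Prometheus targets
	Expr           string
}

// lintPromTargets applies a PromQL-specific check only to Prometheus
// targets and counts the targets it skipped rather than failing them.
func lintPromTargets(targets []Target) (findings []string, skipped int) {
	for i, t := range targets {
		if t.DatasourceType != "prometheus" {
			skipped++
			continue
		}
		if t.Expr == "" {
			findings = append(findings, fmt.Sprintf("target %d: empty PromQL expression", i))
		}
	}
	return findings, skipped
}

func main() {
	findings, skipped := lintPromTargets([]Target{
		{DatasourceType: "prometheus", Expr: ""},
		{DatasourceType: "loki", Expr: `{job="app"}`},
		{DatasourceType: "prometheus", Expr: "up"},
	})
	fmt.Println(findings, skipped)
}
```

Reporting the skipped count keeps the tool honest about what it did not check.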

CLAassistant · 2022-06-15T18:02:52Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

sdboyer · 2022-06-15T18:14:48Z

i'm gonna close this one...but FWIW, trying to do the integration here led to many, many improvements. I'll be coming with another PR based on grok sooner rather than later :)

rgeyer · 2022-06-15T19:44:12Z

> One major thought, here: opinionated isn't a problem, but exclusionary is. From my quick tests, this seems to just assume everything is a Prom datasource. That makes it impossible to use this for anything other than Prom, and by extension, to recommend it to users who are using anything other than Prom.
>
> IMO, that goes beyond opinionated. Now, that still wouldn't be a problem, except for the namespace this lives in - grafana/dashboard-linter. That's a fully general name. Scope should be comparably general.

FYI, this is finally getting addressed in the next couple of weeks, and is captured in issue #61.

I'll likely add you as a reviewer when I start making these changes.

Add rule for validation dashboards

041ab28

sdboyer requested review from rgeyer and tomwilkie November 22, 2021 20:40

Actually use the Onces

6c93145


De-slop error panics

d1caaaf

sdboyer closed this Jun 15, 2022
