
improve validation process #127

Open
satra opened this issue Mar 27, 2022 · 5 comments

satra (Member) commented Mar 27, 2022

Current JSON and pydantic validation only raises exceptions. We may also want to consider levels of validation, e.g., ALL, WARN, CRITICAL:

ALL: "would be nice if all these metadata were provided"
WARN: "we don't know if this is applicable to you, but it was not provided"
CRITICAL: "missing metadata - dandiset cannot be published without these"

Level names are suggestions at this point.
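The proposed levels could be sketched as an ordered enum, so that "blocking" is a simple comparison. This is a minimal illustration, not an existing dandi-schema API; the names `ValidationLevel` and `blocking` are hypothetical, and the issue notes the level names themselves are only suggestions.

```python
from enum import IntEnum

class ValidationLevel(IntEnum):
    # Hypothetical names, mirroring the levels proposed in this issue
    ALL = 1       # "would be nice if all these metadata were provided"
    WARN = 2      # "may not be applicable to you, but it was not provided"
    CRITICAL = 3  # "dandiset cannot be published without these"

def blocking(level: ValidationLevel) -> bool:
    """Only CRITICAL findings would block publishing."""
    return level >= ValidationLevel.CRITICAL
```

Ordering the enum by severity lets consumers (API server, CLI) filter with a single threshold instead of enumerating level names.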

Once this is implemented:

  1. The API server would need to adjust how it returns validation info, which in turn impacts the GUI.
  2. The CLI would also need to interpret these levels appropriately.

This would also be tied to automigration of any draft dandiset/asset metadata. The validator should be able to indicate to the user whether automigration would solve any validation issues or whether human intervention (reupload, reconvert, edit metadata in the GUI, etc.) is required.
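The automigration hint could ride along on each validation finding as a flag, so the CLI or GUI can split results into "auto-fixable" and "needs a human". A minimal sketch, assuming a hypothetical `ValidationFinding` record shape (not an existing dandi-schema type):

```python
from dataclasses import dataclass

@dataclass
class ValidationFinding:
    # Hypothetical per-finding record; field names are assumptions
    field: str
    message: str
    level: str                  # e.g. "ALL", "WARN", "CRITICAL"
    auto_fixable: bool = False  # True if automigration would resolve it

def needs_human(findings):
    """Return only the findings automigration cannot resolve."""
    return [f for f in findings if not f.auto_fixable]
```

With this shape, "run automigration, then show the remainder" becomes a one-line filter on the validator's output.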

bendichter (Member) commented

We have a very similar approach in NWB inspector:

BEST PRACTICE SUGGESTIONS: suggested improvements that would be nice to have
BEST PRACTICE VIOLATIONS: practices we strongly recommend, but which are non-blocking
CRITICAL: blocking

Since we are already so close, it might be nice to converge on a system that can be used for both.

bendichter (Member) commented

thoughts, @CodyCBakerPhD?

CodyCBakerPhD commented

Indeed, this is very similar to our approach, which Ben highlights nicely. It has worked well for us so far in providing a global perspective on all the things that could be improved in any given set of NWBFiles, ordered by 'what needs to get fixed ASAP' rather than 'what would be nice to have'.

I'd just like to point out two things from our side:

  1. The distribution of the number of checks at each level is, as of the last count (a couple of weeks ago), as follows:

CRITICAL: 4 checks
BEST PRACTICE VIOLATION: 7 checks
BEST PRACTICE SUGGESTION: 11 checks

And even as we finish setting up the remaining checks in the inspector, this pattern is pretty consistent: most new checks we add are SUGGESTIONS for metadata, or VIOLATIONS against metadata that was specified, just not in the right way.

  2. Many of the things we consider SUGGESTIONS early in our process end up being CRITICAL (blocking) for DANDI. Also, relative to your proposed

WARN: "we don't know if this is applicable to you, but it was not provided"

depending on the field in question, you might not be 100% sure that a certain field is relevant to that experiment, so having an intermediate 'ignorable' level is just as important as having an outright 'blockable' one.

yarikoptic (Member) commented

  • Related: validate: RF to collect/output more informative structured records dandi-cli#943 on dandi-cli, to provide a severity ({'ERROR', 'WARNING', 'HINT'}, or whatever it ends up being, but consistently named/valued) for each "record". Please contribute to finalizing that data structure for validation records.
  • Although I would love to centralize errors/hints in one location, it is infeasible: we could not identify "sub-errors" for each possible way of violating the schema, and we will have errors etc. coming from other validators (JSON/YAML/BIDS/...).
  • Any violation of dandischema should probably be at the level of ERROR (AKA critical).
  • Maybe we could annotate some fields' requirement as RECOMMENDED (thus issuing a WARNING/SUGGESTION), but then we would need an additional API to channel recommendations from dandischema beyond ERRORs.
  • We should think thrice about introducing "breaking" compatibility changes to the schema -- "breaking" in that dandisets/assets which were previously OK and published start ERRORing out.
    • We still don't have a way to overlay metadata fixes on top of asset metadata.
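A structured validation record along the lines of the dandi-cli#943 proposal could carry the severity plus the originating validator, so records from dandischema, JSON, YAML, or BIDS validation flow through one channel. This is only a sketch of one possible shape; the class and field names are assumptions, not the finalized data structure the comment asks contributors to help define.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationRecord:
    # Hypothetical record shape; severity values follow the dandi-cli#943 proposal
    severity: str           # one of {"ERROR", "WARNING", "HINT"}
    origin: str             # producing validator: "dandischema", "json", "BIDS", ...
    message: str
    path: Optional[str] = None  # dandiset/asset path the record applies to

def errors_only(records):
    """Schema violations surface as ERROR; only those would block publishing."""
    return [r for r in records if r.severity == "ERROR"]
```

Keeping `origin` on each record avoids centralizing every possible sub-error while still giving consumers a uniform severity to act on.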

danlamanna (Contributor) commented

I'm +1'ing this from the archive side. We've run into places where we're unable to distinguish between a user input error and a genuine bug. I'm indifferent to the approach, but a common pattern I've seen is creating a base exception class, e.g. DandiSchemaException, that would account for any known problems on the dandi-schema side.
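The base-exception pattern mentioned here might look like the following. The `DandiSchemaException` name comes from the comment; the subclasses and the `classify` helper are hypothetical illustrations of how the archive could tell user input errors apart from genuine bugs.

```python
class DandiSchemaException(Exception):
    """Hypothetical base class for all known dandi-schema errors."""

class UserInputError(DandiSchemaException):
    """Raised when the user supplied invalid metadata."""

class SchemaInternalError(DandiSchemaException):
    """Raised for a genuine bug on the dandi-schema side."""

def classify(exc: Exception) -> str:
    # The archive can branch on the hierarchy instead of parsing messages
    if isinstance(exc, UserInputError):
        return "user input error"
    if isinstance(exc, DandiSchemaException):
        return "schema bug"
    return "unexpected"
```

Anything escaping the `DandiSchemaException` hierarchy is then, by definition, an unexpected bug worth reporting rather than surfacing to the user.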
