
improve validation process #127

Open
satra opened this issue Mar 27, 2022 · 5 comments

satra (Member) commented Mar 27, 2022

Current JSON and pydantic validation only raises exceptions. We may also want to consider levels of validation, e.g., ALL, WARN, CRITICAL:

ALL: "would be nice if all these metadata were provided"
WARN: "we don't know if this is applicable to you, but it was not provided"
CRITICAL: "missing metadata - dandiset cannot be published without these"

Level names are suggestions at this point.
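The proposed levels could be sketched as an ordered enum, so that "blocking" is a simple comparison. This is a minimal illustration, not an existing dandi-schema API; the names `ValidationLevel` and `blocking` are hypothetical, and the issue notes the level names themselves are only suggestions.

```python
from enum import IntEnum

class ValidationLevel(IntEnum):
    # Hypothetical names, mirroring the levels proposed in this issue
    ALL = 1       # "would be nice if all these metadata were provided"
    WARN = 2      # "may not be applicable to you, but it was not provided"
    CRITICAL = 3  # "dandiset cannot be published without these"

def blocking(level: ValidationLevel) -> bool:
    """Only CRITICAL findings would block publishing."""
    return level >= ValidationLevel.CRITICAL
```

Ordering the enum by severity lets consumers (API server, CLI) filter with a single threshold instead of enumerating level names.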

Once this is implemented:

  1. The API server would need to adjust how it returns validation info, which in turn impacts the GUI.
  2. The CLI would also need to interpret these levels appropriately.

This would also be tied to automigration of any draft dandiset/asset metadata. The validator should be able to indicate to the user whether automigration would solve any validation issues or whether human intervention (reupload, reconvert, edit metadata in the GUI, etc.) is required.
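The automigration hint could ride along on each validation finding as a flag, so the CLI or GUI can split results into "auto-fixable" and "needs a human". A minimal sketch, assuming a hypothetical `ValidationFinding` record shape (not an existing dandi-schema type):

```python
from dataclasses import dataclass

@dataclass
class ValidationFinding:
    # Hypothetical per-finding record; field names are assumptions
    field: str
    message: str
    level: str                  # e.g. "ALL", "WARN", "CRITICAL"
    auto_fixable: bool = False  # True if automigration would resolve it

def needs_human(findings):
    """Return only the findings automigration cannot resolve."""
    return [f for f in findings if not f.auto_fixable]
```

With this shape, "run automigration, then show the remainder" becomes a one-line filter on the validator's output.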

bendichter (Member) commented

We have a very similar approach in NWB inspector:

BEST PRACTICE SUGGESTIONS: suggested improvements that would be nice to have
BEST PRACTICE VIOLATIONS: practices we strongly recommend, but which are non-blocking
CRITICAL: blocking

Since we are already so close, it might be nice to converge on a system that can be used for both.

bendichter (Member) commented

thoughts, @CodyCBakerPhD?

CodyCBakerPhD commented

Indeed, this is very similar to our approach, which Ben highlights nicely. It has worked well for us so far in providing a global perspective on all the things that could be improved in any given set of NWBFiles, ordered by 'what needs to get fixed ASAP' rather than 'what would be nice to have'.

I'd just like to point out two things from our side:

  1. The distribution of the number of checks at each level is, as of the last count (a couple of weeks ago), as follows:

CRITICAL: 4 checks
BEST PRACTICE VIOLATION: 7 checks
BEST PRACTICE SUGGESTION: 11 checks

And even as we finish setting up the remaining checks in the inspector, this pattern is pretty consistent: most new checks we add are SUGGESTIONS for metadata, or VIOLATIONS against metadata that was specified, just not in the right way.

  2. Many of the things we consider SUGGESTIONS early in our process end up being CRITICAL (blocking) for DANDI. Also, relative to your proposed

WARN: "we don't know if this is applicable to you, but it was not provided"

depending on the field in question, you might not be 100% sure that a certain field is relevant to that experiment, so having an intermediate 'ignorable' level is just as important as having an outright 'blockable' one.

yarikoptic (Member) commented

  • Related: validate: RF to collect/output more informative structured records dandi-cli#943 on dandi-cli, to provide a severity ({'ERROR', 'WARNING', 'HINT'}, or whatever it ends up being, but consistently named/valued) for each "record". Please contribute to finalizing that data structure for validation records.
  • Although I would love to centralize errors/hints in one location, it is infeasible: we could not identify "sub-errors" for each possible way of violating the schema, and we will have errors etc. coming from other validators (JSON/YAML/BIDS/...).
  • Any violation of dandischema should probably be at the level of ERROR (AKA critical).
  • Maybe we could annotate some fields' requirement as RECOMMENDED (thus issuing a WARNING/SUGGESTION), but then we would need an additional API to channel recommendations from dandischema beyond ERRORs.
  • We should think thrice about introducing "breaking" compatibility changes to the schema -- "breaking" in that dandisets/assets which were previously OK and published start ERRORing out.
    • We still don't have a way to overlay metadata fixes on top of asset metadata.
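A structured validation record along the lines of the dandi-cli#943 proposal could carry the severity plus the originating validator, so records from dandischema, JSON, YAML, or BIDS validation flow through one channel. This is only a sketch of one possible shape; the class and field names are assumptions, not the finalized data structure the comment asks contributors to help define.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationRecord:
    # Hypothetical record shape; severity values follow the dandi-cli#943 proposal
    severity: str           # one of {"ERROR", "WARNING", "HINT"}
    origin: str             # producing validator: "dandischema", "json", "BIDS", ...
    message: str
    path: Optional[str] = None  # dandiset/asset path the record applies to

def errors_only(records):
    """Schema violations surface as ERROR; only those would block publishing."""
    return [r for r in records if r.severity == "ERROR"]
```

Keeping `origin` on each record avoids centralizing every possible sub-error while still giving consumers a uniform severity to act on.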

danlamanna (Contributor) commented

I'm +1'ing this from the archive side. We've run into places where we're unable to distinguish between a user input error and a genuine bug. I'm indifferent to the approach, but a common pattern I've seen is creating a base exception class, e.g. DandiSchemaException, that would account for any known problems on the dandi-schema side.
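The base-exception pattern mentioned here might look like the following. The `DandiSchemaException` name comes from the comment; the subclasses and the `classify` helper are hypothetical illustrations of how the archive could tell user input errors apart from genuine bugs.

```python
class DandiSchemaException(Exception):
    """Hypothetical base class for all known dandi-schema errors."""

class UserInputError(DandiSchemaException):
    """Raised when the user supplied invalid metadata."""

class SchemaInternalError(DandiSchemaException):
    """Raised for a genuine bug on the dandi-schema side."""

def classify(exc: Exception) -> str:
    # The archive can branch on the hierarchy instead of parsing messages
    if isinstance(exc, UserInputError):
        return "user input error"
    if isinstance(exc, DandiSchemaException):
        return "schema bug"
    return "unexpected"
```

Anything escaping the `DandiSchemaException` hierarchy is then, by definition, an unexpected bug worth reporting rather than surfacing to the user.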
