Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config validation via json-schema #1547

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

lkstrp
Copy link
Member

@lkstrp lkstrp commented Feb 18, 2025

Closes #1514

Changes proposed in this Pull Request

As discussed in #1514:

  • Uses JSON Schema to validate snakemake configuration
  • Allows to enforce types, values, ranges, sets, etc. and everything that is possible with JSON Schema
  • Removes doc/configtables and integrates the schema into the documentation instead.
    • This will reduce duplicate maintenance and lead to more up to date/cleaner config documentation.
    • See Docs Build
  • While JSON schema allows keys to be deprecated, the snakemake implementation does not currently check for this. I'll either add an own implementation based on the schema or raise a snakemake PR.
  • For now, I would mark all keys as required (assuming the default config contains all possible configurations). At some point we could discuss making some optional, but that would require logical changes.
  • And of course at some point we can move a lot of actual checks there. Not just types. But I would leave that for another PR.

This just implements the schema for the top level configuration. If you agree, I will add this for everything else

Examples

Type check:

$ snakemake
Error validating config file.
ValidationError: 'not_a_bool' is not of type 'boolean'

Failed validating 'type' in schema['properties']['tutorial']:
    {'title': 'Tutorial',
     '$$target': ['#/tutorial'],
     'type': 'boolean',
     'description': 'Switch to retrieve the tutorial data set instead of '
                    'the full data set.'}

On instance['tutorial']:
    'not_a_bool'

Simple enum check:

$ snakemake
Error validating config file.
ValidationError: 'INFOO' is not one of ['DEBUG', 'INFO', 'WARNING', 'ERROR']

Failed validating 'enum' in schema['properties']['logging']['properties']['level']:
    {'type': 'string',
     'enum': ['DEBUG', 'INFO', 'WARNING', 'ERROR'],
     'description': 'Restrict console outputs to all infos, warning or '
                    'errors only.'}

On instance['logging']['level']:
    'INFOO'

Checklist

  • I tested my contribution locally and it works as intended.
  • Code and workflow changes are sufficiently documented.
  • Update Templates
  • A release note doc/release_notes.rst is added.

@fneum
Copy link
Member

fneum commented Feb 18, 2025

Ooh, this looks very nice in the documentation! I can see how this will drastically improve maintenance of documentation!

https://pypsa-eur--1547.org.readthedocs.build/en/1547/configuration.html#top-level-configuration

@FabianHofmann
Copy link
Contributor

this is lovely!

@FabianHofmann
Copy link
Contributor

I have the feeling this could be a super solid entry point for a dashboard/GUI from where you can trigger snakemake with a given set of configs

@lkstrp
Copy link
Member Author

lkstrp commented Feb 18, 2025

I have the feeling this could be a super solid entry point for a dashboard/GUI from where you can trigger snakemake with a given set of configs

Saame 💃🚀

I just realized that there are already optional keys which are not in the default (e.g. #1532). Do the config tables then cover everything?

@FabianHofmann
Copy link
Contributor

I just realized that there are already optional keys which are not in the default (e.g. #1532). Do the config tables then cover everything?

that's also new to me (or I forgot about it). I would prefer a default config where all supported config are listed, just to have the visibility, but that's another issue it seems.
I would not necessarily bet on configtables covering all, but most of the entries.

@fneum
Copy link
Member

fneum commented Feb 18, 2025

I agree with @FabianHofmann. I think even almost all possible config settings are covered in config.default.yaml. Only some config which are implemented in a way that their settings can be dictionaries (e.g. with carriers as keys) are not complete.

@lkstrp
Copy link
Member Author

lkstrp commented Feb 21, 2025

This gets better and better 💃

I changed a couple of things:

  • jsonschema library is used instead of the snakemake validation. This allows for DeprecatedErrors when deprecated entrys are used.
    • Instead of adding that to _helpers.py, I also started to move that to lib, where additional l linter settings apply. Maybe we should discuss that structure/ naming first. I can also move it back to helpers.
  • The schema is now in a single file. Which makes it a bit more overwhelming. But allows for easier integration.

We can use the schema not only for validation and documentation, but also for nice autocompletion and tooltips in your IDE:

SPDx License Identifier

I need to check what is the best way to add this for everyone. Either docs explaining how to set this up locally or registering it in the JSON schema store and then it should pop up automatically in most editors.

Also merging those configtables into the schema is quite a pain. I am halfway through. Either we add the rest in another PR or this stays open a bit longer. In any case you can review already.

EDIT: Before merging we should test this against a couple of used configurations. There are false positives already. Or make the validation optional for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants