Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide more specific exceptions/errors than just value_error #158

Open
yarikoptic opened this issue Feb 8, 2023 · 8 comments
Open

Provide more specific exceptions/errors than just value_error #158

yarikoptic opened this issue Feb 8, 2023 · 8 comments

Comments

@yarikoptic
Copy link
Member

ATM we use ValueError as a generic exception for when value is not the one we like to have

❯ git grep 'raise ValueError'
dandischema/datacite.py:                        raise ValueError(
dandischema/digests/dandietag.py:            raise ValueError("File is larger than the S3 maximum object size.")
dandischema/digests/dandietag.py:            raise ValueError("Not all part hashes submitted")
dandischema/digests/dandietag.py:            raise ValueError("Digesting new part when current part is not complete")
dandischema/digests/dandietag.py:            raise ValueError("Partial update extended past end of file")
dandischema/digests/zarr.py:        raise ValueError(f"Cannot parse directory digest {digest}")
dandischema/digests/zarr.py:        raise ValueError("Not found")
dandischema/digests/zarr.py:        raise ValueError("Cannot compute a Zarr checksum for an empty directory")
dandischema/metadata.py:        raise ValueError("Provided object has no known schemaKey")
dandischema/metadata.py:        raise ValueError(
dandischema/metadata.py:                raise ValueError(
dandischema/metadata.py:        raise ValueError(f"Current target schemas: {ALLOWED_TARGET_SCHEMAS}.")
dandischema/metadata.py:        raise ValueError(f"Current input schemas supported: {ALLOWED_INPUT_SCHEMAS}.")
dandischema/metadata.py:        raise ValueError(f"Cannot migrate from {schema_version} to lower {to_version}.")
dandischema/metadata.py:                    raise ValueError("Cannot auto migrate. SchemaKey missing")
dandischema/metadata.py:        raise ValueError("Provided metadata has no schema version")
dandischema/metadata.py:        raise ValueError(
dandischema/models.py:        raise ValueError(f"Could not generate a klass or items from {data}")
dandischema/models.py:            raise ValueError(
dandischema/models.py:            raise ValueError(
dandischema/models.py:            raise ValueError("Both identifier and url cannot be None")
dandischema/models.py:            raise ValueError(
dandischema/models.py:            raise ValueError("At least one contributor must have role ContactPerson")
dandischema/models.py:                raise ValueError("A zarr asset must have a zarr checksum.")
dandischema/models.py:                raise ValueError("Digest cannot have both etag and zarr checksums.")
dandischema/models.py:                raise ValueError(
dandischema/models.py:                raise ValueError(
dandischema/models.py:                raise ValueError("A non-zarr asset must have a dandi-etag.")
dandischema/models.py:                raise ValueError("Digest cannot have both etag and zarr checksums.")
dandischema/models.py:                raise ValueError(
dandischema/models.py:            raise ValueError(
dandischema/models.py:                raise ValueError("A non-zarr asset must have a sha2_256.")
dandischema/models.py:                raise ValueError(
dandischema/tests/test_models.py:            raise ValueError(f"{qname},{klass} already exists {qnames[qname]}")
dandischema/utils.py:        raise ValueError(r"Version must be well formed: \d+\.\d+\.\d+")

and some of them in particular whenever we indicate that the value is required! (not sure why we make it optional then to start with? didn't check)

❯ git grep 'must have a' | grep -v appropriate
dandischema/models.py:                raise ValueError("A zarr asset must have a zarr checksum.")
dandischema/models.py:                raise ValueError("A non-zarr asset must have a dandi-etag.")
dandischema/models.py:                raise ValueError("A non-zarr asset must have a sha2_256.")
dandischema/tests/test_models.py:        "A non-zarr asset must have a sha2_256." in el["msg"]
dandischema/tests/test_models.py:            "A zarr asset must have a zarr checksum." in val
dandischema/tests/test_models.py:            "A zarr asset must have a zarr checksum." in val

That results that whenever we catch ValidationError in dandi-cli we cannot really tell if it is the error that value is completely absent or something else without matching a message (unreliable, shouldn't be done):

*(Pdb) p e.errors()
[{'loc': ('digest',), 'msg': 'A zarr asset must have a zarr checksum.', 'type': 'value_error'}]

I guess (didn't check) if we specialize to other types of derived from ValueError exceptions, like MissingValue(ValueError) we might get 'type': 'missing_value' ?

@satra
Copy link
Member

satra commented Feb 8, 2023

is there a specific list you would like that you would do something programmatic with (i.e. the semantics of typing are useful to you somewhere)? if it's primarily for humans, we should just go through the messages and improve them. also if you simply want to know its from dandischema, we can create a special ValueError. i think subtyping exceptions creates technical debt, unless it is useful in some form.

@yarikoptic
Copy link
Member Author

So far I see clear distinction only for MissingValue (it was not even provided! first thing to look at). yet to see in detail/compare to how it looks like whenever it is really missing according to pydantic schema itself.
Then useful to harmonize to see how it looks like whenever value is not corresponding to how pydantic reacts that value is not satisfying what it expects.

As for why -- we are taking that type as a part of a ValidationResult ID. We later might group based on what is missing vs what is entered but incorrectly etc. So it then would become not only about display but also about being able to tell one from another.

@danlamanna
Copy link
Contributor

Related to dandi/dandi-archive#1490.

@satra
Copy link
Member

satra commented Feb 15, 2023

@yarikoptic - most of the exceptions come directly from pydantic. are you only saying to have a MissingValueError or match it to pydantic's validationerror, which is based on message. or create a whole scheme of exceptions. if that latter, please describe which pydantic message should map to what exception.

@yarikoptic
Copy link
Member Author

What matters for me is that 'type': 'value_error' within ValidationError.errors() returns. I want it to be missing_value_error. I don't know really what it entails within dandi-schema/pedantic, but I thought the first step could be to derive custom exception.
With dandi/dandi-archive#1490 by @danlamanna in mind, I would say it is worth creating base class of DandiSchemaError, and sub/multi-class

class DandiValueError(ValueError, DandiSchemaError):
    pass

class DandiMissingValueError(DandiValueError):
    pass

and throughout dandi-schema code base use DandiValueError instead of plain ValueError so anyone catching exceptions from us could tell some other ValueError etc apart from the ones we raise.

@satra
Copy link
Member

satra commented Feb 16, 2023

can we just leave it as pydantic? i can replace where we generate ValueError with a pydantic ValidationError object.

In [12]: try:
    ...:     Dandiset()
    ...: except ValidationError as e:
    ...:     print(e.__repr__())
    ...: 

ValidationError(model='Dandiset', errors=[{'loc': ('id',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('name',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('description',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('contributor',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('license',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('identifier',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('citation',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('assetsSummary',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('manifestLocation',), 'msg': 'field required', 'type': 'value_error.missing'}, {'loc': ('version',), 'msg': 'field required', 'type': 'value_error.missing'}])

@satra
Copy link
Member

satra commented Feb 16, 2023

we will have to also decide what we do with both JSON validation and pydantic validation, which are both created in the validation function.

@yarikoptic
Copy link
Member Author

For my current need having 'value_error.missing' I believe would suffice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants