Validating assets calls github repeatedly #153
Comments
@danlamanna - we could probably package the schemas with dandi-schema. an alternative would be to cache the request on the server side. are the requests all in isolated processes or would a cache to keep a schema once downloaded work? it's also the case that this download happens when an asset is using a different schema than the current one. this is true for many assets currently that were submitted a while back, but should not in theory be true for new assets being uploaded. i.e. the schema version should be the latest. we have been planning to run a metadata update by processing the files with the latest extractor, but this hasn't been rolled into action.
Avoiding network requests altogether would be best for maximizing reliability. A cache combined with giving the caller control over how network requests are performed (timeouts, retries, etc.) would be the next best option.
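For illustration, the cache-plus-caller-control option could look roughly like the sketch below. This is not dandischema code: `fetch_json` and `SchemaCache` are hypothetical names, and the sketch assumes schemas are plain JSON documents addressed by URL. The caller chooses the timeout (or swaps in a fetcher with retries), and each schema is downloaded at most once per process.

```python
import json
from typing import Callable, Dict
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Download and parse a JSON document, giving up after `timeout` seconds."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


class SchemaCache:
    """In-process cache (hypothetical helper): one fetch per distinct URL."""

    def __init__(self, fetch: Callable[[str], dict] = fetch_json) -> None:
        # The fetcher is injectable so the caller controls timeouts, retries, etc.
        self._fetch = fetch
        self._schemas: Dict[str, dict] = {}

    def get(self, url: str) -> dict:
        # Only hit the network the first time a given URL is requested.
        if url not in self._schemas:
            self._schemas[url] = self._fetch(url)
        return self._schemas[url]
```

With something like this, validating n assets that share a schema version would trigger one request instead of n. It only helps if the validations run in the same process, which is the open question raised above.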
dandischema in general requires access to online resources to carry out its general work, so it will never be a network free library. but we can optimize it in some ways. we didn't want to make assumptions about availability of storage, persistence etc when we wrote that component, but i can try a few changes. @djarecka and @sooyounga - is this something you folks could take a stab at? happy to discuss details.
@satra, Dan's idea has a lot of merit: even if the goal is to always be validating against the newest schema version, we are not there yet, and keeping the allowed schema versions as static package data would gain us an immediate and obvious win (while we are still litigating, so to speak, schema autoupgrades etc.). Dan can create a quick proof of concept so we can observe the benefits/drawbacks of the approach. He can coordinate this idea with whatever Dorota and Sooyoung are looking into as well.
@waxlamp - i have no issues with a proof of concept.
When calling `dandischema.metadata.validate` on n assets, n requests are made to github to fetch the schema. This makes validating assets take significantly longer than it should. The request also has no default timeout, meaning a call to `validate` can hang indefinitely.

dandi-schema/dandischema/metadata.py
Lines 184 to 187 in d34658c

Can `dandi-schema` be modified to avoid relying on the network for validation? Either by bundling the schemas from `dandi/schema` into package data, allowing the caller of `validate` to pass a schema directly, or some other means?

FWIW this problem appears to exist with `migrate` as well.
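To make the bundling suggestion concrete, here is a minimal sketch of a loader that prefers a local copy of the schema and only falls back to a caller-supplied fetcher when no copy exists. Everything in it is an assumption for illustration: `load_schema` is a hypothetical name, and the `<bundled_dir>/<version>/<filename>` layout is not taken from the package or from `dandi/schema`.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_schema(
    version: str,
    filename: str,
    bundled_dir: str,
    fallback: Optional[Callable[[str, str], dict]] = None,
) -> dict:
    """Return a schema from local files, using `fallback` only when it is missing.

    Assumed (hypothetical) layout: <bundled_dir>/<version>/<filename>.
    """
    path = Path(bundled_dir) / version / filename
    if path.is_file():
        # Local copy found: no network access needed.
        return json.loads(path.read_text(encoding="utf-8"))
    if fallback is None:
        raise FileNotFoundError(f"no bundled schema at {path}")
    # The caller decides how (and whether) to go to the network.
    return fallback(version, filename)
```

A caller that wants strictly offline validation would omit `fallback` and get a `FileNotFoundError` for unknown versions, while a caller that tolerates network access could pass a fetcher with its own timeout.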