Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF(workaround): loop through namespaces while validating nwb #1036

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Jun 29, 2022

To overcome problems like presented in dandi/helpdesk#43
this introduces solution proposed by @oruebel in #917 (comment)

Unfortunately there were no release of pynwb with that function yet, so we
are doomed to duplicate code and do it "manually" here for now

Closes #917

To overcome problems like presented in dandi/helpdesk#43
this introduces solution proposed by @orugbel in #917 (comment)

Unfortunately there were no release of pynwb with that function yet, so we
are doomed to duplicate code and do it "manually" here for now

Closes #917
@yarikoptic yarikoptic requested a review from jwodder June 29, 2022 22:06
@yarikoptic yarikoptic mentioned this pull request Jun 29, 2022
dandi/pynwb_utils.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jun 29, 2022

Codecov Report

Merging #1036 (259ae0d) into master (4d9e548) will increase coverage by 0.00%.
The diff coverage is 83.33%.

@@           Coverage Diff           @@
##           master    #1036   +/-   ##
=======================================
  Coverage   88.36%   88.36%           
=======================================
  Files          69       69           
  Lines        9064     9088   +24     
=======================================
+ Hits         8009     8031   +22     
- Misses       1055     1057    +2     
Flag Coverage Δ
unittests 88.36% <83.33%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandi/pynwb_utils.py 83.67% <83.33%> (-0.19%) ⬇️
dandi/dandiapi.py 87.93% <0.00%> (+0.03%) ⬆️
dandi/organize.py 83.18% <0.00%> (+0.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d9e548...259ae0d. Read the comment docs.

Co-authored-by: John T. Wodder II <[email protected]>
dandi/pynwb_utils.py Outdated Show resolved Hide resolved
dandi/pynwb_utils.py Outdated Show resolved Hide resolved
dandi/pynwb_utils.py Outdated Show resolved Hide resolved
@CodyCBakerPhD
Copy link
Contributor

Hey guys - I just tested the dandi validate CLI call for this branch on the files causing the problems.

It works, but only if I set devel_debug=True at line https://github.com/dandi/dandi-cli/blob/bf-validate-namespaces/dandi/cli/cmd_validate.py#L26

otherwise it throws

2022-06-30 09:35:42,181 [   ERROR] 636982248_ephys.nwb: 1 error(s)
2022-06-30 09:35:42,181 [   ERROR]   Failed to validate 636982248_ephys.nwb: missing a required argument: 'path'
Summary: Validation errors in 1 out of 1 files

Any ideas what might be behind that?

@jwodder
Copy link
Member

jwodder commented Jun 30, 2022

@CodyCBakerPhD I believe that's due to the mistake I pointed out in the comment right above yours.

EDIT: Having seen your comment on that comment now, are you getting the validation error with or without the correction?

@CodyCBakerPhD
Copy link
Contributor

CodyCBakerPhD commented Jun 30, 2022

@CodyCBakerPhD I believe that's due to the mistake I pointed out in the comment right above yours.

EDIT: Having seen your comment on that comment now, are you getting the validation error with or without the correction?

I had corrected the pynwb.validate call prior to any testing with the actual files. =\

@CodyCBakerPhD
Copy link
Contributor

What confuses me is if it were really running into a keyword matching issue like that, with devel_debug=True it should be raising it even in the CLI call, yes? But it only seems to be captured when it's allowed to proceed past https://github.com/dandi/dandi-cli/blob/bf-validate-namespaces/dandi/pynwb_utils.py#L414-L415 to https://github.com/dandi/dandi-cli/blob/bf-validate-namespaces/dandi/pynwb_utils.py#L417

@jwodder
Copy link
Member

jwodder commented Jun 30, 2022

@CodyCBakerPhD How can I obtain a problematic NWB file to test on?

@CodyCBakerPhD
Copy link
Contributor

Files sent your way, hopefully you can replicate the problem I'm running into on the DANDI CLI side.

I had actually already been running into this problem since yesterday on my own attempts to solve this very issue: https://github.com/catalystneuro/dandi-cli/blob/pynwb_validate_cached_schema/dandi/pynwb_utils.py#L337-L344 (albeit using some other dev branches in the works for PyNWB instead of copying the code, which I totally understand is the faster fix)

So if you do figure out why the error was being caught there with devel_debug=False, please let me know for future reference 😀

@jwodder
Copy link
Member

jwodder commented Jun 30, 2022

@CodyCBakerPhD Cannot reproduce. After changing validate() to pynwb.validate(), running dandi validate ~/Downloads/636982248_ephys.nwb gives me:

/Users/jwodder/.local/virtualenvwrapper/venvs/tmp-d5266bf89274a1a/lib/python3.9/site-packages/hdmf/utils.py:577: FutureWarning: DynamicTable.__init__: Using positional arguments for this method is discouraged and will be deprecated in a future major release. Please use keyword arguments to ensure future compatibility.
  warnings.warn(msg, FutureWarning)
2022-06-30 09:51:06,120 [   ERROR] /Users/jwodder/Downloads/636982248_ephys.nwb: 2 error(s)
2022-06-30 09:51:06,120 [   ERROR]   VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing
2022-06-30 09:51:06,121 [   ERROR]   VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'
Summary: Validation errors in 1 out of 1 files
2022-06-30 09:51:06,122 [    INFO] Logs saved in /Users/jwodder/Library/Logs/dandi-cli/20220630135101Z-71953.log

@CodyCBakerPhD
Copy link
Contributor

@jwodder Very strange, must be something on my end then. Is the dandi validate ~/Downloads/636982248_ephys_no_2.3.0.nwb passing for you at least?

@jwodder
Copy link
Member

jwodder commented Jun 30, 2022

@CodyCBakerPhD Yes. Are you using the latest version of every package?

@CodyCBakerPhD
Copy link
Contributor

@CodyCBakerPhD Yes. Are you using the latest version of every package?

Fresh environment (py 3.9) with this branch and only this branch installed via pip install -e . (inside dandi-cli) 🤔

@CodyCBakerPhD
Copy link
Contributor

Fresh environment (py 3.9) with this branch and only this branch installed via pip install -e . (inside dandi-cli) 🤔

Also worth mentioning I tried on two separate devices (Windows and Mac) and ran into same behavior.

@jwodder
Copy link
Member

jwodder commented Jun 30, 2022

So I was wondering why the lack of pynwb. before validate() wasn't causing the tests to fail, and it appears that this is because none of the NWB files we test on have any namespaces, which means that for ns in namespaces_validate: errors += validate(io=reader, namespace=ns) never runs.

@CodyCBakerPhD @yarikoptic Shouldn't there be some sort of "namespaceless" pynwb.validate() call in this code?

@CodyCBakerPhD
Copy link
Contributor

CodyCBakerPhD commented Jun 30, 2022

@CodyCBakerPhD Shouldn't there be some sort of "namespaceless" pynwb.validate() call in this code?

For that, perhaps consult my attempts here: https://github.com/catalystneuro/dandi-cli/blob/pynwb_validate_cached_schema/dandi/pynwb_utils.py#L337-L344

which in your case would not use the passed boolean flag, but rather if not namespaces_validate just go straight to pynwb.validate(io=reader) without any specified namespaces from the file.

it appears that this is because none of the NWB files we test on have any namespaces

Hmm... namespaces have been automatically cached in NWBFiles for some time now, how were/are they generated? Might want to update that if possible. Of course, if there aren't any cached it will use whatever is present in the environment, likely the most recent to that pynwb version.

Now, as for testing this particular behavior of having a file that is invalid according to non-cached namespaces, but valid according to cached ones, my best suggestion would be to just use these example files (maybe hosting on a datalad set and automatically downloading that in the tests?) since artificially constructing something like that can prove a bit difficult.

@yarikoptic
Copy link
Member Author

@CodyCBakerPhD @yarikoptic Shouldn't there be some sort of "namespaceless" pynwb.validate() call in this code?

I guess the question is best addressed to @rly @oruebel . My understanding that everything should be defined in some namespace with 'core' being just one of those, so I do not think there is any "namespaceless" aspect of validation but I could be wrong. Let's wait for gurus to reply ;)

@jwodder
Copy link
Member

jwodder commented Jun 30, 2022

@yarikoptic Looking at a local coverage report for this branch indicates that lines 361, 369-370, and 386 of commit 0d4fb0c are never run, indicating that NWBHDF5IO.load_namespaces() returns nothing for our test NWBs.

@njmei
Copy link

njmei commented Jun 30, 2022

Many thanks for your work on this all!

On my end, I've modified our IPFX repository to not cache pynwb schemas when modifying our NWB files, reprocessed our NWB files, and can confirm that as of 259ae0d dandi validate and pynwb.validate both now have the same behavior.

Successfully installed dandi-0.41.0+5.g259ae0d
> pip freeze | grep -e pynwb -e hdmf -e h5py -e dandi
dandi @ file:///home/nicholas.mei/Documents/dandi-cli
dandischema==0.7.1
h5py==3.7.0
hdmf==3.3.2
pynwb==2.0.1

> python -m pynwb.validate /allen/programs/celltypes/production/humancelltypes/prod218/Ephys_Roi_Result_636991746/EPHYS_ATTACH_METADATA_QUEUE_1187992202/636991746_ephys.nwb
Validating /allen/programs/celltypes/production/humancelltypes/prod218/Ephys_Roi_Result_636991746/EPHYS_ATTACH_METADATA_QUEUE_1187992202/636991746_ephys.nwb against cached namespace information using namespace 'ndx-mies'.
 - no errors found.

> dandi validate /allen/programs/celltypes/production/humancelltypes/prod218/Ephys_Roi_Result_636991746/EPHYS_ATTACH_METADATA_QUEUE_1187992202/636991746_ephys.nwb
/home/nicholas.mei/miniconda3/envs/dandi-cli/lib/python3.7/site-packages/hdmf/utils.py:577: FutureWarning: DynamicTable.__init__: Using positional arguments for this method is discouraged and will be deprecated in a future major release. Please use keyword arguments to ensure future compatibility.
  warnings.warn(msg, FutureWarning)
2022-06-30 09:53:58,926 [    INFO] /allen/programs/celltypes/production/humancelltypes/prod218/Ephys_Roi_Result_636991746/EPHYS_ATTACH_METADATA_QUEUE_1187992202/636991746_ephys.nwb: ok
Summary: No validation errors among 1 file(s)
2022-06-30 09:53:58,927 [    INFO] Logs saved in /home/nicholas.mei/.cache/dandi-cli/log/20220630165345Z-412515.log

@yarikoptic
Copy link
Member Author

@yarikoptic Looking at a local coverage report for this branch indicates that lines 361, 369-370, and 386 of commit 0d4fb0c are never run, indicating that NWBHDF5IO.load_namespaces() returns nothing for our test NWBs.

nice observation! will check in greater detail later (may be core would need to be explicitly added) or wait for @rly or @oruebel to chime in. After all we don't want to make us stop validating nwb files ;)

@njmei
Copy link

njmei commented Jul 1, 2022

@CodyCBakerPhD Was just wondering since we have an internal deadline to get our NWB files uploaded before July 13th (ideally ever sooner is better), is it safe to try to upload our fixed NWB files to our draft dandiset by using this branch?

@CodyCBakerPhD
Copy link
Contributor

CodyCBakerPhD commented Jul 1, 2022

@CodyCBakerPhD Was just wondering since we have an internal deadline to get our NWB files uploaded before July 13th (ideally ever sooner is better), is it safe to try to upload our fixed NWB files to our draft dandiset by using this branch?

@njmei I don't see why not - assuming you removed the spurious core schemas in a similar way that I did. I'm able to access everything in the file just fine through PyNWB. Perhaps a couple 'Best Practice' suggestions from running the NWBInspector on the file (report is attached; ignore validation errors until I can propagate this kind of fix to the inspector), but nothing that would preclude a simple draft upload (publishing the dandiset might require the latin form of the subject species, though).

636982248_ephys_no_2.3.0.txt

@yarikoptic
Copy link
Member Author

moved to draft since the destiny of this PR and "original" PR in pynwb (NeurodataWithoutBorders/pynwb#1432) remains unclear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

namespace checks
5 participants