Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RemoteDandiset.has_data_standard() convenience function #958

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

bendichter
Copy link
Member

@jwodder, what do you think of something like this?

@bendichter bendichter requested a review from jwodder April 13, 2022 16:59
dandi/dandiapi.py Outdated Show resolved Hide resolved
dandi/dandiapi.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Apr 13, 2022

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (75b2a39) 89.04% compared to head (a994e54) 88.55%.
Report is 265 commits behind head on master.

Files Patch % Lines
dandi/dandiapi.py 37.50% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #958      +/-   ##
==========================================
- Coverage   89.04%   88.55%   -0.50%     
==========================================
  Files          76       77       +1     
  Lines        9698    10490     +792     
==========================================
+ Hits         8636     9289     +653     
- Misses       1062     1201     +139     
Flag Coverage Δ
unittests 88.55% <76.19%> (-0.50%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bendichter bendichter requested a review from jwodder April 13, 2022 21:04
dandi/dandiapi.py Outdated Show resolved Hide resolved
Co-authored-by: John T. Wodder II <[email protected]>
dandi/dandiapi.py Outdated Show resolved Hide resolved
@bendichter
Copy link
Member Author

@yarikoptic before I write tests for this, how does this syntax look?

else:
raise ValueError(
f"'data_standard' must be an RRID (of form 'RRID:XXX_NNNNNNN`) or one "
f"of the following values: {DATA_STANDARD_MAP.keys()}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict.keys() doesn't stringify in a human-friendly way. Use ", ".join(DATA_STANDARD_MAP.keys()) instead.

This is determined by checking for "RRID:SCR_015242" in the "dataStandard" field
of the assetsSummary of the dandiset.
Returns True if the Dandiset contains one or more files of the indicated
standard. Otherwise, returns False.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should document the accepted data standards (or, at the very least, the RRID input format).

dandi/dandiapi.py Outdated Show resolved Hide resolved
dandi/dandiapi.py Outdated Show resolved Hide resolved
Co-authored-by: Yaroslav Halchenko <[email protected]>
f"'data_standard' must be an RRID (of form 'RRID:XXX_NNNNNNN`) or one "
f"of the following values: {", ".join(DATA_STANDARD_MAP.keys())}"
)
assets_summary = self.get_raw_metadata()["assetsSummary"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, assetsSummary seems to be non optional, so must be present in a valid metadata record, but if not we would get KeyError here... which could be good as to alert user that smth is wrong with metadata in that dandisets. But should it be an exception really or just a warning? I would lean to a be a warning and just return False. WDYT @jwodder and @satra ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwodder and @bendichter WDYT - better be strict or robust?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with changing this to throw a warning if assetsSummary is missing

@yarikoptic
Copy link
Member

@yarikoptic before I write tests for this, how does this syntax look?

looks ok to me ;) left few insignificant comments

@yarikoptic yarikoptic marked this pull request as draft February 22, 2023 17:39
@yarikoptic
Copy link
Member

@bendichter I have converted PR back to draft since needs more work. some comments are left above, and also would be nice to add some rudimentary unittest. Overall I think it would be a nice addition, so would be great to see if finished.

@bendichter
Copy link
Member Author

bendichter commented Mar 6, 2023

I could use some guidance in how to write a test for this. Can I just use published dandisets on the archive?

@jwodder jwodder changed the title add RemoteDandiset.is_nwb() convenience function add RemoteDandiset. has_data_standard() convenience function Mar 7, 2023
@yarikoptic yarikoptic requested a review from jwodder March 7, 2023 19:47
@jwodder
Copy link
Member

jwodder commented Mar 7, 2023

@bendichter Could you rebase or make an empty commit or something in order for the CI to run again?

@jwodder
Copy link
Member

jwodder commented Mar 7, 2023

@bendichter

Can I just use published dandisets on the archive?

That's one option. Normally, I'd suggest just using the nwb_dandiset and text_dandiset fixtures, but that would require the Dockerized server to produce the assetsSummary immediately, and I don't know how long that takes.

@bendichter bendichter marked this pull request as ready for review March 8, 2023 00:37
@bendichter
Copy link
Member Author

ok, I added some tests. Let me know if these will work

@jwodder jwodder changed the title add RemoteDandiset. has_data_standard() convenience function Add RemoteDandiset.has_data_standard() convenience function Mar 8, 2023
@jwodder
Copy link
Member

jwodder commented Mar 8, 2023

@bendichter Please run pre-commit on your code.

@bendichter
Copy link
Member Author

@jwodder oh, I see. Is there a reason to not set the pre-commit hooks as GitHub Actions? We do this for NeuroConv and it works great. All PRs are automatically Black formatted, so there is no need for developers to worry about installing and running pre-commit.

dandi/dandiapi.py Outdated Show resolved Hide resolved
@yarikoptic
Copy link
Member

I think it is still a legit and potentially useful interface. @bendichter - what do you think? it might benefit from adhering to the same design principle of using Enum's instead of free strings, as RFed into by #1357 yesterday

@bendichter
Copy link
Member Author

@yarikoptic what do you mean by Enum? Do you mean using Literal in the typehints?

@jwodder
Copy link
Member

jwodder commented Nov 30, 2023

@bendichter He means using an enum type instead of strings so that Python and mypy will both complain if a name is typoed and passing an unsupported name becomes impossible. Look at 2f38f8e#diff-a0dcf236f5c18d66953b7985098c75c21bbc9dd6b7e6e7063d5d38c2044f6697.

@bendichter
Copy link
Member Author

OK, that looks like a fine approach. I'm not able to work on this right now. Feel free to take this over and implement Enums if you want to push this through now, otherwise, I can get to this next week

Co-authored-by: Yaroslav Halchenko <[email protected]>
if assets_summary is None:
warnings.warn(
f"The raw metadata of RemoteDandiset {self.identifier} does not contain 'assetsSummary'. "
f"Assuming that it does not contain {data_standard}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bendichter You do not appear to be using pre-commit; if you were, I'm quite certain black would have moved that closing parenthesis to the next line. Please install pre-commit for the repository and run it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwodder in this particular case, I accepted @yarikoptic 's proposed change in GitHub. I'll make sure pre-commit is installed and running properly for future PRs

@yarikoptic
Copy link
Member

OK, that looks like a fine approach. I'm not able to work on this right now. Feel free to take this over and implement Enums if you want to push this through now, otherwise, I can get to this next week

:letsdoit:! @bendichter -- would you have time/effort to finish it up?

@yarikoptic yarikoptic marked this pull request as draft October 10, 2024 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants