
Design of issue tracking/reporting via GitHub #1595

Draft: wants to merge 7 commits into master

yarikoptic
Member

@yarikoptic commented May 22, 2023

To address #863

TODOs:

  • incorporate feedback from Roni

@CodyCBakerPhD

Overall I think this looks good. One thing I would strongly recommend is a way to ensure the dandiset owners are alerted whenever an issue is reported.

Ideas from the meeting included using the DANDI user ID (which is also a GitHub ID) to auto-subscribe the owners to the repo upon its creation, or otherwise pre-populating '@' mentions of the owners in the issue template.
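A minimal sketch of the '@' pre-population idea: GitHub's new-issue page accepts `title` and `body` query parameters that pre-fill the form, so a "report an issue" link on the archive could mention the owners up front. It assumes, as suggested above, that each owner's DANDI user ID maps to a GitHub login; `new_issue_url`, the `cc` line, and the example dandiset and users are hypothetical. Mentioned users are only notified once the reporter actually submits the issue (and the reporter could delete the mentions first).

```python
from urllib.parse import urlencode


def new_issue_url(gh_org: str, repo: str, owners: list[str], title: str = "") -> str:
    """Link to GitHub's "new issue" form with a body that @-mentions the owners.

    `owners` are assumed to be GitHub logins, hypothetically derived from the
    dandiset owners' DANDI user IDs.
    """
    mentions = " ".join(f"@{login}" for login in owners)
    body = f"cc {mentions}\n\n<!-- Please describe the problem with this dandiset -->\n"
    # urlencode percent-encodes '@', newlines, etc. so the text survives in the URL.
    query = urlencode({"title": title, "body": body})
    return f"https://github.com/{gh_org}/{repo}/issues/new?{query}"


# Example: a per-dandiset repository dandisets/000123 with two made-up owners.
print(new_issue_url("dandisets", "000123", ["alice", "bob"], title="Broken file"))
```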

@yarikoptic mentioned this pull request Sep 21, 2023
@bendichter
Member

I agree, this looks good to me.

@yarikoptic
Member Author

Having discussed this a little more during ODIN with @rly (also pinging @magland), we decided to switch the approach a little to:

Please chime in with what you think; maybe there is some complication I have missed.

@bendichter
Member

@yarikoptic I agree with the proposed changes and with the justification of those changes.

Member

@waxlamp left a comment

Overall, I think this could be a good way to handle dandiset-specific issues. I have the following concerns:

  • How do future deployments of dandi-archive deal with this? There won't always be a magical DataLad org filled with one repository per dandiset. Perhaps a better way to do this is simply to use a single issue tracker for all issues (tagged with dandiset ID, perhaps); then, the configuration for a dandi archive deploy can point to a single issue tracker to fulfill this functionality.
  • What are the alternative approaches here? It seems a little bit odd for neuroscientists to use a code-specific issue tracker to report problems with datasets. Is there a different service out there that might integrate more cleanly with the archive?

@bendichter
Member

@waxlamp Thanks for the thoughts.

There won't always be a magical DataLad org filled with one repository per dandiset.

Why not? Why do you consider this magic?

Perhaps a better way to do this is simply to use a single issue tracker for all issues

We could do this, but we'd need to sort out a few features:

  1. How to associate issues with specific dandisets (could be done with an issue form, I guess).
  2. How to notify the dandiset owners when an issue is filed in relation to that dandiset.
  3. An easy way for users to filter issues by dandiset.
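For the single-tracker variant, points 1 and 3 could both hinge on one label per dandiset. A minimal sketch, assuming an issue form with a hypothetical "Dandiset ID" field: GitHub renders each issue-form answer in the issue body under a `### <field label>` heading, so an automation step (for instance a GitHub Actions job triggered when an issue is opened) could parse the ID out and apply a label such as `dandiset:000123`. The field name, the label scheme, and the function names are assumptions, not an existing DANDI convention.

```python
import re
from typing import Optional

# Hypothetical issue-form field; its answer appears under this heading in the body.
FIELD_HEADING = "### Dandiset ID"
# DANDI dandiset identifiers are six digits (e.g. 000123); also matches
# "DANDI:000123" or a landing-page URL containing the ID.
DANDISET_ID = re.compile(r"\b(\d{6})\b")


def dandiset_from_issue_body(body: str) -> Optional[str]:
    """Return the dandiset ID entered in the issue form, if any."""
    _, sep, rest = body.partition(FIELD_HEADING)
    if not sep:
        return None  # issue was not filed through the form
    answer = rest.split("###", 1)[0]  # only look at text up to the next field heading
    match = DANDISET_ID.search(answer)
    return match.group(1) if match else None


def labels_for_issue(body: str) -> list[str]:
    """Labels an automation step could apply so issues can be filtered per dandiset."""
    dandiset_id = dandiset_from_issue_body(body)
    return [f"dandiset:{dandiset_id}"] if dandiset_id else ["dandiset:unknown"]


print(labels_for_issue("### Dandiset ID\n\nDANDI:000123\n\n### What is wrong?\n\n..."))
```

Users could then filter the shared tracker with a search such as `is:issue label:"dandiset:000123"`; notifying owners (point 2) would still need something extra, such as @-mentions.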

Is there a different service out there that might integrate more cleanly with the archive?

There certainly are other ticketing systems out there, but GitHub is free and more familiar to neuroscientists than any other platform. It also has a number of useful features:

  1. We could set up GitHub Actions to automate processing of issues.
  2. We can assign issues to projects to prioritize them
  3. GitHub has a REST API we can use to integrate issues with the main website if we want to
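As an illustration of point 3, a minimal standard-library sketch of pulling an open-issue count through the REST API: the search endpoint's `total_count` gives the number of open issues excluding pull requests (the repository's `open_issues_count` field, by contrast, includes pull requests). The repository name and token handling are assumptions; unauthenticated search requests are heavily rate-limited, so a real integration would presumably authenticate and cache results.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

API = "https://api.github.com"


def open_issues_query_url(repo_full_name: str) -> str:
    """Search API URL for open issues (not pull requests) in one repository."""
    q = f"repo:{repo_full_name} is:issue is:open"
    # per_page=1: only `total_count` is needed, not the matching items themselves.
    return f"{API}/search/issues?{urlencode({'q': q, 'per_page': 1})}"


def parse_total_count(payload: bytes) -> int:
    """Extract the number of matches from a search API JSON response."""
    return int(json.loads(payload)["total_count"])


def count_open_issues(repo_full_name: str, token: Optional[str] = None) -> int:
    """Ask GitHub (network access required) how many issues are open."""
    req = urllib.request.Request(
        open_issues_query_url(repo_full_name),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_total_count(resp.read())
```

Usage would be along the lines of `count_open_issues("dandisets/000123", token=...)`, with the result rendered wherever the website wants to show it.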

@yarikoptic
Member Author

Valid concerns, thanks @waxlamp and thanks @bendichter for the points too!

After all, we also have the staging instance of DANDI, and it would need to have "its own" issue tracking as well. I think it might be possible to plan for both, i.e. to template out the URLs for issues given placeholders such as {gh_org} (e.g. dandisets), {gh_repo} (e.g. empty '' for per-dandiset repositories, or 'dandisets-staging/' for a single repo), and {dandiset}:

  1. main instance - per dandiset repo like we have gh_org = dandisets, gh_repo = ''
  2. staging - a single repo, e.g. https://github.com/dandi/dandisets-staging : gh_org = dandi, gh_repo = dandisets-staging
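A minimal sketch of that per-instance templating, using the two example configurations above (the trailing slash on `gh_repo` is dropped here). One wrinkle it makes visible: in the single-repo layout GitHub has no per-dandiset sub-path to append `{dandiset}` to, so the link has to become a filtered view of the shared tracker instead. That filter uses a hypothetical `dandiset:<id>` label; the class and label names are assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class IssueTrackerConfig:
    gh_org: str        # e.g. "dandisets" (main instance) or "dandi" (staging)
    gh_repo: str = ""  # "" means one repository per dandiset; else a single shared repo

    def issues_url(self, dandiset: str) -> str:
        if not self.gh_repo:
            # Per-dandiset layout: the repository itself is named after the dandiset ID.
            return f"https://github.com/{self.gh_org}/{dandiset}/issues"
        # Shared-repo layout: narrow the single tracker down via a (hypothetical) label.
        q = f'is:issue label:"dandiset:{dandiset}"'
        return f"https://github.com/{self.gh_org}/{self.gh_repo}/issues?{urlencode({'q': q})}"


main = IssueTrackerConfig(gh_org="dandisets")
staging = IssueTrackerConfig(gh_org="dandi", gh_repo="dandisets-staging")
print(main.issues_url("000123"))
print(staging.issues_url("000123"))
```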

But such flexibility might complicate future "tighter" integration, e.g. querying the number of open issues to include that information on the dandiset landing page (DLP).

A probably more lightweight alternative to aiming for two setups would be to concentrate on the one proposed here and:

  1. Set up a dandisets-staging organization.
  2. Add minimal functionality to dandi-archive itself for establishing a new dandiset repository under a specified organization (currently this is part of the dataladification script):
    • it would require specifying/storing a GitHub auth token for each instance;
    • it would not be invoked if no configuration for it is set up (e.g. in the unit tests).
  3. Adjust the dataladification script to "tolerate" a repository that already exists but is empty (I think we actually do not need to do anything, since we have existing="reconfigure": https://github.com/dandi/dandisets/blob/HEAD/tools/backups2datalad/adataset.py#L374).
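A minimal standard-library sketch of step 2, assuming hypothetical per-instance settings `org` and `token` (a token allowed to create repositories in that organization): creation goes through the REST endpoint `POST /orgs/{org}/repos`, and the whole call is skipped when either setting is missing, as in the unit tests. Naming the repository after the dandiset ID mirrors the per-dandiset layout of the dandisets organization. GitHub should reject the request with an HTTP error if the repository already exists, which the caller would need to treat as benign; error handling and retries are left out.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GitHubSettings:
    # Hypothetical per-instance settings; both left unset e.g. in unit tests.
    org: Optional[str] = None
    token: Optional[str] = None


def build_create_repo_request(
    settings: GitHubSettings, dandiset_id: str, description: str = ""
) -> urllib.request.Request:
    """Prepare the REST call that creates an (empty) repository in the org."""
    payload = {"name": dandiset_id, "description": description, "has_issues": True}
    return urllib.request.Request(
        f"https://api.github.com/orgs/{settings.org}/repos",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {settings.token}",
        },
    )


def maybe_create_dandiset_repo(
    settings: GitHubSettings, dandiset_id: str, description: str = ""
) -> Optional[dict]:
    """Create the repository only if this instance has GitHub configured."""
    if not (settings.org and settings.token):
        return None  # no GitHub organization provisioned: do nothing
    req = build_create_repo_request(settings, dandiset_id, description)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)  # the created repository's JSON description
```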

@waxlamp WDYT about adding functionality to dandi-archive to call out to the GitHub API to create a new repository upon a request to create a new dandiset? Or would you prefer to go the "2 possible setups" way? (That would require only a one-time manual setup, but more parametrization for each instance.)

@magland
Contributor

magland commented Oct 24, 2023

So if there are thousands of dandisets, there would be thousands of gh repos under an organization maintained by dandi? Seems like it could become difficult to manage. What about having the author of the dandiset optionally contribute and maintain their own gh repo for this?

@yarikoptic
Member Author

So if there are thousands of dandisets, there would be thousands of gh repos under an organization maintained by dandi?

Yes. Glorious will be the time when we get 1000s of nice non-empty datasets! We will celebrate!

Seems like it could become difficult to manage.

With the right tools it should be doable, and it is not unprecedented:

What about having the author of the dandiset optionally contribute and maintain their own gh repo for this?

An interesting idea. In principle, people can already link GitHub repos within dandiset metadata.

A quick grep already shows a good number of mentions of GitHub within dandiset metadata:
dandi@drogon:/mnt/backup/dandi/dandisets$ grep github */dandiset.yaml | grep -v dandi/schema
000008/dandiset.yaml:  url: https://github.com/berenslab/mini-atlas
000016/dandiset.yaml:  NWB files is provided at https://github.com/ttngu207/najafi-2018-nwb/blob/master/notebooks/Najafi-2018_example.ipynb.'
000026/dandiset.yaml:  url: https://biccn.github.io/Quarterly_Submission_Receipts/000026-dashboard.html
000027/dandiset.yaml:  ATM contains only a few files from http://github.com/dandi-datasets/nwb_test_data
000035/dandiset.yaml:  url: https://github.com/berenslab/mini-atlas
000037/dandiset.yaml:  url: https://colleenjg.github.io/
000037/dandiset.yaml:  url: https://github.com/jeromelecoq/allen_openscope_metadata/tree/master/projects/credit_assignement
000037/dandiset.yaml:  url: https://github.com/colleenjg/OpenScope_CA_Analysis
000037/dandiset.yaml:  url: https://github.com/colleenjg/cred_assign_stimuli
000060/dandiset.yaml:  here: \n https://github.com/arsenyf/FinkelsteinFontolan_2021NN"
000064/dandiset.yaml:description: This is data produced by the Soltesz Lab NeuroH5 software (https://github.com/iraikov/neuroh5).
000064/dandiset.yaml:  The data has been converted to NWB using the ndx-simulation-output extension (https://github.com/catalystneuro/ndx-simulation-output).
000064/dandiset.yaml:  url: https://github.com/iraikov/neuroh5
000108/dandiset.yaml:  url: https://biccn.github.io/Quarterly_Submission_Receipts/000108-dashboard.html
000122/dandiset.yaml:  can be found at https://github.com/rob-luke/experiment-fNIRS-tapping.
000122/dandiset.yaml:  url: https://github.com/rob-luke/BIDS-NIRS-Tapping
000122/dandiset.yaml:  url: https://github.com/rob-luke/experiment-fNIRS-tapping
000127/dandiset.yaml:  of the Neural Latents Benchmark: https://neurallatents.github.io.'
000128/dandiset.yaml:  Latents Benchmark: https://neurallatents.github.io.'
000129/dandiset.yaml:  Neural Latents Benchmark: https://neurallatents.github.io.'
000130/dandiset.yaml:  of the Neural Latents Benchmark: https://neurallatents.github.io.'
000138/dandiset.yaml:  Benchmark: https://neurallatents.github.io.'
000139/dandiset.yaml:  Benchmark: https://neurallatents.github.io.'
000140/dandiset.yaml:  Benchmark: https://neurallatents.github.io.'
000165/dandiset.yaml:  url: https://github.com/emilyasterjones/interneurons_modulate_drive
000168/dandiset.yaml:  repository: github
000168/dandiset.yaml:  url: https://github.com/rozmar/jGCaMP8_ground_truth_dataset
000207/dandiset.yaml:  Example code on how to plot this data can be found at https://github.com/rutishauserlab/cogboundary-zheng
000207/dandiset.yaml:  repository: github
000207/dandiset.yaml:  url: https://github.com/rutishauserlab/cogboundary-zheng
000221/dandiset.yaml:  Example codes to plot data is at https://github.com/hidehikoinagaki/InagakiAndChenEtAl2022'
000222/dandiset.yaml:  Code and README can be found at https://github.com/JustinOHare/ICR_2022.git'
000231/dandiset.yaml:  repository: github
000231/dandiset.yaml:  url: https://github.com/cxrodgers/NwbDandiData2022
000402/dandiset.yaml:  url: https://github.com/datajoint/microns_phase3_nda
000404/dandiset.yaml:  https://github.com/pkhanna104/bmi_dynamics_code and archived at https://zenodo.org/record/8006653"
000405/dandiset.yaml:  table.\n\nPre-print DOI: \nhttps://doi.org/10.1101/2022.12.15.520660\n\nGithub:\nhttps://github.com/alexgonzl/TMA\n"
000462/dandiset.yaml:  Scripts used for analysis can be found on https://github.com/seethakris/HPCrewardpaper'
000465/dandiset.yaml:  [Electrode mapping information & Basic analysis codes] Github: https://ytchoe.github.io/'
000469/dandiset.yaml:  provided: \nhttps://github.com/rutishauserlab/workingmem-release-NWB\n"
000469/dandiset.yaml:  url: https://github.com/rutishauserlab/workingmem-release-NWB
000473/dandiset.yaml:  url: https://github.com/PierreLeMerre/Esr1_NPX_code
000483/dandiset.yaml:  url: https://github.com/ucsb-goard-lab/Neurotar-HD-Experiments
000540/dandiset.yaml:  on https://rhythm-n-rodents.github.io/software/.
000554/dandiset.yaml:  [Electrode mapping information & Basic analysis codes] Github: https://ytchoe.github.io/'
000557/dandiset.yaml:  https://ytchoe.github.io/"
000574/dandiset.yaml:  url: https://github.com/janhohenheim/usz-neuro-conversion
000574/dandiset.yaml:  url: https://github.com/janhohenheim/nwb-example
000575/dandiset.yaml:  url: https://github.com/janhohenheim/usz-neuro-conversion
000575/dandiset.yaml:  url: https://github.com/janhohenheim/nwb-example
000576/dandiset.yaml:  url: https://github.com/janhohenheim/usz-neuro-conversion
000576/dandiset.yaml:  url: https://github.com/janhohenheim/nwb-example
000579/dandiset.yaml:  notebook for a tutorial to read and extract information from these NWB files: https://github.com/sytseng/Notebook_for_Dandiset_000579\n\n-
000579/dandiset.yaml:  NWB extension code for custom lab meta data (required for reading NWB files): https://github.com/sytseng/ndx-harvey-swac
000579/dandiset.yaml:  \n\n- Code and tutorials for fitting GLM to neural activity in Tensorflow 2: https://github.com/sytseng/GLM_Tensorflow_2"
000579/dandiset.yaml:  url: https://github.com/sytseng/Notebook_for_Dandiset_000579
000579/dandiset.yaml:  url: https://github.com/sytseng/ndx-harvey-swac
000579/dandiset.yaml:  url: https://github.com/sytseng/GLM_Tensorflow_2
000582/dandiset.yaml:    url: https://github.com/dandi/dandi-cli
000618/dandiset.yaml:  This dataset was prepared using the following script: https://github.com/flatironinstitute/spikeforest/blob/main/devel/dandiset/prepare_dandiset.py
000623/dandiset.yaml:  Git Link: https://github.com/rutishauserlab/bmovie-release-NWB-BIDS'
000625/dandiset.yaml:  email: [email protected]
000630/dandiset.yaml:  Analysis code and extracted features available at https://github.com/AllenInstitute/patchseq_human_L1.
000630/dandiset.yaml:  Feature extraction package available at https://github.com/AllenInstitute/ipfx.'
000678/dandiset.yaml:  url: https://github.com/sjara/uobrainflex/tree/master/hulsey2023
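A rough Python counterpart of the grep above, as a sketch: it pulls `owner/repo` names out of free-text metadata, so that author-maintained repositories could, for example, be surfaced next to (or instead of) a DANDI-managed tracker. It is a heuristic, not existing dandi tooling: many of these links point to analysis-code repositories rather than places where dataset issues are tracked, `github.io` pages are deliberately not matched, and `dandi/schema` is excluded just as in the grep.

```python
import re

# owner/repo from https://github.com/<owner>/<repo>[/...] links; hosts such as
# *.github.io or raw.githubusercontent.com do not match this pattern.
GITHUB_REPO = re.compile(r"https?://github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)")


def github_repos(text: str, exclude: frozenset[str] = frozenset({"dandi/schema"})) -> list[str]:
    """Return unique 'owner/repo' names linked from free-text metadata, in order."""
    names = []
    for owner, repo in GITHUB_REPO.findall(text):
        # Trim sentence-ending dots and a ".git" suffix from clone URLs.
        repo = repo.rstrip(".").removesuffix(".git")
        names.append(f"{owner}/{repo}")
    # dict.fromkeys de-duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(names) if name not in exclude]


print(github_repos("url: https://github.com/rob-luke/experiment-fNIRS-tapping."))
```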

It might be worthwhile to work out a complete use-case example, FWIW.

Cons I see:

  • It might end up being "more difficult" due to the various ways authors may decide to label (etc.) their issues, in particular if we were to do some "overall overhaul" (e.g. consistent labeling).
  • It would be up to authors to do that, and they likely would not, so users would end up without a dandiset-specific issue board linked from the dandiset landing page.

@bendichter
Member

bendichter commented Nov 6, 2023

Does GitHub limit the number of repos in an organization?

Edit: Answer: no. "All organizations can own an unlimited number of public and private repositories." (source)

@yarikoptic
Member Author

OK, it seems there are no further questions/concerns. I have added a section on the minimal development to be done on the dandi-archive backend, and made it all conditional on having a GitHub organization provisioned.

@jwodder please also have a look and provide feedback.

@mvandenburgh self-requested a review December 4, 2023 15:20
@waxlamp self-assigned this Dec 4, 2023
Member

@waxlamp left a comment

I am tentatively ok with this design (we will learn more about its suitability/viability as we go), but I'd like to also consult with others in Kitware who may know of alternatives we haven't covered.

doc/design/dandiset-issues-tracking.md (outdated, resolved)
doc/design/dandiset-issues-tracking.md (outdated, resolved)
doc/design/dandiset-issues-tracking.md (resolved)
That is, each instance will specify which org to use, rather than relying on out-of-band use of DataLad to create the repos.

Co-authored-by: Yaroslav Halchenko <[email protected]>
@yarikoptic
Member Author

@bendichter mentioned https://github.com/giscus/giscus, which interfaces with GitHub Discussions (not issues). That also opens up the ecosystem of other attempts at similar platforms (https://github.com/gitalk/gitalk, https://github.com/utterance/utterances, etc.), but they all seem to be "dead", with no development/support.

@bendichter
Member

Discussions may actually be a slightly better option. They psychologically open up the conversation to messages like questions or explanations rather than just problems. An issue can also easily be created from a discussion in the GitHub web interface.

@waxlamp
Member

waxlamp commented Feb 3, 2024

#1313 mentioned utterances, which builds a comment stream over github issues. That is something to consider as well.


- 2i2c uses github teams for resources management on the hub (ref: 1.1 within https://github.com/dandi/dandi-hub/pull/90/files#diff-82655098d9fb488babf6a5ce10d3d5f6a98d17b2f69de5ca28315e54e020bdf9R29)

- Provide facility to propose changes to dandiset metadata:
Member Author

Attn @bendichter, per his request during the DANDI/NWB meetup. Can be resolved later.

Labels: none yet
Projects: Status: Backlog
5 participants