Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download: add --keep-leading-path|-k or some mode for --download #1460

Closed
yarikoptic opened this issue Jul 18, 2024 · 13 comments · Fixed by #1467
Closed

download: add --keep-leading-path|-k or some mode for --download #1460

yarikoptic opened this issue Jul 18, 2024 · 13 comments · Fixed by #1467

Comments

@yarikoptic
Copy link
Member

Inspired by

ATM whenever we download an asset, we download file into current directory (besides if path is a glob as I showed in #1474). That makes it inconvenient to replicate partial file tree hierarchy as present on the archive. I think we just need an option which would maximally simplify for a user a use case where he/she needs to edit/reupload some existing files. So for a URI which points to a dandiset_id and an asset_path(s), if such option is provided

  • if not yet in a dandiset, create leading directory dandiset_id/, download dandiset.yaml, next operate as if from that dandiset_id/ folder
  • download asset(s) under asset_path(s) creating all leading directories as necessary.

I wonder though if should be a separate option or some additional "mode switch" for

  --download [dandiset.yaml,assets,all]
                                  Comma-separated list of elements to download
                                  [default: all]

but failing to come up with a descriptive enough value (dandiset+asset?)

@jwodder
Copy link
Member

jwodder commented Jul 18, 2024

@yarikoptic Why should this option cause dandiset.yaml to be downloaded as well? Downloading an individual asset currently doesn't do that.

@jwodder
Copy link
Member

jwodder commented Jul 18, 2024

@yarikoptic Also, why should the download path start with {dandiset_id}/? Downloads of other asset URL types (globs & folders) don't have that.

@jwodder
Copy link
Member

jwodder commented Jul 18, 2024

@yarikoptic Also, should this option have a similar effect on downloads of multi-asset URLs? For example, currently, downloading just a foo/bar/baz/ directory will save everything under baz/, where this option could be used to instead save things under foo/bar/baz/.

@yarikoptic
Copy link
Member Author

@yarikoptic Why should this option cause dandiset.yaml to be downloaded as well? Downloading an individual asset currently doesn't do that.

why: convenience. That is why I wondered if it should be a new mode dandiset.yaml+asset or alike. So it is not really for lower level code, just top level convenience.

@yarikoptic Also, why should the download path start with {dandiset_id}/? Downloads of other asset URL types (globs & folders) don't have that.

again: convenience, abd because dandiset.yaml+asset -- we do create {dandiset_id}/ when --download dandiset.yaml.

@yarikoptic Also, should this option have a similar effect on downloads of multi-asset URLs?

yes

For example, currently, downloading just a foo/bar/baz/ directory will save everything under baz/, where this option could be used to instead save things under foo/bar/baz/.

correct, we want to retain all leading paths. In case of multiple URLs, possibly multiple different folders, or even {dandiset_id}/ leading directories if we are not under a dandiset already.

@kabilar
Copy link
Member

kabilar commented Jul 18, 2024

Thank you @yarikoptic and @jwodder. All of the options you describe above will be very helpful. I have needed this and several users have requested this feature.

@jwodder
Copy link
Member

jwodder commented Jul 19, 2024

@yarikoptic Rather than adding a dandiset.yaml+asset mode, wouldn't it make more sense to change the all mode to have this new behavior and reserve the old behavior for just asset mode? I realize this is a breaking change, but semver lets us do that for 0.x.0 releases.

@yarikoptic
Copy link
Member Author

wouldn't it make more sense to change the all mode to have this new behavior

I think current behavior is ok for all, as if you specify just a URL to a file it is IMHO more reasonable to just store that file without leading paths etc, and overall it does not impose much of checks/assumptions. The dandiset.yaml+asset mode would be non-default convenience to facilitate uses cases to modify (download/reupload) the file(s).

@jwodder
Copy link
Member

jwodder commented Jul 22, 2024

@yarikoptic New suggestion: We implement this as a new --download mode, but call it "structure" or similar (as it preserves the directory structure and, by downloading dandiset.yaml, the Dandiset structure) instead of the confusing & unwieldy "dandiset.yaml+asset."

@kabilar
Copy link
Member

kabilar commented Jul 23, 2024

@yarikoptic New suggestion: We implement this as a new --download mode, but call it "structure" or similar (as it preserves the directory structure and, by downloading dandiset.yaml, the Dandiset structure) instead of the confusing & unwieldy "dandiset.yaml+asset."

Thanks @jwodder @yarikoptic. I like this idea, and think that the --download mode of structure or tree would be intuitive.

Based on the discussion in #1464, we may also need to include the dataset_description.json file to handle the above use case of download/edit/reupload.

@kabilar
Copy link
Member

kabilar commented Jul 23, 2024

cc @aaronkanzer @dstansby @balbasty

The team is working to replicate the partial file tree hierarchy upon download to handle the use case of editing existing files.

@yarikoptic
Copy link
Member Author

@yarikoptic New suggestion: We implement this as a new --download mode, but call it "structure" or similar (as it preserves the directory structure and, by downloading dandiset.yaml, the Dandiset structure) instead of the confusing & unwieldy "dandiset.yaml+asset."

I am ok with an alternative name. Nothing better than "structure" comes to mind ATM -- would be easy to change during PR review if some better choice strikes someone.

@yarikoptic
Copy link
Member Author

Based on the discussion in dandi/dandi-archive#1464, we may also need to include the dataset_description.json file to handle the above use case of download/edit/reupload.

For this one I think we might want first to consider some alternatives, e.g. explicit --dataset-layout auto,bids,dandi option or something like that. dandiset.yaml is needed to deduce root of the dandiset and its id.

yarikoptic added a commit that referenced this issue Jul 24, 2024
Add `--preserve-tree` option to `dandi download`
Copy link

github-actions bot commented Aug 8, 2024

🚀 Issue was released in 0.63.0 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants