Section describing how to migrate a git-annex repo to datalad #890

Open
lukas-mertens opened this issue Dec 14, 2022 · 4 comments

Comments

@lukas-mertens

Is your feature request related to a problem?

Like probably many others, I came across DataLad because my git-annex repository ran into scalability issues: too many files. DataLad looks like the perfect tool for splitting it up into multiple subdatasets.

Describe the solution you'd like

It would be nice to have a guide on how to migrate a git-annex repo and split it up into multiple subdatasets. What would be the best practices?

Describe alternatives you've considered

No response

Additional context

No response

@welcome

welcome bot commented Dec 14, 2022

Welcome Banner (Image: CC-BY license, The Turing Way Community, & Scriberia. Zenodo. http://doi.org/10.5281/zenodo.3332808) Hi there, and welcome to the DataLad Handbook! 📙 👋 Thank you for filing an issue. We're excited to have your input and welcome your idea! 😊 If you haven't done so already, please make sure you check out our Code of Conduct.

@adswa
Contributor

adswa commented Dec 14, 2022

Hi, thanks for opening this issue!
I agree this would be cool to have. A first hurdle to address beforehand is that a datalad command that achieves this doesn't exist yet. However, it's been an idea for a while (datalad/datalad#3554, datalad/datalad#600), and recent use cases have made it a much-desired development target. The linked issues have some pointers and demos that at least sketch how it's possible in principle with the underlying tools, and I suspect that a to-be-created new command could land in datalad-next. I'll be sure to update this issue if that happens, but feel free to chime in or bump the discussions in those issues in the core repository, too. :)

@bpoldrack
Contributor

I just want to add an explicit pointer to datalad copy-file here, in case you are not aware of it, @lukas-mertens. It doesn't solve the general problem, but it may well be good enough for you to assemble a new dataset from existing ones.
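
For illustration, here is a rough sketch of how such an assembly could be planned. It only builds the command lines and the input for `datalad copy-file --specs-from` (one source/destination pair per line, separated by a null byte); it does not run anything. The function names `plan_split` and `copy_cmd`, the example paths, and the "one subdataset per top-level directory" layout are assumptions made for this example, not an established migration recipe.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_split(files, source_repo, target_dataset):
    """Plan a split of `files` (paths relative to `source_repo`) into one
    subdataset per top-level directory inside `target_dataset`.

    Returns (create_cmds, specs): argv lists for `datalad create`, and the
    text to feed to `datalad copy-file --specs-from`. Nothing is executed.
    """
    src_root = PurePosixPath(source_repo)
    dst_root = PurePosixPath(target_dataset)

    # Hypothetical layout rule: the first path component names the subdataset.
    by_subdataset = defaultdict(list)
    for rel in files:
        parts = PurePosixPath(rel).parts
        if len(parts) < 2:
            continue  # files in the repository root are left out of this sketch
        by_subdataset[parts[0]].append(rel)

    create_cmds = []
    spec_lines = []
    for name in sorted(by_subdataset):
        # `datalad create -d SUPER PATH` creates PATH as a subdataset of SUPER.
        create_cmds.append(
            ["datalad", "create", "-d", str(dst_root), str(dst_root / name)]
        )
        for rel in by_subdataset[name]:
            # Source and destination are separated by a null byte.
            spec_lines.append(f"{src_root / rel}\0{dst_root / rel}")

    specs = "\n".join(spec_lines) + "\n" if spec_lines else ""
    return create_cmds, specs


def copy_cmd(target_dataset, specs_path):
    """argv for a single copy-file call that reads all pairs from a file."""
    return ["datalad", "copy-file", "-d", str(target_dataset),
            "--specs-from", str(specs_path)]
```

The idea would be to write `specs` to a file, run the `create` commands first, and then run the single `copy-file` call against the new superdataset. Whether this scales to a very large annex is a separate question that would need testing.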

@lukas-mertens
Author

@bpoldrack That's exactly what I am using now. However, I am now running into a performance problem, which I described in datalad/datalad#7038.

I am currently debugging it. Thanks a lot anyway!
