Faster clone method #22

Open
kornelski opened this issue Apr 3, 2024 · 4 comments

@kornelski

Instead of cloning the repo and then looking for a tag, you can use git ls-remote (Remote::create_detached & connect_auth & list in git2) to find a tag and its sha1.

When you have a sha1, you can init an empty repo and run git fetch --depth=1 <repo> +3db7c05aa35749cc4e0f0f892bc5831219901f98:refs/heads/whateverbranch to get just that one commit.
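A minimal sketch of this flow with the git CLI rather than git2, assuming git is installed and that the server lets clients fetch a commit by its id (GitHub does; a self-hosted server may need a setting such as uploadpack.allowReachableSHA1InWant). The helper names and the local branch name verify are hypothetical. For annotated tags, ls-remote prints both the tag object and a peeled `^{}` line, so the sketch prefers the peeled line to get the commit id.

```shell
#!/bin/sh
# Sketch: resolve a tag with `git ls-remote`, then fetch only its commit.
# remote_tag_commit / fetch_tag_commit and the branch name "verify" are
# hypothetical names chosen for this example.

# Print the commit a remote tag points to, or nothing if the tag is missing.
remote_tag_commit() {
    url=$1
    tag=$2
    git ls-remote --tags "$url" | awk -v ref="refs/tags/$tag" '
        $2 == (ref "^{}") { peeled = $1 }  # annotated tag: the peeled commit
        $2 == ref         { direct = $1 }  # lightweight tag: already a commit
        END { if (peeled != "") print peeled; else if (direct != "") print direct }'
}

# Init an empty repo at $3 and fetch just the tagged commit, depth 1.
fetch_tag_commit() {
    url=$1
    tag=$2
    dest=$3
    sha=$(remote_tag_commit "$url" "$tag")
    if [ -z "$sha" ]; then
        echo "tag $tag not found at $url" >&2
        return 1
    fi
    git init -q "$dest" || return 1
    # Requires a server that allows fetching a commit by id.
    git -C "$dest" fetch -q --depth=1 "$url" "+$sha:refs/heads/verify" || return 1
    echo "$sha"
}
```

Usage: `fetch_tag_commit <repo-url> <tag> <empty-dir>` prints the commit id and leaves it on the verify branch of the new repo, without downloading any history.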

@paolobarbolini
Member

This sounds like a great idea, and for long-term storage of repos it would also help us know when it's time to pull again.
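One way ls-remote could serve that purpose: remember the commit a ref pointed to when the stored repo was last fetched, and compare it with what the remote advertises now, which downloads no objects. A minimal sketch; ref_moved is a hypothetical name, and it assumes the full ref name (e.g. refs/tags/v1.2.3 or refs/heads/main) is known.

```shell
#!/bin/sh
# Sketch: decide whether a stored repository is stale by asking the remote
# for a ref's current commit with `git ls-remote`.
# ref_moved is a hypothetical helper name.
#
# Usage: ref_moved <repo-url> <full-refname> <commit-seen-last-time>
# Returns 0 ("time to pull again") if the ref now points elsewhere or is gone.
ref_moved() {
    # Exact match on the ref name, since ls-remote patterns match by suffix.
    current=$(git ls-remote "$1" "$2" | awk -v ref="$2" '$2 == ref { print $1; exit }')
    [ "$current" != "$3" ]
}
```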

@link2xt
Contributor

link2xt commented Apr 3, 2024

Seems git fetch --depth=1 <repo> <commit> into an empty repo works.

I also suggest that we clone from scratch for each crate version: #14

> When you have a sha1

We already have the sha1 from the downloaded crate (in .cargo_vcs_info.json).
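A sketch of combining the two points: read the sha1 that cargo recorded inside the .crate, then fetch just that commit with git fetch --depth=1 <repo> <commit> into an empty repo. The helper names are hypothetical, and the sha1 is extracted with sed rather than a JSON parser, which assumes the "sha1" key and its value sit on one line.

```shell
#!/bin/sh
# Sketch: read the commit id cargo recorded in a .crate and fetch only that commit.
# crate_vcs_sha1 / fetch_crate_commit are hypothetical helper names.
# Parsing with sed (not a JSON parser) assumes "sha1" and its value share a line.

# Print the "sha1" from <name>-<version>/.cargo_vcs_info.json, or nothing.
crate_vcs_sha1() {
    crate=$1   # path to the .crate file (a gzipped tarball)
    top=$2     # top-level directory inside it, e.g. cargo-goggles-0.0.2
    tar -xzOf "$crate" "$top/.cargo_vcs_info.json" 2>/dev/null |
        sed -n 's/.*"sha1"[[:space:]]*:[[:space:]]*"\([0-9a-f]\{40\}\)".*/\1/p'
}

# git fetch --depth=1 <repo> <commit> into a fresh repo at $3; print FETCH_HEAD.
fetch_crate_commit() {
    crate=$1
    repo_url=$2
    dest=$3
    sha=$(crate_vcs_sha1 "$crate" "$(basename "$crate" .crate)")
    if [ -z "$sha" ]; then
        echo "no sha1 recorded in $crate" >&2
        return 1
    fi
    git init -q "$dest" || return 1
    git -C "$dest" fetch -q --depth=1 "$repo_url" "$sha" || return 1
    git -C "$dest" rev-parse FETCH_HEAD
}
```

When crate_vcs_sha1 prints nothing, the crate carries no commit id and some other way of locating the sources is needed.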

@kornelski
Author

I meant searching tags for crates that lack a sha1 from cargo, usually because the working directory was "dirty" when publishing.
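A sketch of that tag search, reading `git ls-remote --tags` output and trying a few tag spellings for a crate version. The candidate spellings (v1.2.3, 1.2.3, name-v1.2.3, name-1.2.3) are assumed conventions, not something cargo or crates.io enforces, and guess_tag_commit is a hypothetical name.

```shell
#!/bin/sh
# Sketch: find the commit for a crate version by trying common tag spellings.
# The candidate spellings are assumed conventions, not enforced by cargo.
# guess_tag_commit is a hypothetical helper name.
#
# stdin: output of `git ls-remote --tags <repo>`
# prints: "<tag> <commit>" for the first candidate found, or nothing.
guess_tag_commit() {
    awk -v name="$1" -v version="$2" '
        { sha[$2] = $1 }   # refname -> object id
        END {
            n = split("v" version " " version " " name "-v" version " " name "-" version, cand, " ")
            for (i = 1; i <= n; i++) {
                ref = "refs/tags/" cand[i]
                peeled = ref "^{}"     # annotated tags: prefer the commit id
                if (peeled in sha) { print cand[i], sha[peeled]; exit }
                if (ref in sha)    { print cand[i], sha[ref]; exit }
            }
        }'
}
```

Usage: `git ls-remote --tags https://github.com/M4SS-Code/cargo-goggles | guess_tag_commit cargo-goggles 0.0.2`. Only the ref list is transferred, so a miss costs one round trip.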

@richardscollin

Another related idea might be abusing the git-sha1 hash in the git tree objects to avoid having to fetch and hash each file.

This would rely on SHA-1, which is known to be cryptographically broken, so maybe this defeats the whole point of a tool like this, but it would certainly improve speed and reduce disk usage.
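For reference, the id git records for a blob in a tree object is the SHA-1 of the header `blob <size in bytes>`, a NUL byte, and then the raw file contents; that is why git hash-object on a file taken from the .crate can be compared directly with a tree entry. A sketch of the same computation without git, assuming GNU coreutils' sha1sum is available (git_blob_id is a hypothetical name):

```shell
#!/bin/sh
# Compute the git blob id of a file without git:
#   sha1("blob " + <size in bytes> + "\0" + <contents>)
# Assumes sha1sum from GNU coreutils (on macOS, `shasum -a 1` prints the same format).
git_blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```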

For example, for this repo:

```
❯ cd tmp/
❯ git init
Initialized empty Git repository in /home/collin/Git/cargo-goggles/tmp/.git/
❯ git remote add origin https://github.com/M4SS-Code/cargo-goggles
# get the commit hash from .cargo_vcs_info.json
❯ git fetch --depth=1 --filter=blob:none origin 92433e4a738f76ec1be401980233923f2e86044c
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 0), reused 4 (delta 0), pack-reused 0
Receiving objects: 100% (5/5), 676 bytes | 676.00 KiB/s, done.
From https://github.com/M4SS-Code/cargo-goggles
 * branch            92433e4a738f76ec1be401980233923f2e86044c -> FETCH_HEAD
❯ git ls-tree 92433e4a738f76ec1be401980233923f2e86044c
040000 tree b07426be3c96c9168d94e7cf0dc3284a28846259    .github
100644 blob ea8c4bf7f35f6f77f75d92ad8ce8349f6e81ddba    .gitignore
100644 blob 28252ccb26f961b7d6687e31eb26bc935e7044b0    Cargo.lock
100644 blob 0d02031c146bd44b47a4a1f0ac46b37d5de2c9e0    Cargo.toml
100644 blob 261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64    LICENSE-APACHE
100644 blob e1ae93ef989a7d08f231a6eff61adc9824a7040c    LICENSE-MIT
100644 blob 5622ff48d638dbc80555a766c64295649209a5ff    README.md
100644 blob 8dc2ad7a4f24845ff9dac85b9a879646b43ba6e8    deny.toml
040000 tree 15ff46ac1ec3fe2f28ab5b1ddf51e7d29934788c    src
❯ tar -xOf cargo-goggles-0.0.2.crate cargo-goggles-0.0.2/deny.toml | git hash-object -w --stdin
8dc2ad7a4f24845ff9dac85b9a879646b43ba6e8
```

Here we can see that the hash of deny.toml from the crate matches the blob id in the git tree. We could walk both trees and compare all files.
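A sketch of that walk: for every file in the extracted crate, look up the blob id recorded in the commit's tree and compare it with git hash-object. Only tree objects are read from the repo, so it should work after a --filter=blob:none fetch like the one above. Assumptions: the crate sits at the repository root (an empty path_in_vcs; a workspace member would need that prefix), and files produced by cargo package are skipped or mapped (Cargo.toml is normalized and the original ships as Cargo.toml.orig, .cargo_vcs_info.json is generated, and Cargo.lock may be regenerated). compare_crate_to_commit is a hypothetical name.

```shell
#!/bin/sh
# Sketch: compare every file of an extracted .crate with the blob ids in a
# commit's tree, without needing the blobs themselves.
# compare_crate_to_commit is a hypothetical name; assumes the crate is at the
# repo root (empty path_in_vcs).
#
# Usage: compare_crate_to_commit <extracted-crate-dir> <repo-dir> <commit>
# Prints one MISMATCH/MISSING line per problem; returns 1 if there are any.
compare_crate_to_commit() {
    crate_dir=$1
    repo=$2
    commit=$3
    (cd "$crate_dir" && find . -type f) | sed 's|^\./||' | sort | {
        bad=0
        while IFS= read -r path; do
            case $path in
                # Generated or rewritten by `cargo package`: skip.
                .cargo_vcs_info.json|Cargo.toml|Cargo.lock) continue ;;
                # The original manifest is shipped as Cargo.toml.orig.
                Cargo.toml.orig) repo_path=Cargo.toml ;;
                *) repo_path=$path ;;
            esac
            # Blob id from the tree entry; reads tree objects only.
            expected=$(git -C "$repo" rev-parse --verify -q "$commit:$repo_path" 2>/dev/null)
            if [ -z "$expected" ]; then
                echo "MISSING  $path"; bad=1; continue
            fi
            actual=$(git hash-object --no-filters "$crate_dir/$path")
            if [ "$actual" != "$expected" ]; then
                echo "MISMATCH $path"; bad=1
            fi
        done
        [ "$bad" -eq 0 ]
    }
}
```

File modes (e.g. the executable bit) are not compared here; the mode column of git ls-tree would be needed for that.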

A downside is that a hash mismatch alone doesn't show what differs. If there's a mismatch, you'd likely want to request the sources from the git forge anyway to compare the differences.
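That request can stay narrow: in a repo fetched with --filter=blob:none from a named remote, reading a blob that isn't present locally should make git download just that blob from the promisor remote on demand. A minimal sketch, assuming the crate has been extracted to a directory and sits at the repository root; show_crate_diff is a hypothetical name.

```shell
#!/bin/sh
# Sketch: show how one crate file differs from the committed version.
# In a --filter=blob:none repo, reading "$commit:$path" should lazily
# fetch that single blob from the promisor remote.
# show_crate_diff is a hypothetical helper name.
#
# Usage: show_crate_diff <extracted-crate-dir> <repo-dir> <commit> <path>
# Exit status follows diff: 0 if identical, 1 if they differ.
show_crate_diff() {
    crate_dir=$1
    repo=$2
    commit=$3
    path=$4
    git -C "$repo" cat-file blob "$commit:$path" |
        diff -u - "$crate_dir/$path"
}
```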
