Extract content from git repository #62

Draft
wants to merge 1 commit into
base: master


themightychris
Collaborator

This captures some unfinished early work of mine on extracting information about content contained within referenced git repos by fetching the latest commit of the default branch into a scratch repo.

What I ran into, and hadn't resolved before I stopped working on this, is that some projects reference Git repos outside GitHub that are down and/or return malformed responses. The git CLI is not good at telling us about this when we try to fetch or ls-remote against them, and in some cases it just hangs indefinitely.

There are two things we could/should do next:

  • Add a special-case handler for GitHub repositories that uses the GitHub API to fetch information about the latest commit and various files within the repo, instead of using the git protocol to pull the latest commit. Even a shallow fetch of just the latest commit means downloading a lot more content, and dealing with a lot more failure modes, than going through the GitHub API. The vast majority of repos are on GitHub, so while we ultimately want to support all sorts of Git hosts, GitHub is a worthwhile special case to optimize for
  • Make fetching the latest commit and content via git more resilient for cases where we can't use the GitHub API. I was thinking we might either:
    • Implement our own timeout around our use of child_process to invoke the local git client when probing repositories
    • OR, and this might be quicker, make our own fast HTTP or SSH connection before trying git, just to check that a connection can be established and that the response carries some fingerprint telling us a git server is responding on the other end
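
For the GitHub special case, a rough sketch of what the handler could look like. The names `parseGitHubRepo` and `fetchLatestCommit` are made up for illustration and are not in this PR. It assumes Node 18+ for the global `fetch`, and uses the public REST endpoints `/repos/{owner}/{repo}` and `/repos/{owner}/{repo}/commits/{ref}`; the token parameter is optional and only there to raise rate limits.

```javascript
// Sketch only: parseGitHubRepo and fetchLatestCommit are hypothetical names.
// Assumes Node 18+ so that fetch() is available globally.

// Pull owner/repo out of common GitHub remote URL shapes, or return null
// for anything else (other hosts, deep links into a repo, and so on).
function parseGitHubRepo(remoteUrl) {
  const pattern =
    /^(?:https?:\/\/|git:\/\/|ssh:\/\/git@|git@)(?:www\.)?github\.com[/:]([^/\s]+)\/([^/\s]+?)(?:\.git)?\/?$/i;
  const match = pattern.exec(String(remoteUrl).trim());
  return match ? { owner: match[1], repo: match[2] } : null;
}

// Look up the default branch, then the commit at its tip.
// Returns null when the URL is not a GitHub repo so the caller can
// fall back to plain git.
async function fetchLatestCommit(remoteUrl, token) {
  const parsed = parseGitHubRepo(remoteUrl);
  if (!parsed) return null;

  const headers = {
    Accept: 'application/vnd.github+json',
    'User-Agent': 'repo-probe-sketch', // placeholder value
  };
  if (token) headers.Authorization = `Bearer ${token}`;

  const base = `https://api.github.com/repos/${parsed.owner}/${parsed.repo}`;

  const repoRes = await fetch(base, { headers });
  if (!repoRes.ok) throw new Error(`GitHub API returned ${repoRes.status} for ${base}`);
  const { default_branch: branch } = await repoRes.json();

  const commitRes = await fetch(`${base}/commits/${branch}`, { headers });
  if (!commitRes.ok) throw new Error(`GitHub API returned ${commitRes.status} for commit lookup`);
  const commit = await commitRes.json();

  return { branch, sha: commit.sha, date: commit.commit.committer.date };
}
```

Individual files could then be read through `/repos/{owner}/{repo}/contents/{path}` in the same way, without ever touching the scratch repo.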
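
For the timeout option, here is a minimal sketch using the `timeout` option that `child_process.spawnSync` already supports. `runWithTimeout` and `probeRemote` are hypothetical helper names, the 10 second default is an arbitrary assumption, and the synchronous call is only to keep the example short; the async `spawn`/`execFile` route would follow the same idea. Setting `GIT_TERMINAL_PROMPT=0` stops git from waiting on a credentials prompt, which is one of the ways it can appear to hang.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command and give up after timeoutMs.
function runWithTimeout(command, args, timeoutMs) {
  const result = spawnSync(command, args, {
    timeout: timeoutMs,        // kill the child if it runs longer than this
    killSignal: 'SIGKILL',     // don't rely on the child handling SIGTERM
    encoding: 'utf8',
    stdio: ['ignore', 'pipe', 'pipe'],
    // Never let git block on an interactive username/password prompt.
    env: { ...process.env, GIT_TERMINAL_PROMPT: '0' },
  });

  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return {
    ok: !result.error && result.status === 0,
    timedOut,
    stdout: result.stdout || '',
    stderr: result.stderr || '',
  };
}

// Hypothetical wrapper: ask the remote what HEAD points at.
function probeRemote(url, timeoutMs = 10000) {
  return runWithTimeout('git', ['ls-remote', '--symref', url, 'HEAD'], timeoutMs);
}
```

A caller can then treat `timedOut` as "host is unresponsive" and a non-zero exit with `stderr` output as "host answered, but not like a git server".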
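
For the connection-check option, one possible fingerprint over HTTP(S) is the smart HTTP ref advertisement: a git server answers `GET <repo>/info/refs?service=git-upload-pack` with the content type `application/x-git-upload-pack-advertisement` and a body that starts with a pkt-line like `001e# service=git-upload-pack`. The sketch below assumes Node 18+ (global `fetch` and `AbortController`); `infoRefsUrl`, `looksLikeGitSmartHttp`, and `probeGitHttp` are made-up names, and it would not recognise "dumb" HTTP or SSH remotes.

```javascript
const SMART_CONTENT_TYPE = 'application/x-git-upload-pack-advertisement';

// Build the ref-advertisement URL for a repository URL.
function infoRefsUrl(repoUrl) {
  return repoUrl.replace(/\/+$/, '') + '/info/refs?service=git-upload-pack';
}

// Decide whether a response looks like a smart HTTP git server.
function looksLikeGitSmartHttp(status, contentType, bodyPrefix) {
  if (status !== 200) return false;
  const type = (contentType || '').split(';')[0].trim().toLowerCase();
  if (type === SMART_CONTENT_TYPE) return true;
  // Fallback: pkt-line header, four hex digits then "# service=git-upload-pack".
  return /^[0-9a-f]{4}# service=git-upload-pack/.test(bodyPrefix || '');
}

// Quick probe with our own deadline; resolves to false on any network
// error or timeout instead of hanging.
async function probeGitHttp(repoUrl, timeoutMs = 5000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(infoRefsUrl(repoUrl), { signal: controller.signal });
    const contentType = res.headers.get('content-type');
    if (looksLikeGitSmartHttp(res.status, contentType, '')) {
      // The header is enough; skip downloading the full ref list.
      if (res.body) await res.body.cancel();
      return true;
    }
    const body = await res.text();
    return looksLikeGitSmartHttp(res.status, contentType, body.slice(0, 64));
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

Only when `probeGitHttp` resolves to `true` would we hand the URL to the real git client.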
