Do these statements describe your situation?
- I want to release a big multi-author codebase
- I don't want people to see my intermediate versions (as they could if I released the full repo history)
- Usually I'd just squash the whole repo into one commit...
- ...but this erases who wrote what line!
- I have thoroughly backed up all my files
Then Git Line-Credit Hack is for you!
What this tool does:
- File by file and author by author, reintroduce all of that author's changes and commit them under their name
- Preserve line-by-line commit credit
- Preserve total lines by each author
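The core idea can be sketched in a few lines, assuming each file's final contents have already been attributed line by line to an author (for example with git blame). The helper name staged_versions and the "order of first appearance" author ordering are illustrative assumptions, not necessarily what git_commit_hack.py does. Each version adds exactly one author's lines to the previous one, and the last version equals the full file.

```python
from typing import List, Tuple


def staged_versions(blamed: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return one (author, file_text) pair per author, in commit order.

    `blamed` is the final file as (author, line) pairs in file order, with
    each line keeping its trailing newline. Authors are ordered by first
    appearance in the file, which is an arbitrary, illustrative choice.
    """
    order: List[str] = []
    for author, _ in blamed:
        if author not in order:
            order.append(author)

    versions: List[Tuple[str, str]] = []
    included = set()
    for author in order:
        included.add(author)
        # Keep only the lines of authors "committed" so far, in their final order.
        text = "".join(line for who, line in blamed if who in included)
        versions.append((author, text))
    return versions


blamed = [("alice", "a1\n"), ("bob", "b1\n"), ("alice", "a2\n")]
for author, text in staged_versions(blamed):
    print(author, repr(text))
# alice 'a1\na2\n'
# bob 'a1\nb1\na2\n'
```

Committing each version in order under its author's name means every commit's diff is pure insertions of that author's lines, so blame on the result should generally match the original. Identical lines (blank lines, lone braces) can still be matched ambiguously by git's diff, which is one source of the manual cleanup mentioned below.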
Things this tool does not do:
- Anything particularly clever with git. It's just Python file operations plus various git blame, git add, and git commit commands.
- Preserve the number of commits. There will be O(n_files * n_auth_per_file) commits.
- Make intermediate versions which are usable. All commits except the last one will be incomplete
- Preserve WHEN contributions were made. All will show up as if they were made today.
- Preserve any contributions which were not the last edit made to their particular line. Last editor gets all the credit.
- Make verified commits. The generated commits carry other people's names and emails but are created by you, so get permission from everyone involved. GitHub will still credit the commits to their accounts, but they will not show as 'Verified'.
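As a sketch of why the commits show today's date and are not 'Verified', here is how a single staged file could be committed under a contributor's identity, assuming the tool shells out to plain git. The helper names build_commit_cmds and run_cmds are hypothetical; git commit's real --author option sets only the author, while the committer still comes from your own git config.

```python
import shlex
import subprocess
from typing import List


def build_commit_cmds(repo: str, path: str, name: str, email: str, message: str) -> List[List[str]]:
    """Return git commands that stage one file and commit it with `name <email>` as author."""
    return [
        ["git", "-C", repo, "add", "--", path],
        # --author replaces only the author. The committer is still taken from
        # user.name / user.email in your git config, and since no --date is
        # passed, the author date is the current time.
        ["git", "-C", repo, "commit", "-m", message, f"--author={name} <{email}>"],
    ]


def run_cmds(cmds: List[List[str]]) -> None:
    """Execute each command, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


# Print the commands instead of running them (hypothetical repo and file names).
for cmd in build_commit_cmds("PUBLIC_REPO", "src/app.py", "Alice Smith",
                             "alice@example.com", "Add src/app.py lines by Alice Smith"):
    print(shlex.join(cmd))
```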
This tool has been used 2-3 times in the wild, but often requires some extra manual hacking, especially if the git history is messy.
It is intended more as a starting point for you to figure out a workflow that works for your repo.
Install python3 and git. No python libraries needed.
Edit config.py so it has the correct emails, aliases, etc. for your project. You may need to iterate on it a few times while running the dry-run commands below. The goal of this config is to map each author to a single unique name and email, since people often accidentally use different identities on different devices.
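The exact format of config.py is not shown here, so the following only sketches the idea under an assumed format: a hypothetical ALIASES dict mapping every name or email an author has used to one canonical (name, email) pair. Raising on unknown identities is a design choice that makes each dry run surface the aliases still missing.

```python
from typing import Dict, Tuple

Identity = Tuple[str, str]  # (name, email)

# Hypothetical alias table; the real config.py may be structured differently.
ALIASES: Dict[str, Identity] = {
    "alice": ("Alice Smith", "alice@example.com"),
    "alice smith": ("Alice Smith", "alice@example.com"),
    "alice@laptop.local": ("Alice Smith", "alice@example.com"),
    "bob@old-job.example": ("Bob Jones", "bob@example.com"),
}


def canonical_identity(name: str, email: str) -> Identity:
    """Map a raw name/email from git blame to one canonical identity.

    The email is checked before the name, and matching is case-insensitive.
    Unknown identities raise KeyError so they show up during a dry run.
    """
    for key in (email.strip().lower(), name.strip().lower()):
        if key in ALIASES:
            return ALIASES[key]
    raise KeyError(f"no canonical identity for {name!r} <{email}>")


print(canonical_identity("ALICE", "alice@laptop.local"))  # ('Alice Smith', 'alice@example.com')
```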
Note: this tool ignores many types of files it can't handle; run with --debug to see which ones. These files generally end up as uncommitted changes or unmodified files at the end of the process.
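Which files the tool skips is not spelled out here. As one plausible example of such a check, this sketch treats a file as unsuitable for line-by-line handling if it has a NUL byte in its first 8000 bytes (similar to the heuristic git itself uses to guess that a file is binary) or is not valid UTF-8. The function name and both criteria are assumptions.

```python
import tempfile
from pathlib import Path


def is_line_editable(path: Path, sniff_bytes: int = 8000) -> bool:
    """Return False for files that look binary or are not valid UTF-8 text."""
    data = path.read_bytes()
    if b"\x00" in data[:sniff_bytes]:
        return False  # NUL byte near the start: almost certainly binary
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not UTF-8, so splitting into lines is unreliable
    return True


# Small demo with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp, "notes.txt")
    text_file.write_text("hello\n", encoding="utf-8")
    blob = Path(tmp, "image.bin")
    blob.write_bytes(b"\x89PNG\x00\x01")
    print(is_line_editable(text_file), is_line_editable(blob))  # True False
```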
Before proceeding, set the name and email in your git config to a neutral lab or company account, since this account will be credited as a coauthor on every commit. You will also need to add an SSH key for this account if you have not already.
```bash
git config --global -e
```
If you are releasing into an existing public repo, clone copies of both your internal repo and the public repo:
```bash
# dryrun
python git_commit_hack.py INTERNAL_REPO PUBLIC_REPO
# once the result seems good to you and you have made backups
python git_commit_hack.py INTERNAL_REPO PUBLIC_REPO --dryrun_mode change_files,commit_changes
# you will need to clean up any uncommitted files yourself
git status
# when ready to release to the public
git push origin main
```
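Before pushing, one way to check that line credit survived is to compare per-author line counts from git blame --line-porcelain in both clones. The helpers below are a hypothetical sketch, not part of the tool; counts are keyed by raw author email, so internal-repo authors with several aliases will show up as mismatches unless you first map them to their canonical identities.

```python
import subprocess
from collections import Counter
from typing import Dict, Tuple


def blame_counts(porcelain: str) -> Counter:
    """Count lines per author email in `git blame --line-porcelain` output."""
    counts: Counter = Counter()
    email = None
    for line in porcelain.split("\n"):
        if line.startswith("\t"):
            # File content lines are prefixed with a tab; credit the current author.
            counts[email] += 1
        elif line.startswith("author-mail "):
            email = line[len("author-mail "):].strip("<>")
    return counts


def repo_blame_counts(repo: str) -> Counter:
    """Sum blame_counts over every tracked file in `repo`, skipping files blame rejects."""
    listing = subprocess.run(
        ["git", "-C", repo, "ls-files", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    total: Counter = Counter()
    for path in filter(None, listing.split("\0")):
        result = subprocess.run(
            ["git", "-C", repo, "blame", "--line-porcelain", "--", path],
            capture_output=True, encoding="utf-8", errors="replace",
        )
        if result.returncode == 0:
            total.update(blame_counts(result.stdout))
    return total


def count_mismatches(internal: Counter, public: Counter) -> Dict[str, Tuple[int, int]]:
    """Return {email: (internal_lines, public_lines)} for every author whose counts differ."""
    return {
        who: (internal[who], public[who])
        for who in set(internal) | set(public)
        if internal[who] != public[who]
    }


# Example, run from the directory containing both clones:
# print(count_mismatches(repo_blame_counts("INTERNAL_REPO"), repo_blame_counts("PUBLIC_REPO")))
```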
To start the public history fresh instead, clone two copies of your private repo and choose the one you want to end up public:
```bash
cd PUBLIC_REPO
git checkout --orphan new_main
# dryrun
python git_commit_hack.py --fresh_start INTERNAL_REPO PUBLIC_REPO
# once the result seems good to you and you have made backups
python git_commit_hack.py --fresh_start INTERNAL_REPO PUBLIC_REPO --dryrun_mode change_files,commit_changes
# BE CAREFUL: this will overwrite main locally and on the remote
git branch -D main
git checkout -b main
git remote add publicrelease <PUBLIC_REPO_LINK>
git push publicrelease main
```