Git Workflow

Anita Okoh edited this page Mar 11, 2024 · 2 revisions

Git workflow

Commit messages

Commit messages should follow the seven rules in the classic article “How to Write a Git Commit Message” by Chris Beams:

  1. Separate subject from body with a blank line
  2. Limit the subject line to 50 characters
  3. Capitalize the subject line
  4. Do not end the subject line with a period
  5. Use the imperative mood in the subject line
  6. Wrap the body at 72 characters
  7. Use the body to explain what and why vs. how

Rationale: consistent messages are an easier read, and less is more.
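The mechanical rules above (1, 2, 3, 4 and 6) can be checked automatically. What follows is only a minimal sketch: check_commit_msg is a hypothetical helper, not part of git or gitz, and it assumes the message has been saved to a file. Rules 5 and 7 need a human reader.

```shell
# check_commit_msg FILE
# Hypothetical helper: prints one line per broken rule and returns
# non-zero if any rule was broken.
check_commit_msg() {
    file=$1
    status=0
    subject=$(sed -n '1p' "$file")
    second=$(sed -n '2p' "$file")

    # Rule 2: subject line of at most 50 characters.
    if [ "${#subject}" -gt 50 ]; then
        echo "subject is longer than 50 characters"; status=1
    fi
    # Rule 3: subject line starts with a capital letter.
    case $subject in
        [[:upper:]]*) ;;
        *) echo "subject is not capitalized"; status=1 ;;
    esac
    # Rule 4: subject line does not end with a period.
    case $subject in
        *.) echo "subject ends with a period"; status=1 ;;
    esac
    # Rule 1: if there is a body, the second line must be blank.
    if [ -n "$second" ]; then
        echo "second line is not blank"; status=1
    fi
    # Rule 6: body lines of at most 72 characters.
    if awk 'NR > 2 && length($0) > 72 { found = 1 } END { exit !found }' "$file"; then
        echo "body has a line longer than 72 characters"; status=1
    fi
    return $status
}
```

For example, check_commit_msg .git/COMMIT_EDITMSG would check the most recently edited message in a clone.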

Example

From my existing branch on this project:

6b8bb80 Fix most ruff issues
c2add98 Accept auto fixes proposed by ruff
963644e Blacken python
ff4ab0c Add black config to pyproject.toml
07f130d Add lightning_logs/ to .gitignore
99dbc87 Teach superduperdb about poetry
bc798b8 Add .direnv/ and .envrc to .gitignore

Everything is fairly self-explanatory. Perhaps you have to look up what ruff or poetry are, but in this case, even if you don’t understand the project, you should have some idea of what’s going on. Not all commits will be like that. Some of them will assume considerable knowledge about the project, but the criterion is something like, “Should be obvious to someone who is actually interested in this project”.

Many of these commits are tiny strokes, but the point is that they are obviously correct.

Note the “imperative mood”: Add, Teach, Accept rather than Added, Taught, Accepted, and so on, because it’s less typing and less conjugation.

I managed to fit all the information into the subject, so I never needed a body, but this won’t always be true by any means. Even very long commit bodies are possible and occasionally needed.

Work in forks, main is truth

Sharing a repository with other developers is like s̶h̶a̶r̶i̶n̶g̶ ̶a̶ ̶t̶o̶o̶t̶h̶b̶r̶u̶s̶h̶ working in the same directory at the same time: it’s less stressful if everyone works in their own fork.

Quick definitions: a git repository is a collection of git commits (represented by git commit IDs), names of commits (like branches and tags) and the associated files.

To fork a repository is to make a copy of those commits, names and files into a new repository. A clone is a copy of a repository onto disk.

In this project, there is one main fork, with one key branch, named SuperDuperDB/superduperdb/main or just main. main is the single source of truth, and is updated frequently.

Developers work in disposable, short-term branches in their own fork <your-github-name>/SuperDuperDB/<your-branch-name> but stay very close to main with frequent small rebases, mostly without manual intervention. Pulling small commits frequently reduces risk and is good for rapid development, and it’s also more fun.

No merge commits, no automatic squashes: we rely on our git rewriting skills as a team and a little tooling. In the beginning, we won't even be above the occasional rewriting of history on main to correct spelling mistakes and the like, though as we progress this will become more and more taboo.
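As a rough illustration of a “frequent small rebase” in plain git, here is a minimal sketch. It assumes the two remotes described in the next section (upstream for the main repository, origin for your fork) and that the current branch has already been pushed to origin once. The name sync_with_main is hypothetical.

```shell
# sync_with_main
# Hypothetical helper: replay the current branch on top of the latest
# upstream/main, then update the copy of the branch in your fork.
sync_with_main() {
    git fetch upstream &&
    git rebase upstream/main &&
    # --force-with-lease refuses to overwrite the branch on origin if it
    # has moved since you last fetched it.
    git push --force-with-lease origin HEAD
}
```

Because the branch is replayed rather than merged, no merge commit is created. If the rebase stops on a conflict, nothing is pushed.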

How to create a fork

  1. Go here: https://github.com/SuperDuperDB/superduperdb
  2. Click on Fork near the upper right
  3. Create a fork under your own personal GitHub account

Now there are two cases.

If you do not have an existing clone, go to some convenient parent directory and type:

git clone git@github.com:<your-name>/superduperdb.git
cd superduperdb
git remote add upstream git@github.com:SuperDuperDB/superduperdb.git
git fetch upstream

Or, if you do have a clone of the main repository on your disk already, Duncan, here’s how to fix it to point to your personal fork.

  • cd to that directory
  • git remote -v

You should see:

origin	git@github.com:SuperDuperDB/superduperdb.git (fetch)
origin	git@github.com:SuperDuperDB/superduperdb.git (push)

If so, continue by entering:

git remote set-url origin git@github.com:<your-name>/superduperdb.git
git remote add upstream git@github.com:SuperDuperDB/superduperdb.git
git fetch

You should see something like:

From github.com:<your-name>/superduperdb
 - [deleted]         (none)     -> origin/bug/67/fix-demo
 - [deleted]         (none)     -> origin/feature/21/support-for-ray-serving
 - [deleted]         (none)     -> origin/feature/33/openai
[... more ...]

(Don’t worry, nothing is actually deleted.)

Useful git configs

I suggest adding these git configs before you go on.

In a shell in the superduperdb directory, type:

git config merge.rebase true
git config pull.rebase true

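Reason: these prevent you from ever creating a merge commit when using git pull. Note that merge.rebase is not listed in the git-config documentation, so pull.rebase is likely the setting doing the work here; merge.ff only is the documented way to make git merge refuse to create a merge commit.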

git config --global rebase.autosquash true

Reason: https://fle.github.io/git-tip-keep-your-branch-clean-with-fixup-and-autosquash.html
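In short, autosquash lets you record a correction to an earlier commit now and have an interactive rebase fold it into that commit later. A minimal sketch, assuming the correction is already staged and the rest of the working tree is clean; fixup_commit is a hypothetical helper name:

```shell
# fixup_commit <commit-id>
# Hypothetical helper: fold the staged changes into an earlier commit.
# Does not work when <commit-id> is the very first commit of the
# repository, because it rebases from that commit's parent.
fixup_commit() {
    target=$1
    # Creates a commit whose subject is "fixup! <subject of target>".
    git commit -q --fixup "$target" &&
    # --autosquash moves the fixup commit next to its target and marks it
    # as a fixup; GIT_SEQUENCE_EDITOR=true accepts that plan unedited.
    GIT_SEQUENCE_EDITOR=true git rebase -q --interactive --autosquash "${target}~1"
}
```

With rebase.autosquash set to true, the --autosquash flag can be dropped from the interactive rebase.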

git config --global rerere.enabled true

Reason: https://git-scm.com/book/en/v2/Git-Tools-Rerere

Gitz

I have a collection of git tools written in Python called gitz (https://github.com/rec/gitz), which automate much of the workflow, but you don’t need them if you already have your own way of working!

gitz has no dependencies, so you can just download the directory and throw it into your PATH, or use pip to install it.

Lifecycle of a branch

The typical lifecycle of a branch looks like this:

git new some-branch

              # Make commits

git update    # Stay up to date with HEAD

              # Make more commits

git go pull   # Start or visit a pull request

              # Use rebase, squash...

              # Request is pulled

git delete .  # Delete the current branch
git update

Recipes

To see or open a pull request on the current branch:

If you have gitz, use git go pull

Otherwise, if you have recently pushed on your fork,

open https://github.com/SuperDuperDB/superduperdb/

and there will be a prompt for a pull request there automatically.

At any time you can go to:

open https://github.com/<your-name>/superduperdb/branches

in your repository, and each branch has a button to go to its pull request.

To consolidate all your commits in your branch into one
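One possible way to do this with plain git, offered as a sketch rather than the project's prescribed recipe: move the branch pointer back to where the branch left main while keeping all the changes staged, then commit once. The helper name squash_branch and its arguments are hypothetical.

```shell
# squash_branch <message> [<base-ref>]
# Hypothetical helper: collapse every commit made since the current
# branch left <base-ref> (default: upstream/main) into a single commit.
squash_branch() {
    base=$(git merge-base "${2:-upstream/main}" HEAD) || return 1
    # --soft moves the branch back to the fork point but leaves the
    # index and the working tree exactly as they were.
    git reset --soft "$base" &&
    git commit -q -m "$1"
}
```

For example, squash_branch "Teach superduperdb about poetry" leaves one commit on top of upstream/main. The branch then needs git push --force-with-lease to update the fork, since its history has been rewritten.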

Gitz dox

git-new

With no flags, git-new <branch-name> fetches upstream/main, and uses that commit ID to create a new branch, which it then pushes to origin.

git-new -u uses the current commit, HEAD, instead of upstream/main.

git-new -d <other-user>/<branch> duplicates some other user’s branch into your fork, very useful for code reviews.

(git-new and the other tools will not overwrite existing changes.)
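If you don't have gitz, the no-flags behaviour described above can be approximated with plain git. This is a sketch of the described behaviour, not gitz's actual implementation, and new_branch is a hypothetical name:

```shell
# new_branch <branch-name>
# Hypothetical plain-git approximation of git-new with no flags.
new_branch() {
    git fetch -q upstream &&
    # Start the branch at upstream/main without tracking it, so that the
    # push below can make origin the branch's tracked remote.
    git checkout -q --no-track -b "$1" upstream/main &&
    git push -q -u origin "$1"
}
```

git checkout -b refuses to run if the branch name already exists, so an existing branch is not overwritten.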

git-update

Tries to update every branch from main.

For each branch, try to git pull --rebase against main, and if successful git push --force-with-lease. If there is a merge conflict, this is reported, the branch is kept unchanged, and updating continues.
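The loop described above can be sketched in plain git as follows. This is not gitz's code: update_all_branches is a hypothetical name, it uses git fetch plus git rebase in place of git pull --rebase, and it assumes every local branch already exists on origin and that the working tree is clean.

```shell
# update_all_branches
# Hypothetical sketch: rebase each local branch onto upstream/main and
# push it to origin; branches that fail to rebase are left untouched.
update_all_branches() {
    start=$(git symbolic-ref --short HEAD) || return 1
    git fetch -q upstream || return 1
    for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
        git checkout -q "$branch" || continue
        if git rebase -q upstream/main >/dev/null 2>&1; then
            git push -q --force-with-lease origin "$branch" ||
                echo "push failed: $branch"
        else
            # Put the branch back exactly as it was and carry on.
            git rebase --abort 2>/dev/null
            echo "could not rebase, left unchanged: $branch"
        fi
    done
    # Return to the branch we started on.
    git checkout -q "$start"
}
```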

git-copy, git-delete, git-rename

Operations on branches.

These three commands do exactly what they say on the tin; each operates on both the local branch and the origin branch of the same name.

These commands all print the commit IDs before and after, so any mistake is easily fixable [TBD: tutorial on git reset --hard etc.]
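Until that tutorial exists, here is a minimal sketch of how a printed commit ID can be used to undo a mistake on a local branch. restore_branch is a hypothetical helper name, and the sketch covers only the local branch.

```shell
# restore_branch <branch> <commit-id>
# Hypothetical helper: point a local branch back at an earlier commit.
# WARNING: reset --hard also throws away uncommitted changes.
restore_branch() {
    git checkout -q "$1" &&
    git reset -q --hard "$2"
}
```

The copy of the branch on origin is not touched by this; it can be brought back in line afterwards with git push --force-with-lease.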

git-go

Opens GitHub URLs in the browser.

Examples:

  • git go pull: Open the pull request for this branch
  • git go issues: Open the issues
  • git go directory: Open the current subdirectory within the source tree
  • … many more, see git go --help or git go -h

Commands can be abbreviated down to as few as one letter, so git go p or git go i.

git-adjust, git-permute, git-split

Very adventurous commands that let you slice and dice commits into something nice. They are not required; ask me if you are bored.