aka the "Private to public Git repository synchronization tool"
This Git repository contains the source of the Public-Git-Sync tool.
The purpose of Public-Git-Sync is to replicate the history of a branch with a given name from a private repository into a branch with the same name in a public repository.
Public-Git-Sync works one branch at a time.
Public-Git-Sync relies on the assumption that content of directory private
in the root of the private repository must not be made public.
Public-Git-Sync strips any private content from commits, and removes any commit which contains only private content, before pushing them to the public repository.
The tool is made of 4 bash scripts to be called, in order:
initialize_branch_synchronization.sh
to be called only once per branch to initialize Public-Git-Sync for this branch (see below "initalization")sync_public_branch.sh
to be called each time to synchronize a branch (see below "synchronization")finish_sync_public_branch.sh
called automatically bysync_public_branch.sh
unless a manual operation is needed to complete synchronization (see below "cherry-pick conflicts")commit_sync_public_branch.sh
to be called aftersync_public_branch.sh
to commit synchronization to private and public remote repositories
All scripts must be run from the root directory of a clone of the private repository
The sonar-enterprise
repository uses a modified version of this script, which excludes the GitHub Action workflows folder (.github/workflows
).
It is available on the sonar-enterprise
branch.
- a private repository (eg. sonarsource/sonar-enterprise)
- a public repository (eg. sonarsource/sonarqube)
- a branch to synchronize in the private repository (eg. master, branch-7.9, ...)
bash
andgit
(configured to access both repositories)
Public-Git-Sync must be initialized for each branch (later refered to as [branch_name]
).
The initialize_branch_synchronization.sh
script will create:
- a branch
[branch_name]
to synchronize to in the public repository - a branch named
public_[branch_name]
in the private repository which is the public version of[branch_name]
in the private repository - a pair of initial synchronization Git references (these references are used to track previous synchronization points)
These elements are mandatory for sync_public_branch.sh
to run.
To do the initialization, you will need:
- the name of the branch to synchronize
- a pair of commits which you know are synchronized with each other (aka. a pair of "synchronized commits")
- when synchronizing master, it will likely be the first commit ever in master (assuming it has no content in
private
directory already) - when synchronizing another branch than master, assuming master is synchronized, it will be the synchronized commits of the fork point of that branch with master
Here is an example of initialization with branch branch-7.9
of the sonar-enterprise
repository:
git clone [email protected]:sonarsource/public-git-sync.git synchronization
git clone [email protected]:sonarsource/sonar-enterprise.git
cd sonar-enterprise
../synchronization/initialize_branch_synchronization.sh sq [email protected]:sonarsource/sonarqube.git branch-7.9 9d45cf3bd58bafc6acbfac447fc70a1b5fe2f050 0dc7f1ec3d08fd5cd39e23b35b236bbfa7ec8ae6
The synchronized commit of the private branch must (obviously) exist in the private repository. It has to belong to the synchronized branch in the private repository (this is not currently enforced) but it doesn't have to be the HEAD of it.
The synchronized commit of the public branch can, but doesn't have to, exist in the private repository.
If it exists in the private repository, initialize_branch_synchronization.sh
will be able to create the synchronized branch in the public repository for you. If it doesn't, then synchronized branch in the public repository must exist and this commit must be the HEAD of it.
Current implementation of initialize_branch_synchronization.sh
will fail if [branch_name]
or public_[branch_name]
exist and can't be updated to the specified commits (ie. it performs a default git push
).
sync_public_branch.sh
is the bash script you want to call on a regular basis, most likely through an automated task.
To run this script, you will need:
- the name of the branch to synchronize
- to have initialized Public-Git-Sync on this branch (see above "Initialization")
sync_public_branch.sh
does not make any change to the remote public and private repositories. All changes are local to the current clone of the private repository.
To "commit" these changes to the remote public and private repositories, you must call commit_sync_public_branch.sh
(or do it manually, not advised).
Here is an example of synchronization of branch branch-7.9
of the sonar-enterprise
repository:
git clone [email protected]:sonarsource/public-git-sync.git synchronization
git clone [email protected]:sonarsource/sonar-enterprise.git
cd sonar-enterprise
../synchronization/sync_public_branch.sh sq [email protected]:sonarsource/sonarqube.git branch-7.9
../synchronization/commit_sync_public_branch.sh sq branch-7.9
It may happen that sync_public_branch.sh
fails to perform automatically all operations to create the public version of [branch_name]
. Most specifically, these would be cherry-pick
commands which can't be made automatically by Git.
In such case, sync_public_branch.sh
will stop prematuraly with a message such as the following:
Resolve the current cherry-pick, and then run bash cherry-pick.branch-7.9.8325.sh to continue.
Finally, run 'finish_sync_public_branch.sh branch-7.9' to complete the recovery.
To investigate and resolve the conflict:
- use
git status
,git diff
, edit files as necessary and then usegit cherry-pick --continue
- execute the indicated script:
./cherry-pick.branch-7.9.8325.sh
which will apply the following cherry-pick commands - execute
finish_sync_public_branch.sh branch-7.9
as indicated to finalize the sync - finally, as when
sync_public_branch.sh
succeeds, you will have to callcommit_sync_public_branch.sh
to push synchronization results to public and private remote repositories
Under regular operation of Public-Git-Sync, you won't have to call the finish_sync_public_branch.sh
script.
This script only exists appart from sync_public_branch.sh
to provide an easy procedure to recover a cherry-pick conflicts (see above).
What Public-Git-Sync basically does is to replicate a tree of commits into another tree of commits (so, keeping the order) excluding all changes which apply to the private
directory.
To illustrate, take the following trees of two synchronized private and public branches:
A
,B
,C
, ... are the sha1 of the commits andchange A
,change B
,change C
, ... are the title of the commits
public branch private branch
A change A A' change A
| |
B change B B' change B
| |
C change C |
| |
D change D D' change D
| |
E change E E' change E
| |
F change F F' change F
Several observations:
- the title and the order of the commits are preserved (so are the authors, emails, etc.)
- there is no commit
C'
because this commit contained only private content - any of the commits
B'
toF'
could have the same content as respectively,B
toF
, they will still have a different sha1 because their parent is not the same as in the public branch
In the example above, each commit has a single parent. This is not the case for merge commits.
Public-Git-Sync handles merge commits by taking only one parent into account. This implies that merge commit will be there, unless it's empty/has only private content, and will lose the information of the other parent.
Public-Git-Sync relies on Git commands filter-branch
and cherry-pick
and a pair of work branches (called [branch_name]_work
and public_[branch_name]_work
).
The sync_public_branch.sh
script does the following (on top of some sanity checks):
- (re)create branch
[branch_name]_work
as a copy of[branch_name]
- use
git filter-branch
to remove any private content from the last synchronized commit to the HEAD of[branch_name]_work
- this creates commits without private content
- and also some empty commits which had only private content
- (re)create branch
public_[branch_name]_work
frompublic_[branch_name]
- apply in order, with
git cherry-pick
, the filtered commits from[branch_name]_work
intopublic_[branch_name]
- these are the operations which may require manual fix by user to complete
- this branch can contain empty commits at this stage
The finish_sync_public_branch.sh
is then called and does the following:
- remove empty commits from
public_[branch_name]_work
usinggit filter-branch
- in the example above,
C'
did exist for a while but is deleted at this stage
- in the example above,
- update local branch
public_[branch_name]
to now be the same aspublic_[branch_name]_work
- perform a sanity check to ensure no private content are present in
public_[branch_name]
- create synchronization refs to the HEADs of
[branch_name]
andpublic_[branch_name]
So far, all changes made are local to the current clone of the private repository.
They can be reviewed or discarded by simply deleting the clone.
To push the changes to the remote public and private repositories, use commit_sync_public_branch.sh
which does the following:
- push branch
public_[branch_name]
to private repository - push branch
[branch_name]
to public repository with content ofpublic_[branch_name]
- push the synchonization refs
commit_sync_public_branch.sh
will have no effect if there is nothing to push. It can be safely called after a call of sync_public_branch.sh
which found nothing to synchronize.
commit_sync_public_branch.sh
will push all synchronization refs not yet present in remote private repository, not only those created by the previous synchronization (ie. it's stateless). This implies it can be used to push at once multiple synchronizations (this is not advised, though).
The second important feature of Public-Git-Sync is to keep track of synchronized commits.
This way, sync_public_branch.sh
can:
- synchronize only the new commits from the private branch
- detect there is nothing new to synchronize
- prevent synchronisation time to increase (linearily or worse) with the number of commits in
[branch_name]
- prevent possible rewriting (new sha1) of already synchronized commits and a lot of messy consequences that would have
- rely on safe and simple
git push --forward-only
to publish to the public repository
Public-Git-Sync achieves this by storing Git references (later referenced to as "synchronization refs") in the private repository. These references are stored in refs/public_sync
.
Synchronization refs always go by two: one pointing to a commit in the private branch, one pointing to a commit in the public branch.
These two commits are referred to as "synchronized commits". You can consider synchronized commits as one commit being the public version of the other.
Let's have a look at a sequence of synchronization calls over time for the example above.
Initialization
public branch private branch synchronized commits
A change A A' change A A -> A'
Synchronizing B
public branch private branch synchronized commits
A change A A' change A A -> A'
| |
B change B B' change B B -> B'
Synchronizing C
since
C
has no public content, no new commit will be pushed to the public branch. The synchronized commit forC
is therefor the current HEAD of the public branch:B'
public branch private branch synchronized commits
A change A A' change A A -> A'
| |
B change B B' change B B -> B'
| |
C change C | C -> B'
Synchronizing F
since the last synchronized commit is
C
, all commits sinceC
will be synchronized:D
,E
andF
. Public-Git-Sync will create syncronization refs only for the HEADs:F
andF'
.
public branch private branch synchronized commits
A change A A' change A A -> A'
| |
B change B B' change B B -> B'
| |
C change C | C -> B'
| |
D change D D' change D
| |
E change E E' change E
| |
F change F F' change F F -> F'
Synchronization refs are stored in the ref directory refs/public_sync
.
Refs to synchronized commits are stored in a sudirectory which name is a timestamp, eg.: 2019-10-16_13-41-03
. Timestamp allows to sort references and identify the most recent ones. The timestamp is basically the time when sync_public_branch.sh
is called.
In this directory, each synchronized commit has its reference named after the branch it belongs to.
Currently, synchronization refs are stored forever. An improvement issue (#6) has been filled.
Git clone does not pull synchronization refs by default. They have to be pulled explicitly.
Example:
git fetch --no-tags origin "+refs/public_sync/*:refs/public_sync/*"
Github does not copy synchronization refs when forking a repository. They will have to be copied manually.
Commit should be pushed at first on:
master
if applicablesonar-enterprise
if specific
Then, if master
has new commits, a "GitHub rebase and merge" should be performed from master
to sonar-enterprise
.
Starting from this situation, when working with protected branches
* M3 - (master) New commits on master
* M2
* M1
| * S3 - (tag: v0.2.0, sonar-enterprise) Custom commits on sonar-enterprise
| * S2
| * S1
|/
* 00 - Common commits
*
Then, perform a GitHub "Rebase and Merge", that is actually a complete cherry-pick
* M3' - (sonar-enterprise) New commits from master
* M2'
* M1'
* S3 - (tag: v0.2.0) Custom commits on sonar-enterprise
* S2
* S1
| * M3 - (master) New commits on master
| * M2
| * M1
|/
* 00 - Common commits
*
Copyright 2018-2020.
Licensed under the GNU Lesser General Public License, Version 3.0
Sébastien Lesaint and Janos Gyerik