Commit ccb73a9

Discuss new meta- and sub-repos in user guide
addresses #202
1 parent 3ef40e0 commit ccb73a9

2 files changed

+133
-66
lines changed

doc/architecture.md

+111-66
@@ -39,11 +39,12 @@ space. In short, the first section should explain why git-meta is needed.
3939

4040
Next, we present the architecture for implementing a mono-repo using Git
4141
submodules. We describe the overall repository structure, commits, forking,
42-
refs, and the client-side representation.
42+
refs, client-side representation, and a recommended server-side configuration.
4343

44-
Then, we present an earlier, more-intuitive architecture. Seeing how our
45-
strategy evolved from this approach is illustrative, and helps to understand
46-
some of the less-intuitive design choices.
44+
Then, we discuss how our current design evolved from a seemingly simple goal of
45+
making submodules easier to use into the current architecture. Seeing how our
46+
strategy developed from a naive approach is illustrative, and helps to
47+
understand some of the less-intuitive design choices.
4748

4849
Next, we provide an analysis of the performance of a mono-repo. We show how
4950
the performance of a mono-repo can remain mostly constant as it grows, ages,
@@ -252,52 +253,42 @@ $ ls b
252253
README.md
253254
```
254255

255-
## Forking
256-
257-
It is possible to use git-meta with a single meta-repo namespace, but we
258-
strongly recommend the use of a name-partitioning strategy, a.k.a. forking.
259-
Forking may be generally be implemented either via [Git
260-
namespaces](https://git-scm.com/docs/gitnamespaces) or a
261-
hosting-solution-specific forking mechanism. Without forking, every user will
262-
receive every branch in existence on every fetch and clone, causing significant
263-
performance problems, especially over time.
264-
265-
We fork only the meta-repo. That is, for a given mono-repo, there may be any
266-
number of peer forks of the meta-repo on the back-end (though policy will
267-
generally designate that some meta-repos are special), but only one instance of
268-
each sub-repo:
269-
270-
```
271-
'-----------` '-----------`
272-
| a | | b |
273-
`-----------, `-----------,
274-
^ ^ ^ ^
275-
| | / |
276-
| | / |
277-
| `--.---. |
278-
| .--/ | |
279-
'--|-----|--` '--|-----|--`
280-
| a b | | a b |
281-
| - - - - - | | - - - - - |
282-
| jill/meta | | bill/meta |
283-
`-----------, `-----------,
284-
```
285-
286-
Any clone of any meta-repo (even a local one) will still reference the same
287-
canonical sub-repos. Thus, a mono-repo is not truly distributed like single
288-
Git repositories. We consider this to be acceptable for the following reasons:
289-
290-
- Mono-repos are designed to facilitate source management in large
291-
organizations, which generally nominate canonical repositories for
292-
integration anyway.
293-
- The individual repos of which they are composed are still normal,
294-
distributed, Git repos (e.g., two distinct mono-repos may contain sub-repos
295-
with the same histories).
296-
- As described in the section on our "naive architecture", workflows
297-
involving forked sub-repos have significant drawbacks.
298-
- One of the main benefits of DVCSs -- the ability to have a first-class
299-
development experience without network connectivity to the server -- is still
300-
possible.
256+
## Server-side Representation
257+
258+
We use the *omega* repo strategy to organize sub-repos. With this strategy,
259+
all submodules have the same URL: ".". When a submodule is opened, Git
260+
resolves the URL of the submodule to be exactly that of the origin of the
261+
meta-repo itself, i.e., the objects for the meta-repo and all sub-repos live in
262+
the same physical repository on the back-end.
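
As a rough illustration of the layout described above, the following Python sketch renders a `.gitmodules` file in which every sub-repo uses the URL `.`. The section and key layout is standard Git submodule configuration; the helper name `render_gitmodules` and the sample paths are hypothetical, and this is not necessarily how git-meta itself writes the file.

```python
def render_gitmodules(sub_repo_paths):
    """Render .gitmodules text in which every submodule URL is "."."""
    sections = []
    for path in sorted(sub_repo_paths):
        # One section per sub-repo; only the path differs, the URL is always ".".
        sections.append(
            f'[submodule "{path}"]\n'
            f"\tpath = {path}\n"
            "\turl = .\n"
        )
    return "".join(sections)


if __name__ == "__main__":
    # Hypothetical sub-repo paths, for illustration only.
    print(render_gitmodules(["my/sub/repo", "a"]), end="")
```

Because every entry carries the same URL, adding a sub-repo changes only the meta-repo's own tree, which is consistent with the claim that no separate back-end repository has to be created.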
263+
264+
Thus, we have a true (as in all commits reside in one repository) mono-repo
265+
that can be treated as many sub-repos (especially client-side). Key
266+
functionality enabled by this technique:
267+
268+
- Server-side forks are possible; other potential strategies, as described in
269+
the design and evolution section, preclude forks.
270+
- When desired (e.g. preparing to work remotely) the entire mono-repo can be
271+
cloned relatively quickly -- with a single fetch -- compared to other
272+
techniques which would require a fetch for each sub-repo.
273+
- A commit is sufficient to describe the state of a sub-repo in a repository.
274+
A separate repository does not need to be brought to life to back a new
275+
sub-repo. If a sub-repo is created, e.g., on a local branch, there are no
276+
side-effects or impact on other (local or remote) branches.
277+
- Sub-repo creation and deletion are implemented in terms of normal,
278+
client-side, git operations.
279+
280+
Creating and pushing a new sub-repo:
281+
282+
```bash
283+
$ git meta new my/sub/repo
284+
Created new sub-repo my/sub/repo. It is currently empty. Please
285+
stage changes and/or make a commit before finishing with 'git meta commit';
286+
you will not be able to use 'git meta commit' until you do so.
287+
$ touch my/sub/repo/README.md
288+
$ git meta add .
289+
$ git meta commit -m 'added my/sub/repo'
290+
$ git meta push
291+
```
301292
302293
## Refs
303294
@@ -313,8 +304,8 @@ old version of Git.
313304
In git-meta, branches and tags are applied only to meta-repos. Because each
314305
commit in a meta-repo unambiguously describes the state of all sub-repos, it is
315306
unnecessary to apply branches and tags to sub-repos. Furthermore, as described
316-
in the "Naive Architecture" section below, schemes relying on sub-repo branches
317-
proved to be impractical.
307+
in the "Design and Evolution" section below, schemes relying on sub-repo
308+
branches proved to be impractical.
318309
319310
Therefore, ref names in git-meta always refer to refs in the meta-repo.
320311
Branches and tags in sub-repos are ignored by git-meta -- by server-side checks
@@ -539,13 +530,15 @@ by the new synthetic-meta-ref if it does not already contain that commit in its
539530
history. The downside of this approach is that the mega-ref references all
540531
commits, probably many more than what is needed at any given time.
541532
542-
# Naive Architecture
533+
# Design and Evolution
543534
544-
In this section we describe the basic model we had in mind when we started
545-
git-meta. First, we provide an overview of the original architecture. Then,
546-
we describe a series of problems that arise from this architecture. Finally,
547-
we draw conclusions by this exercise and connect them to our final design
548-
choices.
535+
In this section we show how the architecture of git-meta evolved, in order to
536+
better explain our current design. First, we provide an overview of the
537+
original, "naive" architecture that seemed to make sense, but was actually
538+
unworkable. Then, we describe a series of problems that arise from this
539+
architecture, leading through several intermediate solutions. Next, we
540+
describe our first working architecture and its failings. Finally, we
541+
highlight some key points that inform our current solution.
549542
550543
## Overview
551544
@@ -902,6 +895,45 @@ Now even `git meta open` will be unable to initialize the submodule `a` because
902895
Bob's `master` branch references a commit in it that cannot be found; we no
903896
longer have any knowledge that Jill's fork exists.
904897
898+
## A Workable Solution: namespaces and relative submodule URLs
899+
900+
Our first workable solution had the following characteristics:
901+
902+
1. Each submodule would have a relative URL. When opening a sub-repo, Git
903+
resolves this against the URL of the remote named `origin` to derive an
904+
absolute URL. For example, given an origin URL of
905+
`http://git.example.com/meta` and a relative submodule URL
906+
`./a/b/c`, Git would derive the absolute URL for `a/b/c` to be:
907+
`http://git.example.com/meta/a/b/c`.
908+
1. Because forking is impractical with this scheme, we would use
909+
[Git namespaces](https://git-scm.com/docs/gitnamespaces) to
910+
partition reference names.
911+
1. Sub-repo creation and deletion would be done through separate scripts
912+
that manipulated the local clone and communicated to the back-end in
913+
a hosting-solution-specific protocol.
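
The URL derivation in the first item above can be sketched in Python as follows. This is a simplified model for illustration, not Git's actual implementation: the function name `resolve_relative_url` is hypothetical, only leading `./` and `../` components are handled, and a bare `.` is treated as resolving to the origin itself, as the omega strategy description states.

```python
def resolve_relative_url(origin_url: str, relative_url: str) -> str:
    """Resolve a relative submodule URL against the origin URL (simplified)."""
    base = origin_url.rstrip("/")
    rel = relative_url
    if rel == ".":
        # Assumption: "." means "the same repository as origin".
        return base
    while True:
        if rel.startswith("./"):
            rel = rel[2:]  # "./" keeps the current base
        elif rel.startswith("../"):
            rel = rel[3:]  # "../" drops the last component of the base
            base = base.rsplit("/", 1)[0]
        else:
            break
    return f"{base}/{rel}".rstrip("/")


if __name__ == "__main__":
    print(resolve_relative_url("http://git.example.com/meta", "./a/b/c"))
```

With the origin and relative URL from the example above, the sketch yields `http://git.example.com/meta/a/b/c`, which shows why this scheme needs a host that accepts `/` in repository names.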
914+
915+
This design has several drawbacks:
916+
917+
1. The only major Git hosting solution that allows repositories to have `/`
918+
characters in their names (and hence URLs) is Gitolite. Solving this
919+
problem for use with other hosting solutions (e.g., GitLab or GitHub) is
920+
non-obvious.
921+
1. As mentioned above, true forking is not possible. Using Git namespaces
922+
instead poses problems:
923+
- Forks are common and widely understood; Git namespaces are not.
924+
- Hosting solutions provide for customizations and administration in forks;
925+
we would need to synthesize similar functionality around namespaces.
926+
- Our collaboration strategy was simplistic: users could push only to their
927+
own namespaces. To exchange code, Jill would pull changes from Bob's
928+
branch, and vice-versa. This approach works for collaborations between
929+
pairs of users, but becomes more clunky as the number of collaborators
930+
increases. We did not provide spaces for ad hoc or org-structure-based
931+
groups where shared branches could be created. Such spaces and their
932+
branches are necessary for larger collaborations and release processes.
933+
Solving this problem would have likely required a hand-rolled solution.
934+
1. The use of hosting-solution-specific interfaces to create sub-repos is
935+
sub-optimal.
936+
905937
## Conclusions
906938
907939
1. It is not generally possible to synchronize or validate updates to a ref
@@ -911,16 +943,14 @@ longer have any knowledge that Jill's fork exists.
911943
1. We use symbolic-meta-refs as push-targets in sub-repos; as the contents of a
912944
symbolic-meta-ref are immutable (a given symbolic-meta-ref can point to only
913945
one commit), we are always guaranteed to be able to update them when needed.
914-
1. Forking sub-repos is potentially expensive, and probably impractical on the
915-
server-side, and creates many complications on the client-side. Therefore,
916-
we allow forking of only meta-repos; each sub-repo has a single namespace
917-
for the entire mono-repo.
918-
1. However, since git-meta does not push branch or tag names to sub-repos, and
919-
synthetic-meta-branches are plain refs that must be explicitly fetched,
920-
there will not be a problem with name-explosion in sub-repos.
946+
1. Using the omega repo strategy to store sub-repo refs together with meta-repo
947+
refs allows us to use forking and to implement submodule creation and
948+
deletion with normal Git operations.
921949
922950
# Performance
923951
952+
## Client-side
953+
924954
At a minimum, users working in a mono-repo must download the meta-repo and all
925955
sub-repos containing code that they require to work.
926956
@@ -951,14 +981,29 @@ minimized through several strategies:
951981
git-meta, we are developing a proposal to address this case and will link to
952982
it here when ready.
953983
984+
## Server-side
985+
986+
We were initially concerned about the effect that putting large numbers of refs
987+
(i.e., one or more synthetic-meta-refs per sub-repo) and objects into a single
988+
(back end) repository would have on the performance of client-server
989+
interactions, particularly fetching (including cloning) and pushing.
990+
991+
Testing on [an extremely large repository](https://github.com/bpeabody/mongo)
992+
(~260k commits and 26k sub-repos) has been encouraging so far. Still, we
993+
believe that keeping the total number of refs in a given repository to a small
994+
factor (close to 1) of the number of sub-repos is required to maintain
995+
performance. Synthetic-meta-refs should be pruned regularly (such that only
996+
necessary roots remain) and forks should be used to minimize the number of
997+
meta-repo branches in a given repository; this practice is a good one anyway.
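
The "small factor" guideline above is simple arithmetic, sketched here in Python under stated assumptions. The function names and the threshold of `2.0` are hypothetical choices for illustration; the text only says the factor should stay close to 1.

```python
def ref_factor(ref_names, sub_repo_paths):
    """Return the ratio of total refs to sub-repos in one repository."""
    if not sub_repo_paths:
        raise ValueError("at least one sub-repo is required")
    return len(ref_names) / len(sub_repo_paths)


def needs_pruning(ref_names, sub_repo_paths, threshold=2.0):
    """Flag a repository whose ref count exceeds the (assumed) threshold."""
    return ref_factor(ref_names, sub_repo_paths) > threshold


if __name__ == "__main__":
    # Hypothetical ref names and sub-repo paths, for illustration only.
    refs = ["refs/heads/master", "refs/commits/aaa", "refs/commits/bbb"]
    subs = ["a", "b"]
    print(ref_factor(refs, subs), needs_pruning(refs, subs))
```

In practice the ref names could be gathered with `git for-each-ref --format='%(refname)'` and the sub-repo paths read from `.gitmodules`; how often to run such a check is left open here.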
998+
954999
# Tools
9551000
9561001
We provide three types of tools:
9571002
9581003
1. the `git-meta` plugin to simplify client-side operations such as
9591004
cross-repository merges
9601005
2. push validation tools to preserve mono-repo invariants
961-
3. maintenance scripts
1006+
3. maintenance scripts, e.g. to minimize the number of meta-refs
9621007
9631008
## The git-meta plugin
9641009

doc/user-guide.md

+22
@@ -282,6 +282,11 @@ $ git meta commit -m "changed my-sub-repo"
282282

283283
## Usage Scenarios
284284

285+
### Creating a meta-repository
286+
287+
A meta-repository doesn't need any special configuration; any Git repository
288+
can be a meta-repository.
289+
285290
### Cloning
286291

287292
We do not provide a git-meta command for cloning as the built-in Git command
@@ -291,6 +296,23 @@ does exactly the right thing:
291296
$ git clone http://example.com/your-meta-repo.git meta
292297
```
293298

299+
### Creating a new sub-repository
300+
301+
Assuming that you are using the omega repository strategy described in the
302+
[architecture](./architecture.md) document, making a new sub-repo is
303+
straightforward:
304+
305+
```bash
306+
$ cd meta
307+
$ git meta new foo/bar
308+
Created new sub-repo foo/bar. It is currently empty. Please
309+
stage changes and/or make a commit before finishing with 'git meta commit';
310+
you will not be able to use 'git meta commit' until you do so.
311+
$ touch foo/bar/README.md
312+
$ git meta add .
313+
$ git meta commit -m "added foo/bar"
314+
```
315+
294316
### Submodule Visibility
295317
296318
A freshly-cloned meta-repo is usually empty, containing a tree of empty
