@@ -39,11 +39,12 @@ space. In short, the first section should explain why git-meta is needed.

Next, we present the architecture for implementing a mono-repo using Git
submodules. We describe the overall repository structure, commits, forking,
- refs, and the client-side representation.
+ refs, client-side representation, and a recommended server-side configuration.

- Then, we present an earlier, more-intuitive architecture. Seeing how our
- strategy evolved from this approach is illustrative, and helps to understand
- some of the less-intuitive design choices.
+ Then, we discuss how our design evolved from a seemingly simple goal, making
+ submodules easier to use, into the current architecture. Seeing how our
+ strategy developed from a naive approach is illustrative, and helps to
+ understand some of the less-intuitive design choices.

Next, we provide an analysis of the performance of a mono-repo. We show how
the performance of a mono-repo can remain mostly constant as it grows, ages,
@@ -252,52 +253,42 @@ $ ls b
README.md
```

- ## Forking
-
- It is possible to use git-meta with a single meta-repo namespace, but we
- strongly recommend the use of a name-partitioning strategy, a.k.a. forking.
- Forking may be generally be implemented either via [Git
- namespaces](https://git-scm.com/docs/gitnamespaces) or a
- hosting-solution-specific forking mechanism. Without forking, every user will
- receive every branch in existence on every fetch and clone, causing significant
- performance problems, especially over time.
-
- We fork only the meta-repo. That is, for a given mono-repo, there may be any
- number of peer forks of the meta-repo on the back-end (though policy will
- generally designate that some meta-repos are special), but only one instance of
- each sub-repo:
-
- ```
- '-----------` '-----------`
- | a | | b |
- `-----------, `-----------,
- ^ ^ ^ ^
- | | / |
- | | / |
- | `--.---. |
- | .--/ | |
- '--|-----|--` '--|-----|--`
- | a b | | a b |
- | - - - - - | | - - - - - |
- | jill/meta | | bill/meta |
- `-----------, `-----------,
- ```
-
- Any clone of any meta-repo (even a local one) will still reference the same
- canonical sub-repos. Thus, a mono-repo is not truly distributed like single
- Git repositories. We consider this to be acceptable for the following reasons:
-
- - Mono-repos are designed to facilitate source management in large
- organizations, which generally nominate canonical repositories for
- integration anyway.
- - The individual repos of which they are composed are still normal,
- distributed, Git repos (e.g., two distinct mono-repos may contain sub-repos
- with the same histories).
- - As described in the section on our "naive architecture", workflows
- involving forked sub-repos have significant drawbacks.
- - One of the main benefits of DVCSs -- the ability to have a first-class
- development experience without network connectivity to the server -- is still
- possible.
256
+ ## Server-side Representation
257
+
258
+ We use the * omega* repo strategy to organize sub-repos. With this strategy,
259
+ all submodules have the same URL: ".". When a submodule is opened, Git
260
+ resolves the URL of the submodule to be exactly that of the origin of the
261
+ meta-repo itself, i.e., the objects for the meta-repo and all sub-repos live in
262
+ the same physical repository on the back-end.
263
+
264
+ Thus, we have a true (as in all commits reside in one repository) mono-repo
265
+ that can be treated as many sub-repos (especially, client-side). Key
266
+ functionality enabled by this technique:
267
+
268
+ - Server-side forks are possible; other potential strategies, as described in
269
+ the design and evolution section, preclude forks.
270
+ - When desired (e.g. preparing to work remotely) the entire mono-repo can be
271
+ cloned relatively quickly -- with a single fetch -- compared to other
272
+ techniques which would require a fetch for each sub-repo.
273
+ - A commit is sufficient to describe the state of a sub-repo in a repository.
274
+ A separate repository does not need to be brought to life to back a new
275
+ sub-repo. If a sub-repo is created, e.g., on a local branch, there is no
276
+ side-effects or impact on other (local or remote) branches.
277
+ - Sub-repo creation and deletion are implemented in terms of normal,
278
+ client-side, git operations.
279
+
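As an illustration, a meta-repo using the omega strategy might record its submodules in `.gitmodules` as follows. This is a minimal sketch: the sub-repo paths `a` and `my/sub/repo` are hypothetical, and in practice the file would normally be updated by tooling rather than written by hand.

```bash
#!/usr/bin/env bash
# Sketch only: write a .gitmodules file in which every submodule uses the
# omega URL ".". The sub-repo paths "a" and "my/sub/repo" are hypothetical.
set -eu

cat > .gitmodules <<'EOF'
[submodule "a"]
	path = a
	url = .
[submodule "my/sub/repo"]
	path = my/sub/repo
	url = .
EOF

# Show the URL recorded for each submodule; with the omega strategy every
# entry is ".", i.e., the meta-repo's own origin.
grep 'url = ' .gitmodules
```

The final `grep` prints one `url = .` line per submodule; no entry names a separate back-end repository.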
+ Creating and pushing a new sub-repo:
+
+ ```bash
+ $ git meta new my/sub/repo
+ Created new sub-repo my/sub/repo. It is currently empty. Please
+ stage changes and/or make a commit before finishing with 'git meta commit';
+ you will not be able to use 'git meta commit' until you do so.
+ $ touch my/sub/repo/README.md
+ $ git meta add .
+ $ git meta commit -m 'added my/sub/repo'
+ $ git meta push
+ ```

## Refs

@@ -313,8 +304,8 @@ old version of Git.
In git-meta, branches and tags are applied only to meta-repos. Because each
commit in a meta-repo unambiguously describes the state of all sub-repos, it is
unnecessary to apply branches and tags to sub-repos. Furthermore, as described
- in the "Naive Architecture" section below, schemes relying on sub-repo branches
- proved to be impractical.
+ in the "Design and Evolution" section below, schemes relying on sub-repo
+ branches proved to be impractical.

Therefore, ref names in git-meta always refer to refs in the meta-repo.
Branches and tags in sub-repos are ignored by git-meta -- by server-side checks
@@ -539,13 +530,15 @@ by the new synthetic-meta-ref if it does not already contain that commit in its
history. The downside of this approach is that the mega-ref references all
commits, probably many more than what is needed at any given time.

- # Naive Architecture
+ # Design and Evolution

- In this section we describe the basic model we had in mind when we started
- git-meta. First, we provide an overview of the original architecture. Then,
- we describe a series of problems that arise from this architecture. Finally,
- we draw conclusions by this exercise and connect them to our final design
- choices.
+ In this section we show how the architecture of git-meta evolved, in order to
+ better explain our current design. First, we provide an overview of the
+ original, "naive" architecture, which seemed to make sense but was actually
+ unworkable. Then, we describe a series of problems that arise from this
+ architecture, leading through several intermediate solutions. Next, we
+ describe our first working architecture and its failings. Finally, we
+ highlight some key points that inform our current solution.

## Overview

@@ -902,6 +895,45 @@ Now even `git meta open` will be unable to initialize the submodule `a` because
Bob's `master` branch references a commit in it that cannot be found; we no
longer have any knowledge that Jill's fork exists.

+ ## A Workable Solution: namespaces and relative submodule URLs
+
+ Our first workable solution had the following characteristics:
+
+ 1. Each submodule would have a relative URL. When opening a sub-repo, Git
+    resolves this against the URL of the remote named `origin` to derive an
+    absolute URL. For example, given an origin URL of
+    `http://git.example.com/meta` and a relative submodule URL
+    `./a/b/c`, Git would derive the absolute URL for `a/b/c` to be
+    `http://git.example.com/meta/a/b/c`.
+ 1. Because forking is impractical with this scheme, we would use
+    [Git namespaces](https://git-scm.com/docs/gitnamespaces) to
+    partition reference names.
+ 1. Sub-repo creation and deletion would be done through separate scripts
+    that manipulated the local clone and communicated with the back-end using
+    a hosting-solution-specific protocol.
+
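The resolution rule in item 1 can be sketched as a small shell function. `resolve_submodule_url` is hypothetical and is not Git's implementation; it handles only a bare `.` (as used by the omega strategy), `./` prefixes, and `../` prefixes, and it assumes the origin URL has enough path components to absorb every `../`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not Git's implementation): resolve a relative submodule
# URL against the URL of the meta-repo's `origin` remote.
# Handles only ".", "./path", and "../path" forms; assumes the origin URL has
# enough path components to absorb every "../".
set -eu

resolve_submodule_url() {
    local base="${1%/}"   # origin URL, without a trailing slash
    local rel="$2"        # relative URL as recorded in .gitmodules
    while true; do
        case "$rel" in
            ./*)  rel="${rel#./}" ;;                      # "./x": stay at base
            ../*) rel="${rel#../}"; base="${base%/*}" ;;  # "../x": drop last base component
            .)    rel=""; break ;;                        # ".": the origin itself
            *)    break ;;
        esac
    done
    if [ -z "$rel" ]; then
        printf '%s\n' "$base"
    else
        printf '%s/%s\n' "$base" "$rel"
    fi
}

# The example from item 1, and the omega strategy's ".".
resolve_submodule_url http://git.example.com/meta ./a/b/c   # → http://git.example.com/meta/a/b/c
resolve_submodule_url http://git.example.com/meta .         # → http://git.example.com/meta
```

Under this rule, a clone of a fork at a different origin URL would look for its sub-repos next to the fork, which is consistent with forking being impractical in this design.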
+ This design has several drawbacks:
+
+ 1. The only major Git hosting solution that allows repositories to have `/`
+    characters in their names (and hence, URLs) is Gitolite. Solving this
+    problem for use with other hosting solutions (e.g., GitLab or GitHub) is
+    non-obvious.
+ 1. As mentioned above, true forking is not possible. Using Git namespaces
+    instead poses problems:
+    - Forks are common and widely understood; Git namespaces are not.
+    - Hosting solutions provide customization and administration for forks;
+      we would need to synthesize similar functionality around namespaces.
+    - Our collaboration strategy was simplistic: users could push only to their
+      own namespaces. To exchange code, Jill would pull changes from Bob's
+      branch, and vice-versa. This approach works for collaborations between
+      pairs of users, but becomes clunkier as the number of collaborators
+      increases. We did not provide spaces for ad hoc or org-structure-based
+      groups where shared branches could be created. Such spaces and their
+      branches are necessary for larger collaborations and release processes.
+      Solving this problem would likely have required a hand-rolled solution.
+ 1. The use of hosting-solution-specific interfaces to create sub-repos is
+    sub-optimal.
+

## Conclusions

1. It is not generally possible to synchronize or validate updates to a ref
@@ -911,16 +943,14 @@ longer have any knowledge that Jill's fork exists.
1. We use symbolic-meta-refs as push-targets in sub-repos; as the contents of a
symbolic-meta-ref are immutable (a given symbolic-meta-ref can point to only
one commit), we are always guaranteed to be able to update them when needed.
- 1. Forking sub-repos is potentially expensive, and probably impractical on the
- server-side, and creates many complications on the client-side. Therefore,
- we allow forking of only meta-repos; each sub-repo has a single namespace
- for the entire mono-repo.
- 1. However, since git-meta does not push branch or tag names to sub-repos, and
- synthetic-meta-branches are plain refs that must be explicitly fetched,
- there will not be a problem with name-explosion in sub-repos.
+ 1. Using the omega repo strategy to store sub-repo refs together with meta-repo
+ refs allows us to use forking and to implement submodule creation and
+ deletion with normal Git operations.

# Performance

+ ## Client-side
+
At a minimum, users working in a mono-repo must download the meta-repo and all
sub-repos containing code that they require to work.

@@ -951,14 +981,29 @@ minimized through several strategies:
git-meta, we are developing a proposal to address this case and will link to
it here when ready.

+ ## Server-side
+
+ We were initially concerned about the effect that putting large numbers of
+ refs (i.e., one or more synthetic-meta-refs per sub-repo) and objects into a
+ single (back-end) repository would have on the performance of client-server
+ interactions, particularly fetching (including cloning) and pushing.
+
+ Testing on [an extremely large repository](https://github.com/bpeabody/mongo)
+ (~260k commits and 26k sub-repos) has been encouraging so far. Still, we
+ believe that keeping the total number of refs in a given repository to a small
+ factor (close to 1) of the number of sub-repos is required to maintain
+ performance. Synthetic-meta-refs should be pruned regularly (so that only
+ necessary roots remain), and forks should be used to minimize the number of
+ meta-repo branches in a given repository; this is good practice anyway.
+
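One way to keep an eye on the ratio suggested above is to compare the number of refs with the number of sub-repos in a meta-repo commit. The helpers below are hypothetical (their names are illustrative) and use only standard Git plumbing (`git for-each-ref`, `git ls-tree`); what counts as "close to 1" is left as a policy choice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the refs-to-sub-repos ratio of an (omega)
# repository. Run them in the back-end repository or in a `git clone --mirror`
# of it; a normal clone does not fetch every ref.
set -eu

# Count every ref (branches, tags, synthetic-meta-refs, ...) in the repository.
count_refs() {
    git for-each-ref --format='%(refname)' | wc -l | tr -d ' '
}

# Count sub-repos in a meta-repo commit: submodule (gitlink) entries have
# mode 160000 in `git ls-tree` output.
count_sub_repos() {
    git ls-tree -r "${1:-HEAD}" | awk '$1 == "160000"' | wc -l | tr -d ' '
}

# Print refs per sub-repo with two decimals, or "n/a" when there are none.
refs_per_sub_repo() {
    awk -v r="$1" -v s="$2" \
        'BEGIN { if (s == 0) print "n/a"; else printf "%.2f\n", r / s }'
}
```

For example, `refs_per_sub_repo "$(count_refs)" "$(count_sub_repos HEAD)"` prints the current ratio; a value well above 1 would suggest pruning synthetic-meta-refs or moving branches into forks.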
# Tools

We provide three types of tools:

1. the `git-meta` plugin to simplify client-side operations such as
cross-repository merges
2. push validation tools to preserve mono-repo invariants
- 3. maintenance scripts
+ 3. maintenance scripts, e.g., to minimize the number of meta-refs

## The git-meta plugin
