repository is large (200MB) #10

Open
eliasnaur opened this issue Apr 28, 2022 · 7 comments · Fixed by #12

Comments

@eliasnaur
Contributor

I couldn't help but notice that downloading github.com/benoitkugler/textlayout takes quite a while, so I ran

$ du -hs .
311M	.
$ du -hs *
155M	fonts
4.0K	go.mod
4.0K	go.sum
19M	graphite
12M	harfbuzz
440K	language
4.0K	LICENSE
4.0K	README.md
4.0K	test
5.2M	unicodedata

to check.

It's unfortunate to have to fetch at least ~200 MB of data, or more than 300 MB with the entire history, just to access the Go source. Would you be open to slimming down the repository and rewriting the Git history to obtain a leaner dependency? If it's inconvenient to slim down the testdata files, perhaps they could be extracted into a separate (test-only) dependency module?
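
For anyone who wants to see which paths account for the size before deciding what to extract, here is a minimal sketch. It assumes a local clone and uses only standard Git plumbing; the helper name `largest_blobs` is made up for illustration and is not part of this repository.

```shell
#!/bin/sh
# List the largest blobs reachable from any ref, biggest first.
# Usage: largest_blobs <repo-dir> [count]   (hypothetical helper name)
largest_blobs() {
    # rev-list prints "<object-id> <path>"; cat-file fills in type and size,
    # and %(rest) carries the path through unchanged.
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -rn |
        head -n "${2:-10}"
}
```

Running `largest_blobs . 20` inside a clone prints one `size path` line per blob. The sizes are uncompressed, so they overstate what ends up in a packfile, but they are usually enough to spot which directories dominate.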

@benoitkugler
Owner

benoitkugler commented Apr 28, 2022

I have no argument against rewriting the Git history, but I'm not at all proficient in this exercise!

The weight of the module does indeed come from the test font files. I would prefer not to reduce test coverage, but if they can be extracted into a test-only dependency, let's do it. What would be the way to proceed? Do Go modules have a way to specify test-only deps?

@eliasnaur
Contributor Author

Thanks. I'll take a stab at it if no-one else beats me to it.

> Do Go modules have a way to specify test-only deps?

Not that I know of, but I'm hoping that non-test builds can avoid downloading the test module.

@eliasnaur
Contributor Author

eliasnaur commented Apr 29, 2022

PR #12 changes the tests to use an external module for their data. For the git history rewrite, I came up with:

$ git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch fonts/type1C/test graphite/testdata harfbuzz/testdata font/truetype/testdata fonts/type1/type1.test' -- <BRANCH>

from https://www.deployhq.com/git/faqs/removing-large-files-from-git-history.

Note that only <BRANCH> is rewritten, which is intentional: you want to keep your existing tags pointing at the old data. I believe git clone pulls data from tags by default as well, so to reap the size gains the existing tags will have to be deleted at some point.
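
To illustrate the branch-versus-tag point on a throwaway repository, here is a small sketch. Everything in it (the `testdata/font.bin` file, the `v0.1.0` tag, the `main` branch name) is invented for the demo and not taken from textlayout. Git's own documentation now steers people toward git filter-repo instead of filter-branch, so treat this as an illustration of the behaviour rather than a recommendation.

```shell
#!/bin/sh
# Sketch: rewrite a single branch and confirm an existing tag is untouched.
# The repository layout below is made up for the demonstration.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # make the branch name predictable
git config user.email "dev@example.com"
git config user.name "Dev"

mkdir testdata
dd if=/dev/zero of=testdata/font.bin bs=1024 count=64 2>/dev/null
echo 'package demo' > demo.go
git add .
git commit -q -m "source plus test data"
git tag v0.1.0                           # an already published release

# Drop testdata/ from every commit on main only; tags are not listed,
# so they keep pointing at the original commits.
git filter-branch --force --index-filter \
    'git rm -r --cached --ignore-unmatch testdata' -- main >/dev/null

main_files=$(git ls-tree -r --name-only main)
tag_files=$(git ls-tree -r --name-only v0.1.0)
echo "main:   $main_files"
echo "v0.1.0: $tag_files"
```

Under these assumptions, `main` should list only `demo.go`, while `v0.1.0` should still list `testdata/font.bin`. That is why the old objects stay in the repository until the tag itself is removed.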

@jonegil

jonegil commented May 2, 2022

Although most likely unrelated, it might be worth mentioning that the size of the downloaded zip currently differs by about 80 MB across three different computers. Needless to say, there are issues on the machine that receives only 53 MB instead of 130 MB.

I assume this is a GitHub problem, but it might be solved by slimming the repo.

@benoitkugler
Owner

PR #12 has indeed nicely slimmed down the repo.
However, I'm not seeing a large benefit (in size) from running git filter-branch as you hinted: I'm still at 121M for the .git directory...

@eliasnaur
Contributor Author

eliasnaur commented May 7, 2022

> However, I'm not seeing a large benefit (in size) from running git filter-branch as you hinted: I'm still at 121M for the .git directory...

Did you run git gc? If that doesn't help, I believe it's because you still have other branches or tags referring to commits including the test data. If you replace <BRANCH> with --all in the filter-branch command (perhaps followed by a git gc), the .git directory should slim as well.

However, the reason I suggested a branch instead of --all is that I assume you want to preserve the already released tags for a while. If you change the content of the existing tags, Go will complain about module checksum mismatches for direct module fetches.

In summary, I suggest running filter-branch on your main branch to remove testdata and the stray test binary, and force-push that. Then, after a while, delete old branches and release tags that refer to the old history, leaving only release tags that refer to the new history.
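
One possible reason the .git directory stays large after a rewrite is that filter-branch keeps the pre-rewrite refs under refs/original/, and reflog entries also pin the old commits. The sketch below follows the repository-shrinking checklist in the git-filter-branch manual. The function name `prune_old_history` is hypothetical, and the steps are destructive, so they are best tried on a fresh clone first.

```shell
#!/bin/sh
# Sketch: release everything that still pins pre-rewrite objects, then repack.
# Usage: prune_old_history <repo-dir>   (hypothetical helper name)
prune_old_history() {
    repo=$1
    # filter-branch stores backups of the rewritten refs under refs/original/.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git -C "$repo" update-ref -d "$ref"
        done
    # Reflog entries keep the old commits reachable as well.
    git -C "$repo" reflog expire --expire=now --all
    # Prune unreachable objects now instead of after the default grace period.
    git -C "$repo" gc --quiet --prune=now
}
```

A possible sequence, once the old release tags are no longer needed: delete them with `git tag -d`, run `prune_old_history .`, and compare the output of `git count-objects -vH` before and after.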

whereswaldon pushed a commit to gioui/gio that referenced this issue May 7, 2022
The v0.1.1 release is much smaller because the module no longer contains
test data. See

benoitkugler/textlayout#10

Signed-off-by: Elias Naur <[email protected]>
@Jacalz
Contributor

Jacalz commented Dec 9, 2022

The repository now seems to be only 17 MB. I think this can be closed.

4 participants