feat: add large-file-script. #10
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| #!/bin/bash | ||
| | ||
| set -e | ||
| | ||
| # Function to display help message | ||
| usage() { | ||
| cat <<EOF | ||
| Usage: large-files [N] | ||
| | ||
| Find the top N largest files in a Git repository. | ||
| | ||
| Arguments: | ||
| N Number of large files to display (default: 20) | ||
| | ||
| Description: | ||
| This script scans the entire Git history and identifies the largest files that | ||
| have ever existed in the repository. It uses 'git rev-list' to extract object | ||
| sizes and sorts them to display the largest files. | ||
| | ||
| Requirements: | ||
| - Ensure you're inside a Git repository. | ||
| - Install 'numfmt' (available in GNU coreutils) for better size formatting. | ||
| | ||
| Examples: | ||
| large-files # Show top 20 largest files | ||
| large-files 50 # Show top 50 largest files | ||
| EOF | ||
| exit 0 | ||
| } | ||
| | ||
| # Default number of files to display | ||
| NUM_FILES=20 | ||
| | ||
| # Parse command-line argument | ||
| if [[ $# -gt 1 ]]; then | ||
| usage | ||
| elif [[ $# -eq 1 ]]; then | ||
| NUM_FILES="$1" | ||
| if ! [[ "$NUM_FILES" =~ ^[0-9]+$ ]]; then | ||
| echo "Error: N must be a valid integer." | ||
| usage | ||
| fi | ||
| fi | ||
| | ||
| # Ensure we are inside a Git repository | ||
| if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then | ||
| echo "Error: Not inside a Git repository." | ||
| exit 1 | ||
| fi | ||
| | ||
| echo "🔍 Finding the $NUM_FILES largest files in the repository..." | ||
| | ||
| # Extract large files from Git history | ||
| git rev-list --objects --all | | ||
| git cat-file --batch-check='%(objectsize:disk) %(rest)' | | ||
| sort -rh | | ||
| head -n "$NUM_FILES" | | ||
| awk '{ printf "%10s %s\n", $1, $2 }' | | ||
| numfmt --to=iec-i --suffix=B --padding=7 --field=1 | ||
| | ||
| echo "✅ Done!" | ||
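
The pipeline in the diff feeds every object from `git rev-list --objects --all` into the sort, so commits and trees compete with files, and the final `awk` prints only `$2`, which would cut a path at its first space. A minimal sketch of one way to handle both, assuming the batch format is extended with `%(objecttype)`; the names `top_blobs` and `large_files_sketch` are hypothetical and not part of this PR:

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR).
# Reads "<type> <size> <path>" lines on stdin, keeps only blobs (file contents),
# and prints the N largest as "<size><TAB><path>", largest first.
top_blobs() {
  tab="$(printf '\t')"
  awk '$1 == "blob" {
         size = $2
         sub(/^[^ ]+ [^ ]+ /, "")   # drop type and size, keep the full path
         print size "\t" $0
       }' |
    sort -t "$tab" -k1,1nr |
    head -n "${1:-20}"
}

# Assumed wiring into the existing pipeline (defined only, not run here).
large_files_sketch() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(rest)' |
    top_blobs "${1:-20}"
}
```

Sorting with `-n` rather than `-h` is enough here because `%(objectsize:disk)` is a plain byte count; the `numfmt` step from the diff could still be appended for human-readable sizes.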
Considering all files that have ever existed significantly diminishes the utility of this tool. Consider the following output of refs to the same file:
(Side note: If we are going to claim 'to have ever existed', we should probably do an explicit `fetch` in the beginning.)
I can think of a couple of ways to make this script more useful:
@yuting1214, what do you think? Do any of the above (alone or in conjunction) sound more or less appealing to you?
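
To illustrate the concern above about several versions of the same file crowding the output, here is a minimal sketch of one possible direction: collapse the history to a single row per path, keeping the largest version. It is offered as an assumption, not as one of the reviewer's proposals. The function name `largest_per_path` and the `<size><TAB><path>` input layout are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR).
# Reads "<size><TAB><path>" lines on stdin and prints one line per path,
# keeping the largest size seen for that path, largest first.
largest_per_path() {
  tab="$(printf '\t')"
  awk -F '\t' '
    !($2 in max) || $1 + 0 > max[$2] { max[$2] = $1 + 0 }
    END { for (p in max) print max[p] "\t" p }
  ' | sort -t "$tab" -k1,1nr -k2,2
}
```

Another option in the same spirit would be to scan `HEAD` instead of `--all`, which limits the result to objects reachable from the current branch; which behaviour is preferable depends on what the tool is meant to report.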
Hi Michael,
I'm more than happy to contribute more to this repo. It's just that I'm a little occupied at the moment.
Once I've settled a few things, I'll come back and write more code with you. Stay tuned.