| 1 | +# File Deduplication Tool |
| 2 | + |
| 3 | +`dugo` is a fast command-line tool written in Go that finds and removes duplicate files in a directory. It processes files concurrently to speed up scans of large directories. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Features |
| 8 | + |
| 9 | +- **Fast Duplicate Detection**: Uses file size and MD5 hashing to quickly identify potential duplicates. |
| 10 | +- **Accurate Comparison**: Performs byte-by-byte comparison to confirm duplicates, avoiding false positives due to hash collisions. |
| 11 | +- **Concurrency Support**: Leverages Go's goroutines to process files in parallel, speeding up the deduplication process. |
| 12 | +- **Interactive Deletion**: Optionally prompts the user to delete selected duplicate files interactively. |
| 13 | +- **Flexible Ignore Options**: Allows ignoring files or directories by name or regex pattern. |
| 14 | +- **Customizable Workers**: Lets you control the number of concurrent workers for optimal performance. |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## Installation |
| 19 | + |
| 20 | +### Prerequisites |
| 21 | +- Go (for building from source). |
| 22 | + |
| 23 | +### Build from Source |
| 24 | +1. Clone the repository and change into its directory: `git clone https://github.com/knbr13/dugo.git && cd dugo` |
| 25 | + |
| 26 | +2. Build the tool: |
| 27 | + ```bash |
| 28 | + go build -o dugo |
| 29 | + ``` |
| 30 | +3. Move the binary to a directory in your `PATH` (optional): |
| 31 | + ```bash |
| 32 | + sudo mv dugo /usr/local/bin/ |
| 33 | + ``` |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Usage |
| 38 | + |
| 39 | +### Basic Usage |
| 40 | +To find duplicates in a directory: |
| 41 | +```bash |
| 42 | +./dugo /path/to/directory |
| 43 | +``` |
| 44 | + |
| 45 | +### Enable Interactive Deletion |
| 46 | +To interactively delete duplicates: |
| 47 | +```bash |
| 48 | +./dugo -delete /path/to/directory |
| 49 | +``` |
| 50 | + |
| 51 | +### Ignore Files or Directories |
| 52 | +- Ignore specific files or directories by name: |
| 53 | + ```bash |
| 54 | + ./dugo -ignore-names=".git,temp,backup" /path/to/directory |
| 55 | + ``` |
| 56 | +- Ignore files or directories using a regex pattern: |
| 57 | + ```bash |
| 58 | + ./dugo -ignore-regex=".*\.tmp$" /path/to/directory |
| 59 | + ``` |
| 60 | + |
| 61 | +### Control Concurrency |
| 62 | +Set the number of concurrent workers (default: 4): |
| 63 | +```bash |
| 64 | +./dugo -workers=8 /path/to/directory |
| 65 | +``` |
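To illustrate what a worker count means in practice, here is a minimal sketch of a fixed-size worker pool that hashes files with MD5. It is an assumption about how such a pool could look, not a description of dugo's internals; `hashAll` and `md5File` are hypothetical names.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"sync"
)

// md5File returns the hex-encoded MD5 digest of the file at path.
func md5File(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashAll (hypothetical) applies hash to every path using a fixed number of
// goroutines. Paths whose hash fails are omitted from the result.
func hashAll(paths []string, workers int, hash func(string) (string, error)) map[string]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	out := make(map[string]string, len(paths))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				sum, err := hash(p)
				if err != nil {
					continue
				}
				mu.Lock()
				out[p] = sum
				mu.Unlock()
			}
		}()
	}

	// Feed the workers, then wait for them to drain the channel.
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	// Hash the files named on the command line with 8 workers.
	for p, sum := range hashAll(os.Args[1:], 8, md5File) {
		fmt.Println(sum, p)
	}
}
```

More workers help mainly when the disk can serve several reads at once; on a single slow disk, a small number may be just as fast.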
| 66 | + |
| 67 | +### Full Example |
| 68 | +Find duplicates, ignore `.tmp` files, and use 8 workers: |
| 69 | +```bash |
| 70 | +./dugo -ignore-regex=".*\.tmp$" -workers=8 /path/to/directory |
| 71 | +``` |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## Options |
| 76 | + |
| 77 | +| Flag | Description | |
| 78 | +|-----------------|-----------------------------------------------------------------------------| |
| 79 | +| `-ignore-names` | Comma-separated list of file/directory names to ignore (exact match). | |
| 80 | +| `-ignore-regex` | Regex pattern to ignore files/directories by path. | |
| 81 | +| `-workers` | Number of concurrent workers (default: 4). | |
| 82 | +| `-delete` | Enable interactive deletion of duplicate files. | |
| 83 | + |
| 84 | +--- |
| 85 | + |
| 86 | +## How It Works |
| 87 | + |
| 88 | +1. **Scan Directory**: The tool scans the specified directory and groups files by size. |
| 89 | +2. **Hash Files**: Files with the same size are hashed using MD5. |
| 90 | +3. **Compare Files**: Files with the same hash are compared byte-by-byte to confirm duplicates. |
| 91 | +4. **Report or Delete**: Confirmed duplicates are reported to the user or, when `-delete` is set, offered for interactive deletion. |
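
The first three steps can be sketched in a few lines of Go. To stay short and self-contained, this sketch works on in-memory contents (a map from path to bytes) rather than reading from disk, which is an assumption made for illustration; `findDuplicates` and `confirm` are hypothetical names, not dugo's API.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"fmt"
	"sort"
)

// findDuplicates returns groups of paths whose contents are byte-identical.
func findDuplicates(files map[string][]byte) [][]string {
	// Step 1: group by size; a file with a unique size has no duplicate.
	bySize := make(map[int][]string)
	for path, data := range files {
		bySize[len(data)] = append(bySize[len(data)], path)
	}

	var groups [][]string
	for _, sameSize := range bySize {
		if len(sameSize) < 2 {
			continue
		}
		// Step 2: within a size group, group by MD5 digest.
		byHash := make(map[[md5.Size]byte][]string)
		for _, path := range sameSize {
			sum := md5.Sum(files[path])
			byHash[sum] = append(byHash[sum], path)
		}
		// Step 3: confirm byte-by-byte, which also separates hash collisions.
		for _, sameHash := range byHash {
			groups = append(groups, confirm(files, sameHash)...)
		}
	}

	// Sort for stable output, since map iteration order is random.
	for _, g := range groups {
		sort.Strings(g)
	}
	sort.Slice(groups, func(i, j int) bool { return groups[i][0] < groups[j][0] })
	return groups
}

// confirm partitions candidates into sets of byte-identical files and drops
// any set with a single member.
func confirm(files map[string][]byte, candidates []string) [][]string {
	var sets [][]string
	for _, path := range candidates {
		placed := false
		for i := range sets {
			if bytes.Equal(files[sets[i][0]], files[path]) {
				sets[i] = append(sets[i], path)
				placed = true
				break
			}
		}
		if !placed {
			sets = append(sets, []string{path})
		}
	}
	var dups [][]string
	for _, s := range sets {
		if len(s) > 1 {
			dups = append(dups, s)
		}
	}
	return dups
}

func main() {
	files := map[string][]byte{
		"/path/to/file1.txt": []byte("hello"),
		"/path/to/file2.txt": []byte("hello"),
		"/path/to/other.txt": []byte("world"),
	}
	for _, g := range findDuplicates(files) {
		fmt.Println("Equal files:", g)
	}
	// Prints: Equal files: [/path/to/file1.txt /path/to/file2.txt]
}
```

Ordering the checks from cheapest to most expensive means most files are ruled out by size alone, and the byte-by-byte comparison runs only on files that already share a digest.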
| 92 | + |
| 93 | +--- |
| 94 | + |
| 95 | +## Example Output |
| 96 | + |
| 97 | +### Without Deletion |
| 98 | +``` |
| 99 | +Equal files: [/path/to/file1.txt /path/to/file2.txt] |
| 100 | +Equal files: [/path/to/image1.png /path/to/image2.png] |
| 101 | +``` |
| 102 | + |
| 103 | +### With Deletion |
| 104 | +``` |
| 105 | +Duplicate group (2 files): |
| 106 | +[1] /path/to/file1.txt |
| 107 | +[2] /path/to/file2.txt |
| 108 | + |
| 109 | +Enter numbers to delete (space-separated, 'a' to abort): 1 |
| 110 | +Deleted: /path/to/file1.txt |
| 111 | +``` |
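As an illustration of how a reply to the prompt above could be interpreted, here is a minimal sketch. `parseSelection` is a hypothetical helper, and the validation rules (reject out-of-range numbers, ignore repeated numbers) are assumptions rather than documented dugo behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSelection (hypothetical) converts a reply such as "1 3" into
// zero-based indices into a group of n files. It returns abort=true when
// the reply is "a".
func parseSelection(reply string, n int) (indices []int, abort bool, err error) {
	reply = strings.TrimSpace(reply)
	if reply == "a" {
		return nil, true, nil
	}
	seen := make(map[int]bool)
	for _, field := range strings.Fields(reply) {
		num, convErr := strconv.Atoi(field)
		if convErr != nil || num < 1 || num > n {
			return nil, false, fmt.Errorf("invalid selection %q: want a number from 1 to %d", field, n)
		}
		// Skip numbers the user typed more than once.
		if !seen[num] {
			seen[num] = true
			indices = append(indices, num-1)
		}
	}
	return indices, false, nil
}

func main() {
	group := []string{"/path/to/file1.txt", "/path/to/file2.txt"}
	indices, abort, err := parseSelection("1", len(group))
	if err != nil || abort {
		fmt.Println("nothing deleted")
		return
	}
	for _, i := range indices {
		// A real tool would remove the file here; this sketch only prints.
		fmt.Println("Would delete:", group[i])
	}
}
```

Rejecting the whole reply on the first invalid token avoids deleting part of a selection the user may have mistyped.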
| 112 | + |
| 113 | +--- |
| 114 | + |
| 115 | +## Contributing |
| 116 | + |
| 117 | +Contributions are welcome! Here’s how you can help: |
| 118 | +1. Fork the repository. |
| 119 | +2. Create a new branch for your feature or bugfix. |
| 120 | +3. Submit a pull request with a detailed description of your changes. |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +## License |
| 125 | + |
| 126 | +This project is licensed under the MIT License. See the [LICENSE](https://github.com/knbr13/dugo/blob/main/LICENSE) file for details. |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## Acknowledgments |
| 131 | + |
| 132 | +- Inspired by the need for a fast and accurate file deduplication tool. |
| 133 | +- Built with Go’s powerful concurrency model for high performance. |