Commit cd4cfbd

add README.md file
1 parent d152c19 commit cd4cfbd

1 file changed: +133 -0 lines changed

README.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
1+
# File Deduplication Tool
2+
3+
`dugo` is a fast command-line tool written in Go that finds and removes duplicate files in a directory. It processes files concurrently to speed up scanning.
4+
5+
---
6+
7+
## Features
8+
9+
- **Fast Duplicate Detection**: Uses file size and MD5 hashing to quickly identify potential duplicates.
10+
- **Accurate Comparison**: Performs byte-by-byte comparison to confirm duplicates, avoiding false positives due to hash collisions.
11+
- **Concurrency Support**: Leverages Go's goroutines to process files in parallel, speeding up the deduplication process.
12+
- **Interactive Deletion**: Optionally prompts the user to delete selected duplicate files interactively.
13+
- **Flexible Ignore Options**: Allows ignoring files or directories by name or regex pattern.
14+
- **Customizable Workers**: Lets you control the number of concurrent workers for optimal performance.
15+
16+
---
17+
18+
## Installation
19+
20+
### Prerequisites
21+
- Go (for building from source).
22+
23+
### Build from Source
24+
1. Clone the repository (https://github.com/knbr13/dugo) and change into its directory.
25+
26+
2. Build the tool:
27+
```bash
28+
go build -o dugo
29+
```
30+
3. Move the binary to a directory in your `PATH` (optional):
31+
```bash
32+
sudo mv dugo /usr/local/bin/
33+
```
34+
35+
---
36+
37+
## Usage
38+
39+
### Basic Usage
40+
To find duplicates in a directory:
41+
```bash
42+
./dugo /path/to/directory
43+
```
44+
45+
### Enable Interactive Deletion
46+
To interactively delete duplicates:
47+
```bash
48+
./dugo -delete /path/to/directory
49+
```
50+
51+
### Ignore Files or Directories
52+
- Ignore specific files or directories by name:
53+
```bash
54+
./dugo -ignore-names=".git,temp,backup" /path/to/directory
55+
```
56+
- Ignore files or directories using a regex pattern:
57+
```bash
58+
./dugo -ignore-regex=".*\.tmp$" /path/to/directory
59+
```
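
The two ignore options can be combined into a single check per path. The following is a minimal sketch of how such a filter might look, not the tool's actual code: the `ignorer` type and the `newIgnorer` and `skip` names are hypothetical, and it assumes names are matched against the last path element while the regex is matched against the whole path.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// ignorer decides whether a path should be skipped (hypothetical type).
type ignorer struct {
	names map[string]bool // exact base names to ignore
	re    *regexp.Regexp  // nil when no pattern was given
}

// newIgnorer builds a filter from a comma-separated name list and a regex.
func newIgnorer(names, pattern string) (*ignorer, error) {
	ig := &ignorer{names: make(map[string]bool)}
	for _, n := range strings.Split(names, ",") {
		if n = strings.TrimSpace(n); n != "" {
			ig.names[n] = true
		}
	}
	if pattern != "" {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, err
		}
		ig.re = re
	}
	return ig, nil
}

// skip reports whether path matches an ignored name or the regex.
func (ig *ignorer) skip(path string) bool {
	if ig.names[filepath.Base(path)] {
		return true
	}
	return ig.re != nil && ig.re.MatchString(path)
}

func main() {
	ig, err := newIgnorer(".git,temp,backup", `.*\.tmp$`)
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"/data/.git", "/data/cache.tmp", "/data/report.txt"} {
		fmt.Printf("%-18s skip=%v\n", p, ig.skip(p))
	}
}
```

In a directory walk, a match on a directory would typically return `filepath.SkipDir` so that nothing beneath it is visited.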
60+
61+
### Control Concurrency
62+
Set the number of concurrent workers (default: 4):
63+
```bash
64+
./dugo -workers=8 /path/to/directory
65+
```
66+
67+
### Full Example
68+
Find duplicates, ignore `.tmp` files, and use 8 workers:
69+
```bash
70+
./dugo -ignore-regex=".*\.tmp$" -workers=8 /path/to/directory
71+
```
72+
73+
---
74+
75+
## Options
76+
77+
| Flag | Description |
78+
|-----------------|-----------------------------------------------------------------------------|
79+
| `-ignore-names` | Comma-separated list of file/directory names to ignore (exact match). |
80+
| `-ignore-regex` | Regex pattern to ignore files/directories by path. |
81+
| `-workers` | Number of concurrent workers (default: 4). |
82+
| `-delete` | Enable interactive deletion of duplicate files. |
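
The single-dash flag syntax is what Go's standard `flag` package accepts, although the README does not state which parser the tool uses. As a rough sketch under that assumption, the four options could be declared as follows. The `options` struct and `parseArgs` function are hypothetical; only the flag names and the default of 4 workers come from the table.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command line (hypothetical struct).
type options struct {
	ignoreNames string
	ignoreRegex string
	workers     int
	deleteDups  bool
	dir         string
}

// parseArgs parses flags followed by exactly one directory argument.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("dugo", flag.ContinueOnError)
	fs.StringVar(&o.ignoreNames, "ignore-names", "", "comma-separated names to ignore")
	fs.StringVar(&o.ignoreRegex, "ignore-regex", "", "regex of paths to ignore")
	fs.IntVar(&o.workers, "workers", 4, "number of concurrent workers")
	fs.BoolVar(&o.deleteDups, "delete", false, "interactively delete duplicates")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if fs.NArg() != 1 {
		return o, fmt.Errorf("expected exactly one directory, got %d", fs.NArg())
	}
	o.dir = fs.Arg(0)
	return o, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

With the `flag` package, parsing stops at the first non-flag argument, so flags must come before the directory, as in every example in this README.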
83+
84+
---
85+
86+
## How It Works
87+
88+
1. **Scan Directory**: The tool scans the specified directory and groups files by size.
89+
2. **Hash Files**: Files with the same size are hashed using MD5.
90+
3. **Compare Files**: Files with the same hash are compared byte-by-byte to confirm duplicates.
91+
4. **Report or Delete**: Duplicates are either reported to the user or deleted interactively.
92+
93+
---
94+
95+
## Example Output
96+
97+
### Without Deletion
98+
```
99+
Equal files: [/path/to/file1.txt /path/to/file2.txt]
100+
Equal files: [/path/to/image1.png /path/to/image2.png]
101+
```
102+
103+
### With Deletion
104+
```
105+
Duplicate group (2 files):
106+
[1] /path/to/file1.txt
107+
[2] /path/to/file2.txt
108+
109+
Enter numbers to delete (space-separated, 'a' to abort): 1
110+
Deleted: /path/to/file1.txt
111+
```
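
The prompt above implies that the reply is parsed into a set of one-based file numbers, with `a` aborting. A minimal sketch of such parsing follows. The `parseSelection` function is hypothetical, and its handling of invalid or repeated numbers is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSelection turns a reply such as "1 3" into zero-based indexes into a
// group of n files. A lone "a" aborts. Repeated numbers are collapsed.
func parseSelection(reply string, n int) (indexes []int, abort bool, err error) {
	fields := strings.Fields(reply)
	if len(fields) == 1 && fields[0] == "a" {
		return nil, true, nil
	}
	seen := make(map[int]bool)
	for _, f := range fields {
		k, convErr := strconv.Atoi(f)
		if convErr != nil || k < 1 || k > n {
			return nil, false, fmt.Errorf("invalid selection %q (want 1-%d)", f, n)
		}
		if !seen[k] {
			seen[k] = true
			indexes = append(indexes, k-1)
		}
	}
	return indexes, false, nil
}

func main() {
	group := []string{"/path/to/file1.txt", "/path/to/file2.txt"}
	indexes, abort, err := parseSelection("1", len(group))
	if err != nil || abort {
		fmt.Println("nothing deleted")
		return
	}
	for _, i := range indexes {
		// A real tool would call os.Remove(group[i]) here.
		fmt.Println("Would delete:", group[i])
	}
}
```

Rejecting the whole reply on any invalid number, rather than deleting the valid subset, is the safer choice for a destructive operation.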
112+
113+
---
114+
115+
## Contributing
116+
117+
Contributions are welcome! Here’s how you can help:
118+
1. Fork the repository.
119+
2. Create a new branch for your feature or bugfix.
120+
3. Submit a pull request with a detailed description of your changes.
121+
122+
---
123+
124+
## License
125+
126+
This project is licensed under the MIT License. See the [LICENSE](https://github.com/knbr13/dugo/blob/main/LICENSE) file for details.
127+
128+
---
129+
130+
## Acknowledgments
131+
132+
- Inspired by the need for a fast and accurate file deduplication tool.
133+
- Built with Go’s powerful concurrency model for high performance.
