A utility to de-duplicate media and other files by hashing each file's contents rather than relying on its filename.
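To illustrate the idea of comparing files by content rather than by name, here is a minimal sketch. It is not the code in `hash_comp.py`; the function names `file_digest` and `find_duplicates`, the choice of SHA-256, and the 1 MiB chunk size are all assumptions made for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()  # assumption: the real tool may use another hash
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directories):
    """Walk each directory recursively and group file paths by content hash."""
    by_hash = defaultdict(list)
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                by_hash[file_digest(path)].append(path)
    # Keep only hashes shared by two or more files.
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Two files with different names but identical bytes end up under the same key, which is why renamed copies are still detected. This sketch does not filter by extension or apply any safe-word rules.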
- Create `safe_words.txt` and `target_extensions.txt` (you can rename or copy the `.example` files)
- Install requirements:
  `python -m pip install -r requirements.txt`
- Run `hash_comp.py [space-separated list of directories to process (recursive)]`:
  `python.exe hash_comp.py C:/users/<me>/Documents/Media C:/users/<me>/Media`
- Wait for the results (running it overnight is suggested)
- Run `clean_hash.py` on the results file (`hash_res.txt`):
  `python.exe clean_hash.py ./hash_res.txt -y`
  - Use `-y` to indicate you want to delete files
  - Omit `-y` or add `-t` to run in `test` mode
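
The `-y` / `-t` behaviour described above can be pictured with a small sketch. This is not the code in `clean_hash.py`: the helper names (`parse_args`, `is_test_mode`, `remove_files`) are hypothetical, the format of `hash_res.txt` is not parsed here, and letting `-t` win when both flags are given is an assumption.

```python
import argparse
import os


def parse_args(argv):
    """Parse a results file path plus the -y and -t switches."""
    parser = argparse.ArgumentParser(description="Hypothetical clean-up sketch")
    parser.add_argument("results_file")
    parser.add_argument("-y", action="store_true", dest="confirm",
                        help="actually delete files")
    parser.add_argument("-t", action="store_true", dest="test",
                        help="force test mode")
    return parser.parse_args(argv)


def is_test_mode(args):
    # Deletion happens only when -y is present and -t is absent
    # (assumption: -t overrides -y if both are passed).
    return args.test or not args.confirm


def remove_files(paths, test_mode):
    """Delete each path, or only report it when test_mode is true."""
    removed = []
    for path in paths:
        if test_mode:
            print(f"[test] would delete {path}")
        else:
            os.remove(path)
            removed.append(path)
    return removed
```

Defaulting to test mode means a forgotten flag never deletes anything, so a first pass without `-y` is a safe way to review what would be removed.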