docs: add more details about duplicate file management
Spark-NF committed Oct 3, 2023
1 parent d50133e commit ca64785
Showing 3 changed files with 51 additions and 10 deletions.
46 changes: 46 additions & 0 deletions docs/docs/duplicate-files.md
@@ -0,0 +1,46 @@
---
title: Duplicate files
---


## Already existing files

Before downloading a file, Grabber first checks whether a file already exists at the destination path generated from your configured [Filename](filename.md). If a file with the same filename already exists there, Grabber will not download the file.

That's why it's important to configure a filename that generates a different path for each image. Otherwise, multiple images might share the same path, and only one of them will be downloaded.

For example, `%artist%.%ext%` will always generate the same name for different images from the same artist (as long as they share the same extension), causing collisions. As a result, only the first image will be downloaded; the others will be skipped and counted as duplicates.
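
The collision described above can be illustrated with a small sketch. This is not Grabber's actual filename engine: `render_filename` and `simulate` are hypothetical helpers that only substitute `%token%` placeholders and skip any path that was already produced, and the image metadata is made up for the example.

```python
import re


def render_filename(template: str, image: dict) -> str:
    """Toy stand-in for filename formatting: replace each %token% with image[token]."""
    return re.sub(r"%(\w+)%", lambda m: str(image[m.group(1)]), template)


def simulate(template: str, images: list) -> tuple:
    """Return (saved ids, skipped ids), skipping any image whose path already exists."""
    existing_paths = set()
    saved, skipped = [], []
    for image in images:
        path = render_filename(template, image)
        if path in existing_paths:
            skipped.append(image["id"])  # same path: treated as an already existing file
        else:
            existing_paths.add(path)
            saved.append(image["id"])
    return saved, skipped


# Two different images by the same artist (illustrative data only)
images = [
    {"id": 101, "artist": "alice", "ext": "jpg"},
    {"id": 102, "artist": "alice", "ext": "jpg"},
]

print(simulate("%artist%.%ext%", images))  # ([101], [102])
print(simulate("%id%.%ext%", images))      # ([101, 102], [])
```

With `%artist%.%ext%` both images map to `alice.jpg`, so the second one is skipped. With `%id%.%ext%` each image gets its own path, at least within a single source.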


## The MD5 list

### Introduction

To understand how the MD5 list works, you first need to know what an MD5 is. For each image, we can generate a short code called an "MD5" (more precisely, an MD5 hash). An MD5 looks like `098f6bcd4621d373cade4e832627b4f6` and is computed from the contents of the image, so for all practical purposes it is unique to each image. The same image, even on different websites, will therefore always have the same MD5, as long as the file is byte-for-byte identical. You can learn more about [hashing](https://en.wikipedia.org/wiki/Hash_function) and [MD5](https://en.wikipedia.org/wiki/MD5) on Wikipedia.

Many sources that Grabber supports provide the MD5 of the image, which allows Grabber to know if you already downloaded the same image somewhere else, even if it was from a different website.
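
To make the idea concrete, here is a minimal sketch of how an MD5 can be computed from a file's contents, using Python's standard `hashlib` module. The helper names `md5_of_bytes` and `md5_of_file` are chosen for this example and are not part of Grabber.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 of some bytes as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 of a file, reading it in chunks to keep memory usage low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# The MD5 depends only on the contents, not on the filename or location
print(md5_of_bytes(b"test"))  # 098f6bcd4621d373cade4e832627b4f6
```

Two files with identical contents give the same result from `md5_of_file`, whatever their names or folders, which is what makes the MD5 usable for detecting duplicates across websites.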

### Configuration

So, in addition to not overwriting already existing files, Grabber stores a list of the MD5 codes of all the images you've downloaded, to ensure you don't download the same image twice from different sources.

There are two settings that control this behavior in Grabber, both located in the "Save" category of the settings:

* **If a file already exists globally**: if a file with the same MD5 has been downloaded by Grabber anywhere on your computer
* **If it's in the same directory**: if a file with the same MD5 has already been downloaded by Grabber in the same target directory as the image you're trying to save

The available options for both of those settings are:

* **Save**: download and save the image twice (as if it were not a duplicate)
* **Copy**: save the image twice, but copy it from the original image to avoid a useless download
* **Move**: move the original image to the duplicate location (useful, for example, if you're re-downloading an image whose tags have changed)
* **Shortcut** / **Symbolic link**: create a [shortcut](https://en.wikipedia.org/wiki/Shortcut_(computing)) or [symbolic link](https://en.wikipedia.org/wiki/Symbolic_link) (depending on your platform) with the duplicate image's name, pointing to the original image (useful if you often have duplicates in different folders, with each folder having a useful meaning)
* **Hard link**: create a [hard link](https://en.wikipedia.org/wiki/Hard_link) with the duplicate image's name, pointing to the original image (similar to a shortcut / symbolic link)
* **Don't save**: skip duplicate images
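
The two settings and their options can be sketched roughly as follows. This is only an illustration under stated assumptions, not Grabber's implementation: the action names (`"save"`, `"copy"`, `"move"`, `"link"`, `"hardlink"`, `"skip"`), the `pick_action` and `handle_duplicate` functions, and the `download` callable are all hypothetical, and the rule that the same-directory setting applies whenever both files share a directory is an assumption.

```python
import os
import shutil


def pick_action(original: str, target: str, same_dir_action: str, global_action: str) -> str:
    """Choose which setting applies (assumed rule: same directory takes precedence)."""
    same_dir = os.path.dirname(os.path.abspath(original)) == os.path.dirname(os.path.abspath(target))
    return same_dir_action if same_dir else global_action


def handle_duplicate(action: str, original: str, target: str, download=None) -> None:
    """Apply one duplicate action; `download` is a hypothetical callable writing to a path."""
    if action == "save":
        download(target)  # download again, as if it were not a duplicate
    elif action == "copy":
        shutil.copy2(original, target)  # reuse the local file instead of downloading
    elif action == "move":
        shutil.move(original, target)  # the original location no longer has the file
    elif action == "link":
        os.symlink(os.path.abspath(original), target)  # symbolic link to the original
    elif action == "hardlink":
        os.link(original, target)  # second name for the same file data
    elif action == "skip":
        pass  # "Don't save": nothing is written
    else:
        raise ValueError(f"Unknown action: {action}")
```

A caller would first look up the original path for the image's MD5, then run `handle_duplicate(pick_action(...), original, target, download)`. Note that hard links generally require both paths to be on the same filesystem, and creating symbolic links may need extra permissions on some platforms.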


### Deleted files

By default, if you delete a file from your computer after it was downloaded, it will also be removed from the MD5 list. This means that if you try to download the same file again, from the same source or from another one, it will be downloaded again.

If you want Grabber to "remember" that you already downloaded this file even if you deleted it, you can enable the "Keep deleted files in the MD5 list" setting in the "Save" category.
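
The difference between the two behaviors can be sketched with a toy MD5 list. This is a simplified model, not Grabber's actual data structure: the `Md5List` class, its `keep_deleted` flag (standing in for the "Keep deleted files in the MD5 list" setting) and its methods are hypothetical names for this example.

```python
import os


class Md5List:
    """Toy MD5 list mapping each MD5 to the path where the file was saved."""

    def __init__(self, keep_deleted: bool = False):
        self.keep_deleted = keep_deleted
        self._paths = {}

    def add(self, md5: str, path: str) -> None:
        self._paths[md5] = path

    def is_known(self, md5: str) -> bool:
        """Return True if this MD5 should be treated as already downloaded."""
        path = self._paths.get(md5)
        if path is None:
            return False
        if os.path.exists(path) or self.keep_deleted:
            return True
        # Default behavior: the file was deleted, so forget its MD5
        del self._paths[md5]
        return False
```

With `keep_deleted=False`, deleting the file makes `is_known` return `False`, so the image would be downloaded again. With `keep_deleted=True`, the MD5 is still reported as known after the file is gone.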
14 changes: 4 additions & 10 deletions docs/docs/faq.md
@@ -38,25 +38,19 @@ It can be found in "Save > Tags > Category name". Change it to something else th

### How to avoid duplicates?

In addition to being able to not overwrite existing files, Grabber stores a global MD5 list to ensure you don't download twice the same image in different locations.

You can change this behavior to one of those below, in the "Save" category of the settings:

* **Save**: download and save the image twice (as if it's not a duplicate)
* **Copy**: save the image twice, but copy it from the original image to prevent an useless download
* **Move**: move the original image to the duplicate location (useful for example if you're re-downloading an image that has changed a few tags)
* **Link**: create a shortcut with the duplicate image's name, pointing to the original image (useful if you often have duplicates in different folders, with each folder having an useful meaning)
* **Don't save**: skip duplicate images
See the [Duplicate files](duplicate-files.md) documentation page.


### Why do I get a message saying my filename is not unique?

This means that different images can end up having the same filename if you are not careful.

Note that the message is a **warning**, and not an error. You are free to ignore it, you might just sometimes have duplicate filenames for your images.
Note that the message is a **warning**, and not an error. You are free to ignore it, but you might sometimes end up with duplicate filenames for your images, which will cause the download of some images to be skipped.

For example, the filename `%id%.%ext%` will generate this warning, because an ID in a given source will represent a different image on another one.

See [Duplicate files](duplicate-files.md) for more details.


### Why do my tags have spaces between words instead of underscores?

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -103,6 +103,7 @@ nav:
- docs/sources/twitter.md
- docs/favorites.md
- docs/website-login.md
- docs/duplicate-files.md
- docs/faq.md
- Advanced:
- docs/shortcuts.md