Does the server deduplicate files before analyzing them with Tika / OpenSearch? #2296

Since OCR and search indexing are quite resource-heavy, I was wondering whether the service checks if a file has already been processed before analyzing it.

Example: A user uploads a folder tree that contains duplicated files (for example, files accidentally copied into the same folder again, so that Windows created "myfilename - Copy.png" and similar files).

In that case it would not make sense to analyze the files again when they are binary equal.
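To illustrate the kind of check I mean, here is a minimal sketch. It is not the server's actual behavior; the function names `sha256_of` and `group_by_content` are hypothetical. It assumes that two files with the same SHA-256 digest can be treated as binary equal, so only one file per group would need to go through OCR and indexing.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(root):
    """Map each content digest to the list of files under root that share it.

    Hypothetical helper: a real service would more likely keep the digests
    in its own metadata store instead of rescanning the tree.
    """
    groups = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups.setdefault(sha256_of(path), []).append(path)
    return groups
```

With something like this, "myfilename.png" and "myfilename - Copy.png" would end up in the same group, and the analysis result of the first file could be reused for the second.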
