Does the server deduplicate files before analyzing them with Tika / OpenSearch? #2296

@spotlesscoder

As OCR and search indexing are quite resource-heavy, I was wondering whether the service checks if a file has already been processed before analyzing it.

Example: A user uploads a folder tree that contains duplicated files (for example, because files were accidentally copied into the same folder again, so Windows created files like "myfilename - Copy.png").

In that case it would not make sense to analyze the files again when they are byte-for-byte identical.
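
To illustrate the kind of check I have in mind, here is a minimal sketch in Python. I don't know how the server handles this internally, so everything here is hypothetical: `file_digest`, `analyze_deduplicated`, and the `analyze` callback (a stand-in for the expensive OCR/indexing step) are names made up for this example, not part of the project. The idea is to hash each file's content and run the analysis only once per distinct hash.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def analyze_deduplicated(
    paths: Iterable[Path],
    analyze: Callable[[Path], str],
    cache: Optional[Dict[str, str]] = None,
) -> Dict[Path, str]:
    """Run `analyze` once per distinct file content and reuse the result for copies.

    `analyze` is a hypothetical placeholder for the expensive step (OCR, text
    extraction, indexing). `cache` maps content digest -> analysis result and
    could be persisted between uploads.
    """
    cache = {} if cache is None else cache
    results: Dict[Path, str] = {}
    for path in paths:
        digest = file_digest(path)
        if digest not in cache:
            # First time this content is seen: do the expensive work.
            cache[digest] = analyze(path)
        # Identical content (e.g. "myfilename - Copy.png") reuses the cached result.
        results[path] = cache[digest]
    return results
```

Hashing still requires reading every file once, but that should be much cheaper than running OCR on it. Each copy would still get its own entry in the search index (different path and name), just with the extracted content reused.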
