QNAP Docker Setting to detect only eng? #692

tanderson1992 · 2020-07-26T17:10:25Z

I set up paperless using the Docker instructions. After installation it worked fine on a few PDFs until I got to my vehicle registration. The document is entirely in English, but the language is apparently being detected as cat/ca (Catalan), which is not installed. Is there a setting to force the software to use only English, or to skip OCR instead of failing to process the document? The 0.3.3 changelog says "Timezone, items per page, and default language are now all configurable...", but I don't see where to set the default language. I have "PAPERLESS_OCR_LANGUAGES=" (set to blank) in the yml file used to install paperless.

Here's a snippet of the error. I can provide full logs if that would help, but I think the issue is that it somehow detects another language and then tries to OCR in that language, even though I've specified that no language other than English should be used for OCR.

Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm
[pgm_pipe @ 0x558c05dd90c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x558c05ddac40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x558c05ddac40] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for eng
Parsing for cat
Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm
[pgm_pipe @ 0x55dd25c170c0] [pgm_pipe @ 0x55ccf30aa0c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55dd25c18c40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55dd25c18c40] Encoder did not produce proper pts, making some up.
[image2 @ 0x55ccf30abc40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55ccf30abc40] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for eng
Parsing for cat
PARSE FAILURE for /consume/Registration.pdf: The guessed language (ca) is not available in this instance of Tesseract.
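
For anyone hitting the same PARSE FAILURE, here is a minimal sketch of the environment settings that seem relevant. PAPERLESS_OCR_LANGUAGES is the variable already mentioned above. PAPERLESS_OCR_LANGUAGE and PAPERLESS_FORGIVING_OCR are assumptions based on the example configuration shipped with Paperless, and their behavior has not been confirmed in this thread, so treat this as something to try rather than a verified fix.

```shell
# Sketch of an env file for the Paperless Docker setup (values are illustrative).

# Extra Tesseract language packs to install in the container.
# Left blank here, as in the setup described above.
PAPERLESS_OCR_LANGUAGES=

# ASSUMPTION: default OCR language as a 3-letter ISO 639 code.
PAPERLESS_OCR_LANGUAGE=eng

# ASSUMPTION: when the guessed language (here "ca") is not installed,
# fall back to the default language instead of failing the document.
PAPERLESS_FORGIVING_OCR=true
```

If these variables behave as assumed, the container would need to be recreated for the change to take effect, and Registration.pdf would need to be placed in the consume folder again.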
