Starting with 05398d6 1.84.0, on Darwin (macOS), leptonica gives an error when opening a file in /tmp. Also, the first error message does not give the actual path that it tried to open. For example, here is a program (based on tesseract.cpp):
#include <allheaders.h>

int main(int argc, char *argv[]) {
    const char *image = "/tmp/ocrmypdf.io.uss3ldn7/000011_ocr.png";
    struct Pix *pixs = pixRead(image);
    if (!pixs) {
        fprintf(stderr, "Leptonica can't process input file: %s\n", image);
        return 2;
    }
    return 0;
}
It gives this output:
Leptonica Error in fopenReadStream: file not found: 000011_ocr.png
Leptonica Error in pixRead: image file not found: /tmp/ocrmypdf.io.uss3ldn7/000011_ocr.png
Leptonica can't process input file: /tmp/ocrmypdf.io.uss3ldn7/000011_ocr.png
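The first line of that output names only 000011_ocr.png, which suggests the failing open was attempted on the bare file name rather than the full path. As a rough illustration, here is a sketch of a two-step lookup whose failure message reports both names. It uses plain libc only; `path_tail` and `open_read_verbose` are hypothetical names, not leptonica functions, and the real fopenReadStream() may differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not leptonica's API: return the part of `path`
 * after the last '/', or `path` itself when it contains no '/'. */
const char *path_tail(const char *path) {
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical stand-in for a two-step lookup: try the path as given,
 * then the bare file name relative to the current directory. On failure,
 * print both the tail and the full path that were tried. */
FILE *open_read_verbose(const char *filename) {
    if (!filename) return NULL;

    FILE *fp = fopen(filename, "rb");
    if (fp) return fp;

    const char *tail = path_tail(filename);
    fp = fopen(tail, "rb");
    if (!fp) {
        fprintf(stderr, "Failed in %s to open locally with tail %s "
                        "for filename %s\n", __func__, tail, filename);
    }
    return fp;
}
```

With a message of this shape, the log would show the full /tmp/... path next to the tail, so it is clear which of the two attempts failed.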
This affects ocrmypdf when TMPDIR=/tmp, because ocrmypdf uses tesseract, which calls leptonica:
nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4b8e9717fac859f830fa318a0cc1e2d4a40df152.tar.gz -p ocrmypdf --run 'ocrmypdf --redo-ocr --verbose=1 --keep-temporary-files ~/Downloads/20231017_TransferTaxExemptionMeasure.pdf ~/Downloads/20231017_TransferTaxExemptionMeasure-ocr.pdf'
…
1 Running: ['/nix/store/pgz54swxlbxc2lxx23ramcfz099v7n6z-tesseract-5.3.3/bin/tesseract', '-l', 'eng', '-c', 'textonly_pdf=1', __init__.py:134
'/tmp/ocrmypdf.io.xu77l3_5/000001_ocr.png', '/tmp/ocrmypdf.io.xu77l3_5/000001_ocr_tess', 'pdf', 'txt']
1 Leptonica Error in fopenReadStream: file not found: 000001_ocr.png tesseract.py:252
1 Leptonica Error in findFileFormat: image file not found: /tmp/ocrmypdf.io.xu77l3_5/000001_ocr.png tesseract.py:252
1 Leptonica Error in fopenReadStream: file not found: PNG tesseract.py:252
1 Leptonica Error in pixRead: image file not found: PNG tesseract.py:252
I remember a recent proposal to allow TMPDIR path rewrites for MacOS, but I believe it was shelved. This has been an issue for quite a while. We solved it for Windows by allowing path rewrites and universally using genPathname() and fopenReadStream(). These packaging issues are of course well above my pay grade.
Yonathan also points out that fopenReadStream() is not giving the path when it can't open the file locally. We can give more information at that failure point; e.g. replace line 1896 by
lept_stderr("Failed in %s to open locally with tail %s "
"for filename %s\n", __func__, tail, filename);
Note on the nixpkgs revision used in the ocrmypdf command: NixOS/nixpkgs@4b8e971 is the first commit that contains both NixOS/nixpkgs@628b90b and a fix for an unrelated error, “Abort trap: 6 mutool -v” (NixOS/nixpkgs@11498ae).
Workaround: Set TMPDIR=/private/tmp instead of /tmp before invoking ocrmypdf.
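For a program that calls leptonica directly, a similar effect might be had by resolving symlinks before opening the file, since /tmp on macOS is a symlink to /private/tmp. The sketch below uses POSIX realpath(); `resolve_path` is a hypothetical helper name, and whether this avoids the error in every build is an assumption based on the workaround above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* expose realpath() under strict -std modes */
#endif
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: write the symlink-free absolute form of `path`
 * into `out` (at least PATH_MAX bytes). Returns `out`, or NULL if the
 * path cannot be resolved, e.g. because the file does not exist. */
char *resolve_path(const char *path, char *out) {
    assert(path != NULL && out != NULL);
    return realpath(path, out);
}
```

On macOS, calling resolve_path() on /tmp/ocrmypdf.io.uss3ldn7/000011_ocr.png with a PATH_MAX-sized buffer should yield the /private/tmp/... form of the path, which could then be passed to pixRead(). Note that realpath() fails if the file does not exist, so the caller needs to handle a NULL result.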