Regression: file not found on MacOS when opening /tmp file #735

Open
yonran opened this issue Feb 17, 2024 · 2 comments

@yonran

yonran commented Feb 17, 2024

Starting with 05398d6 (1.84.0), leptonica on macOS (darwin) gives an error when opening a file in /tmp. Also, the first error message does not give the actual path that it tried to open; it prints only the file name. For example, here is a program (based on tesseract.cpp):

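#include <stdio.h>
#include <allheaders.h>

int main(int argc, char* argv[]) {
    /* Absolute path under /tmp, as created by ocrmypdf */
    const char* image = "/tmp/ocrmypdf.io.uss3ldn7/000011_ocr.png";
    struct Pix *pixs = pixRead(image);
    if (!pixs) {
      fprintf(stderr, "Leptonica can't process input file: %s\n", image);
      return 2;
    }
    pixDestroy(&pixs);  /* release the image before exiting */
    return 0;
}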

It gives this output:

Leptonica Error in fopenReadStream: file not found: 000011_ocr.png
Leptonica Error in pixRead: image file not found: /tmp/ocrmypdf.io.uss3ldn7/000011_ocr.png
Leptonica can't process input file: /tmp/ocrmypdf.io.uss3ldn7/000011_ocr.png

This affects ocrmypdf when TMPDIR=/tmp, because ocrmypdf runs tesseract, which in turn calls leptonica:

nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4b8e9717fac859f830fa318a0cc1e2d4a40df152.tar.gz -p ocrmypdf --run 'ocrmypdf --redo-ocr --verbose=1 --keep-temporary-files ~/Downloads/20231017_TransferTaxExemptionMeasure.pdf ~/Downloads/20231017_TransferTaxExemptionMeasure-ocr.pdf'
…
    1 Running: ['/nix/store/pgz54swxlbxc2lxx23ramcfz099v7n6z-tesseract-5.3.3/bin/tesseract', '-l', 'eng', '-c', 'textonly_pdf=1',   __init__.py:134
'/tmp/ocrmypdf.io.xu77l3_5/000001_ocr.png', '/tmp/ocrmypdf.io.xu77l3_5/000001_ocr_tess', 'pdf', 'txt']                                             
    1  Leptonica Error in fopenReadStream: file not found: 000001_ocr.png                                                          tesseract.py:252
    1  Leptonica Error in findFileFormat: image file not found: /tmp/ocrmypdf.io.xu77l3_5/000001_ocr.png                           tesseract.py:252
    1  Leptonica Error in fopenReadStream: file not found: PNG                                                                     tesseract.py:252
    1  Leptonica Error in pixRead: image file not found: PNG                                                                       tesseract.py:252

(Note: NixOS/nixpkgs@4b8e971 is the first commit that contains both NixOS/nixpkgs@628b90b and NixOS/nixpkgs@11498ae, a fix for an unrelated error, “Abort trap: 6 mutool -v”.)

Workaround: Set TMPDIR=/private/tmp instead of /tmp before invoking ocrmypdf
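
For context on the workaround: on macOS, /tmp is a symlink to /private/tmp, so the two names refer to the same directory. The following is a minimal sketch, not part of leptonica, of resolving a path with POSIX realpath() to see which name the system actually uses. The helper name resolve_path is hypothetical.

```c
/* Request POSIX declarations (realpath) even under strict -std= modes. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resolve symlinks in `path` into `out`, which
 * must hold at least PATH_MAX bytes. Returns 0 on success, or -1 if
 * the path cannot be resolved (for example, it does not exist). */
static int resolve_path(const char *path, char *out)
{
    assert(path != NULL && out != NULL);
    return realpath(path, out) ? 0 : -1;
}
```

Calling resolve_path("/tmp", buf) on a stock macOS system is expected to yield /private/tmp, which is the value the workaround assigns to TMPDIR directly.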

@DanBloomberg
Owner

@stweil

I remember a recent proposal to allow TMPDIR path rewrites for MacOS, but I believe it was shelved. This has been an issue for quite a while. We solved it for Windows by allowing path rewrites and universally using genPathname() and fopenReadStream(). These packaging issues are of course well above my pay grade.

Yonathan also points out that fopenReadStream() is not giving the path when it can't open the file locally. We can give more information at that failure point; e.g. replace line 1896 by

        lept_stderr("Failed in %s to open locally with tail %s " 
                    "for filename %s\n", __func__, tail, filename);
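
To illustrate the failure point being discussed, here is a rough sketch of a two-stage open that reports both the tail and the full filename when it fails. It is an assumption-laden illustration, not leptonica's actual fopenReadStream(); the names path_tail and open_read_sketch are hypothetical, and plain fprintf stands in for leptonica's own reporting functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the part of `path` after the last '/'. */
static const char *path_tail(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Sketch of a two-stage open: try the full path first, then the bare
 * tail relative to the current directory. If both attempts fail,
 * report the tail and the original filename together. */
static FILE *open_read_sketch(const char *filename)
{
    assert(filename != NULL);

    FILE *fp = fopen(filename, "rb");
    if (fp)
        return fp;

    const char *tail = path_tail(filename);
    fp = fopen(tail, "rb");
    if (!fp)
        fprintf(stderr, "Failed in %s to open locally with tail %s "
                        "for filename %s\n", __func__, tail, filename);
    return fp;
}
```

With a message like this, the log would show the full /tmp/... path next to the tail, instead of the tail alone.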

@DanBloomberg
Owner

Oops, one should always use the error macros for error messages, not lept_stderr():

        L_ERROR("failed to open locally with tail %s for filename %s\n",
                __func__, tail, filename);
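
The call above passes three arguments to a format string with only two %s specifiers, which makes sense if the macro itself prepends a "%s" for the function name. The sketch below shows that pattern with a stand-in macro. SKETCH_ERROR and report_open_failure are hypothetical names, the macro writes to a buffer rather than stderr so the result can be inspected, and the real L_ERROR definition in leptonica may differ in its details.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an L_ERROR-style macro: it prepends
 * "Error in %s: " to the caller's format string, so the first
 * variadic argument must be the function name. */
#define SKETCH_ERROR(buf, size, fmt, ...) \
    snprintf((buf), (size), "Error in %s: " fmt, __VA_ARGS__)

/* Hypothetical helper that formats a failed-open message into `buf`.
 * Returns the value from snprintf. */
static int report_open_failure(char *buf, size_t size,
                               const char *tail, const char *filename)
{
    assert(buf != NULL && size > 0);
    return SKETCH_ERROR(buf, size,
                        "failed to open locally with tail %s for filename %s\n",
                        __func__, tail, filename);
}
```

Here __func__ fills the prepended "%s", and tail and filename fill the two specifiers written by the caller.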
