Use getPdfRendererResolution() in concatpdf.c
* As with cleanpdf.c, this prevents the pdftoppm renderer from
  making very large ppm files.
* Also improve comments in environ.h.
DanBloomberg committed May 27, 2023
1 parent a1ac3c3 commit a7b5bc2
Showing 3 changed files with 39 additions and 16 deletions.
45 changes: 32 additions & 13 deletions prog/concatpdf.c
@@ -83,6 +83,15 @@
  * The pdf output is written to %outfile. It is advisable (but not
  * required) to have a '.pdf' extension.
  *
+ * The intent is to use pdftoppm to render the images at 150 pixels/inch
+ * for a full page, when scalefactor = 1.0. The renderer uses the
+ * mediaboxes to decide how big to make the images. If those boxes
+ * have values that are too large, the intermediate ppm images can
+ * be very large. To prevent that, we compute the resolution to input
+ * to pdftoppm that results in RGB ppm images representing page images
+ * at about 150 ppi (when scalefactor = 1.0). These images are about
+ * 6MB, but are written quickly because there is no compression.
+ *
  * N.B. This requires the Poppler package of pdf utilities, such as
  * pdfimages and pdftoppm. For non-unix systems, this requires
  * installation of the cygwin Poppler package:
@@ -112,7 +121,7 @@ l_int32 main(int argc,
 {
 char buf[256];
 char *basedir, *fname, *tail, *basename, *imagedir, *title, *outfile;
-l_int32 res, one_bit, save_color, quality, i, n, ret;
+l_int32 res, render_res, one_bit, save_color, quality, i, n, ret;
 l_float32 scalefactor, colorfract;
 PIX *pixs, *pix1, *pix2;
 PIXA *pixa1 = NULL;
@@ -141,32 +150,42 @@ SARRAY *sa;
         quality = 95;
     }
 
+    /* Set up a directory for temp images */
+    imagedir = stringJoin(basedir, "/image");
+#ifndef _WIN32
+    mkdir(imagedir, 0777);
+#else
+    _mkdir(imagedir);
+#endif /* _WIN32 */
+
     /* Get the names of the pdf files */
     if ((sa = getSortedPathnamesInDirectory(basedir, "pdf", 0, 0)) == NULL)
         return ERROR_INT("files not found", __func__, 1);
     sarrayWriteStderr(sa);
     n = sarrayGetCount(sa);
 
+    /* Figure out the resolution to use with the image renderer to
+     * generate page images with a resolution of not more than 150 ppi.
+     * These would have a maximum dimension of about 1650 pixels.
+     * Use the first pdf file in the directory. */
+    fname = sarrayGetString(sa, 0, L_NOCOPY);
+    getPdfRendererResolution(fname, imagedir, &render_res); /* for 300 ppi */
+    render_res /= 2; /* for 150 ppi */
+
     /* Rasterize:
      * pdftoppm -r 150 fname outroot
      * Use of pdftoppm:
      * This works on all pdf pages, both wrapped images and pages that
-     * were made orthographically. We use the default output resolution
-     * of 150 ppi for pdftoppm, which makes uncompressed 6 MB files
-     * and is very fast. If you want higher resolution 1 bpp output,
-     * use cleanpdf.c. */
-    imagedir = stringJoin(basedir, "/image");
-#ifndef _WIN32
-    mkdir(imagedir, 0777);
-#else
-    _mkdir(imagedir);
-#endif /* _WIN32 */
+     * were made orthographically. We generate images that are no
+     * larger than about 1650 pixels in the maximum direction. This
+     * makes uncompressed 6 MB files and is very fast. If you want
+     * higher resolution 1 bpp output, use cleanpdf.c. */
     for (i = 0; i < n; i++) {
         fname = sarrayGetString(sa, i, L_NOCOPY);
         splitPathAtDirectory(fname, NULL, &tail);
         splitPathAtExtension(tail, &basename, NULL);
-        snprintf(buf, sizeof(buf), "pdftoppm -r 150 %s %s/%s",
-                 fname, imagedir, basename);
+        snprintf(buf, sizeof(buf), "pdftoppm -r %d %s %s/%s",
+                 render_res, fname, imagedir, basename);
         lept_free(tail);
         lept_free(basename);
         lept_stderr("%s\n", buf);
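
Note on the numbers in the concatpdf.c comments: the "about 1650 pixels" and "about 6 MB" figures follow from letter-size arithmetic at 150 ppi. The C sketch below works through that arithmetic. All helper names are hypothetical. In particular, sketch_render_res() is only an assumed illustration of how a resolution could be scaled down for an oversized mediabox; it is not the implementation of getPdfRendererResolution().

```c
#include <stddef.h>

/* PDF user space uses 72 points per inch. */
#define POINTS_PER_INCH 72.0

/* Letter-size long side in points (11 inches); used as the reference
 * page size in this sketch. */
#define LETTER_LONG_SIDE_PTS 792.0

/* Pixel extent of one mediabox side (in points) rendered at res ppi,
 * rounded to the nearest pixel. */
static int rendered_pixels(double side_pts, int res)
{
    return (int)(side_pts / POINTS_PER_INCH * res + 0.5);
}

/* Bytes of pixel data in an uncompressed RGB image (3 bytes per pixel),
 * ignoring the small ppm header. */
static size_t rgb_bytes(int width_px, int height_px)
{
    return (size_t)width_px * (size_t)height_px * 3;
}

/* Hypothetical helper, NOT the Leptonica implementation: choose a
 * renderer resolution so that the longer mediabox side renders to
 * about the same pixel count as a letter page at target_res.
 * The result never exceeds target_res. */
static int sketch_render_res(double medw_pts, double medh_pts, int target_res)
{
    double maxside = (medw_pts > medh_pts) ? medw_pts : medh_pts;

    if (maxside <= LETTER_LONG_SIDE_PTS)
        return target_res;
    return (int)(target_res * LETTER_LONG_SIDE_PTS / maxside);
}
```

For a 612 x 792 point mediabox at 150 ppi, this gives 1275 x 1650 pixels, or 6,311,250 bytes of RGB data. For a mediabox twice that size (1224 x 1584 points), the sketch returns 75, which again renders the long side to 1650 pixels.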
8 changes: 6 additions & 2 deletions src/environ.h
@@ -502,16 +502,18 @@ LEPT_DLL extern l_int32 LeptMsgSeverity;
  *         a : <message string>
  *         b : __func__ (the procedure name)
  *         c : <return value from function>
+ *     A newline is added by the function after the message.
  *
  * (2) The messages
  *         ERROR_INT_1(a,f,b,c) : returns l_int32
- *         ERROR_FLOAT_2(a,f,b,c) : returns l_float32
- *         ERROR_PTR_2(a,f,b,c) : returns void*
+ *         ERROR_FLOAT_1(a,f,b,c) : returns l_float32
+ *         ERROR_PTR_1(a,f,b,c) : returns void*
  *     are used to return from functions and take four parameters:
  *         a : <message string>
  *         f : <second message string> (typically, a filename for an fopen()))
  *         b : __func__ (the procedure name)
  *         c : <return value from function>
+ *     A newline is added by the function after the message.
  *
  * (3) The purely informational L_* messages
  *         L_ERROR(a,...)
@@ -521,6 +523,8 @@ LEPT_DLL extern l_int32 LeptMsgSeverity;
  *         a : <message string> with optional format conversions
  *         v1 : procName (this must be included as the first vararg)
  *         v2, ... : optional varargs to match format converters in the message
+ *     Unlike the messages that return a value in (2) and (3) above,
+ *     here a newline needs to be included at the end of the message string.
  *
  * To return an error from a function that returns void, use:
  *     L_ERROR(<message string>, procName, [...])
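
Note on the newline convention described in the environ.h comments: the value-returning macros append the newline themselves, while the L_* macros treat the message as a format string that must end with its own newline. The C sketch below illustrates the difference with stand-ins. Both sketch_error_int() and SKETCH_L_ERROR are hypothetical names, the "Error in ...: " prefix is an assumed output format, and the stand-ins write to a caller-supplied buffer rather than stderr so the result can be inspected.

```c
#include <stdio.h>

/* Stand-in for a value-returning macro such as ERROR_INT (hypothetical,
 * not the Leptonica code): the helper appends the newline itself, so
 * the caller's message string carries none. */
static int sketch_error_int(char *out, size_t size, const char *msg,
                            const char *procname, int retval)
{
    snprintf(out, size, "Error in %s: %s\n", procname, msg);
    return retval;
}

/* Stand-in for an informational macro such as L_ERROR (hypothetical):
 * the message is pasted into the format string, the procedure name is
 * the first vararg, and the caller must end the message with "\n". */
#define SKETCH_L_ERROR(out, size, fmt, ...) \
    snprintf(out, size, "Error in %s: " fmt, __VA_ARGS__)
```

With these stand-ins, sketch_error_int(buf, sizeof(buf), "files not found", "main", 1) and SKETCH_L_ERROR(buf, sizeof(buf), "page %d not rendered\n", "main", 3) both leave a newline-terminated line in buf, but only the second call had to spell out the "\n".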
2 changes: 1 addition & 1 deletion version-notes.html
@@ -93,7 +93,7 @@ <h2 align=center> <IMG SRC="moller52.jpg" border=1 ALIGN_MIDDLE> </h2>
  * Add getPdfPageCount() to find the number of pages in a pdf file.
  * Add getPdfPageSizes() and getPdfMediaBoxSizes() to find the
    information necessary to render images properly. Modify cleanpdf.c
-   to use this information.
+   and concatpdf.c to use this information.
  * Add prog/splitpdf.c to split a pdf file into nearly equal page sets.
  * Add ability to read and write rgba in bmp format, and
    test in ioformats_reg.
