PDF files (and other formats like DOCX) pose a challenge for presenting content online. PDF viewers for browsers are complex pieces of software in their own right, and there is no consistent standard for presenting PDFs across mobile and desktop browsers. Formats like DOCX can be converted to PDF and then made available for presentation.
Approach 1 - present pdf directly
Large PDF files load slowly, because even viewing the first few pages requires downloading the full PDF document into the browser. This is the approach we currently follow.
An alternative is to linearize the PDF files with a tool such as qpdf. A linearized PDF is organized so that a viewer can show the first pages before the whole file has arrived.
This still leaves the problem of loading a single PDF file.
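A minimal sketch of the linearization step, assuming the `qpdf` command-line tool is installed and on the PATH. The function names and file paths are hypothetical; only the `--linearize` and `--check-linearization` options come from qpdf itself.

```python
import subprocess
from pathlib import Path


def linearize_command(src: Path, dst: Path) -> list[str]:
    """Build the qpdf call that writes a linearized copy of src to dst."""
    return ["qpdf", "--linearize", str(src), str(dst)]


def check_command(pdf: Path) -> list[str]:
    """Build the qpdf call that checks whether pdf is linearized."""
    return ["qpdf", "--check-linearization", str(pdf)]


def linearize(src: Path, dst: Path) -> None:
    """Linearize src into dst (assumes qpdf is installed).

    check=True raises CalledProcessError on any non-zero exit status.
    qpdf may also exit non-zero when it only reports warnings, so a real
    pipeline might want finer-grained handling than this sketch has.
    """
    subprocess.run(linearize_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(linearize_command(Path("report.pdf"), Path("report.linear.pdf"))))
```

Linearization probably only helps if the web server answers HTTP range requests and the viewer makes use of them, so that is worth verifying before relying on it.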
Approach 2 - convert a pdf to an image at runtime
A PDF (or a specific page of a PDF) can be converted to an image at runtime and presented online. This allows pages to be requested on demand, and since the pages are just images they load across devices without a problem. It does require an intermediate service that turns each PDF page request into an image.
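The core of such an intermediate service could look roughly like the sketch below. It assumes the `pdftoppm` tool from Poppler is installed. The URL scheme, directory names and function names are hypothetical and not part of any existing setup. Rendered pages are cached on disk so that each page is converted at most once.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional, Tuple

PDF_ROOT = Path("pdfs")          # hypothetical location of the source PDFs
CACHE_ROOT = Path("page-cache")  # hypothetical location for rendered pages

# Hypothetical URL scheme: /pages/<document-id>/<page-number>.png
PAGE_URL = re.compile(r"/pages/([A-Za-z0-9_-]+)/([1-9][0-9]*)\.png")


def parse_page_request(url_path: str) -> Optional[Tuple[str, int]]:
    """Return (document id, page number), or None if the path is not valid."""
    match = PAGE_URL.fullmatch(url_path)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def render_command(pdf: Path, page: int, out_prefix: Path, dpi: int = 100) -> list[str]:
    """pdftoppm call that renders exactly one page to <out_prefix>.png."""
    return ["pdftoppm", "-f", str(page), "-l", str(page), "-r", str(dpi),
            "-png", "-singlefile", str(pdf), str(out_prefix)]


def page_image(doc_id: str, page: int) -> Path:
    """Return the cached PNG for a page, rendering it on the first request."""
    out_prefix = CACHE_ROOT / doc_id / f"page-{page}"
    image = out_prefix.with_suffix(".png")
    if not image.exists():
        image.parent.mkdir(parents=True, exist_ok=True)
        pdf = PDF_ROOT / f"{doc_id}.pdf"
        subprocess.run(render_command(pdf, page, out_prefix), check=True)
    return image
```

The strict pattern for the document id is deliberate: it keeps path separators and `..` out of the file names that are built from the request.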
Approach 3 - preprocess the PDF into images
Convert the PDF into images in advance and serve the images when the browser requests them. The complete PDF can still be made available for download. This approach is similar to Approach 2, but simpler because there is no intermediate service that processes the PDF. The downside is that disk-space usage immediately doubles, as the images are essentially duplicates of the file.
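A batch job for this could be sketched as follows, again assuming Poppler's `pdftoppm` is installed. The directory layout and function names are hypothetical. The `directory_size` helper is only there to measure the actual disk-space overhead instead of estimating it.

```python
import subprocess
from pathlib import Path


def preprocess_command(pdf: Path, out_dir: Path, dpi: int = 100) -> list[str]:
    """pdftoppm call that renders every page of pdf as a PNG in out_dir.

    Files are named page-<n>.png; pdftoppm may zero-pad the page number.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_dir / "page")]


def preprocess_all(pdf_root: Path, image_root: Path) -> None:
    """Render each PDF in pdf_root into its own directory below image_root."""
    for pdf in sorted(pdf_root.glob("*.pdf")):
        out_dir = image_root / pdf.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(preprocess_command(pdf, out_dir), check=True)


def directory_size(root: Path) -> int:
    """Total size in bytes of all regular files below root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical directories, for illustration only.
    preprocess_all(Path("pdfs"), Path("page-images"))
    print(directory_size(Path("pdfs")), directory_size(Path("page-images")))
```

Comparing the two sizes on a representative sample would show how the overhead depends on the chosen resolution and on how image-heavy the PDFs are.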
Approach 4 - using specialized tools that convert PDF to HTML "lookalikes"
I'll stick to "PDF" as the example for the general class of document types that not every web browser can display.
Approach 1 will probably remain the default option for heavy clients, where we expect no problems and a reader to be available. This is also the version that is covered by digital signing. None of the other options should be designated as digitally signed, as there is a possibility that derivatives are out of date, tampered with or otherwise incorrect due to mistakes (e.g. a restore), bugs (failing updates, mistakes in dynamic filename creation) or abuse.
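One way to at least detect out-of-date derivatives is to record a hash of the source PDF next to the generated files and compare it before serving. The sketch below uses only the Python standard library. The manifest file name and the function names are hypothetical. It does not make derivatives signed, and it does not detect tampering with the derivative files themselves. It only shows whether they were generated from the current version of the PDF.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(pdf: Path, derivative_dir: Path) -> Path:
    """Record which version of pdf the files in derivative_dir came from."""
    manifest = derivative_dir / MANIFEST_NAME
    manifest.write_text(json.dumps({"source": pdf.name, "sha256": sha256_of(pdf)}))
    return manifest


def derivatives_are_current(pdf: Path, derivative_dir: Path) -> bool:
    """True only if a manifest exists and matches the current PDF content."""
    manifest = derivative_dir / MANIFEST_NAME
    if not manifest.exists():
        return False
    recorded = json.loads(manifest.read_text())
    return recorded.get("sha256") == sha256_of(pdf)
```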
Approach 4 is to some extent close to the general conversion / OCR that we are looking at in the context of data analysis and improving content search. The risk of changing the meaning through incorrect interpretation of the document structure remains. So I think this approach may not add much utility over the pure text edition, while it probably adds a lot of complexity and therefore maintenance.
So of these, I think either 2 or 3 should be done, with no clear preference. I am nonetheless leaning towards preprocessing (Approach 3) for smaller benefits such as lower latency and easier diagnostics.
For Approach 4, see pdf2htmlEX: http://coolwanglu.github.io/pdf2htmlEX/