PDF2

A package for inspecting PDF files.

It is at an early stage of development.

Goal

The current aim of this package is to implement the following features:

Parse PDF files
Validate PDF files
Extract metadata
Extract text, images, tables, links, annotations...
Check for potential security vulnerabilities

References

The comments and documentation include references to the International Standard ISO 32000-2:2020, Portable document format – Part 2: PDF 2.0. Each reference gives the section number, section name, and page number(s) in square brackets, e.g. [7.3.10 Indirect objects, p33-34]. Nested square brackets indicate references to other sources, e.g. [[https://www.w3.org/TR/png/#4Concepts.EncodingScanlineAbs] 4.6.2 Scanline serialization].
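The reference convention is regular enough to be found mechanically. The following Python sketch extracts ISO 32000-2 references such as [7.3.10 Indirect objects, p33-34] from a comment string. The names `SpecRef` and `find_spec_refs` are hypothetical and are not part of this package. The pattern is an assumption inferred only from the single example above. It requires a page number prefixed with "p", so it intentionally skips nested references to other sources, which carry no page number.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the "[section name, pNN-MM]" convention.
# The page range is optional: "[8.1 Name, p5]" is treated as a single page.
_REF = re.compile(
    r"\[(?P<section>\d+(?:\.\d+)*)\s+(?P<name>[^,\]]+),\s*"
    r"p(?P<first>\d+)(?:-(?P<last>\d+))?\]"
)


class SpecRef(NamedTuple):
    section: str     # e.g. "7.3.10"
    name: str        # e.g. "Indirect objects"
    first_page: int
    last_page: int   # equals first_page when no range is given


def find_spec_refs(text: str) -> list[SpecRef]:
    """Return every ISO 32000-2 style reference found in `text`."""
    refs = []
    for m in _REF.finditer(text):
        first = int(m.group("first"))
        last = int(m.group("last")) if m.group("last") else first
        refs.append(SpecRef(m.group("section"), m.group("name").strip(), first, last))
    return refs


if __name__ == "__main__":
    comment = "# Resolve [7.3.10 Indirect objects, p33-34] before parsing."
    print(find_spec_refs(comment))
```

Using a `NamedTuple` keeps the parsed fields readable and comparable. A stricter tool might also validate section numbers against the standard's table of contents, but that is outside this sketch.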

Help Wanted

If you are interested in contributing, please check the TODO list. Test contributions containing extracts of PDF files that do not open correctly are especially welcome, provided that including them does not require a change to the LICENSE.
