PDF2

A package for inspecting PDF files.

It is at an early stage of development.

Goal

The current aim of this package is to implement the following features:

  • Parse PDF files
  • Validate PDF files
  • Extract metadata
  • Extract text, images, tables, links, annotations...
  • Check for potential security vulnerabilities

References

Comments and documentation cite the International Standard ISO 32000-2:2020, Portable document format – Part 2: PDF 2.0. Each citation gives the section number, section name, and page number(s) in square brackets, e.g. [7.3.10 Indirect objects, p33-34]. Citations of other sources put the source in nested square brackets, e.g. [[https://www.w3.org/TR/png/#4Concepts.EncodingScanlineAbs] 4.6.2 Scanline serialization].
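As an illustration of the citation format, the sketch below pulls ISO 32000-2 citations out of a comment string. It is written in Python purely for demonstration and is not part of this package. The names `REFERENCE` and `find_references` are hypothetical, and the pattern assumes that section names contain no commas or brackets. It deliberately ignores the nested form used for other sources.

```python
import re

# Hypothetical pattern for citations such as "[7.3.10 Indirect objects, p33-34]".
# Assumption: the section name contains no commas or square brackets.
REFERENCE = re.compile(
    r"\[(?P<section>\d+(?:\.\d+)*) "   # dotted section number, e.g. 7.3.10
    r"(?P<name>[^\[\],]+), "           # section name up to the comma
    r"p(?P<first>\d+)(?:-(?P<last>\d+))?\]"  # single page or page range
)


def find_references(text):
    """Return (section, name, first_page, last_page) for each citation in text."""
    results = []
    for match in REFERENCE.finditer(text):
        first = int(match.group("first"))
        # A single-page citation has no range, so the last page equals the first.
        last = int(match.group("last")) if match.group("last") else first
        results.append((match.group("section"), match.group("name"), first, last))
    return results


comment = "# Resolve the object number [7.3.10 Indirect objects, p33-34]"
print(find_references(comment))
```

A checker along these lines could flag comments whose citations do not follow the format, though whether the project wants such tooling is an open question.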

Help Needed

If you are interested in contributing, please check the TODO list. Test contributions that include extracts of PDF files that fail to open correctly are highly appreciated, provided they do not require a change to the LICENSE.