TODO

Performance

  • Cache created filters.

Fixes

  • Correct the implementation of Display wherever Byte is cast to char: the cast does not preserve all non-printable bytes. Write the raw bytes through ::std::io::Write instead.
  • In get_trailer, build the trailer from all increments if the standard requires.
  • Correct XRef::parse to cover hybrid-reference cases.
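The Display fix above can be illustrated with a minimal sketch. `RawBytes` is a hypothetical stand-in, not one of the crate's types. It assumes the problem is that `u8 as char` maps bytes 0x80..=0xFF to U+0080..=U+00FF, which UTF-8 then encodes as two bytes each, so the formatted output no longer matches the input.

```rust
use std::fmt;
use std::io::{self, Write};

/// Hypothetical stand-in for a PDF object that holds raw bytes.
struct RawBytes(Vec<u8>);

impl fmt::Display for RawBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for &byte in &self.0 {
            // Lossy: `u8 as char` turns 0xFF into U+00FF, which is
            // written out as the two UTF-8 bytes 0xC3 0xBF.
            write!(f, "{}", byte as char)?;
        }
        Ok(())
    }
}

impl RawBytes {
    /// Byte-exact alternative: hand the bytes to any `io::Write` sink.
    fn write_to<W: Write>(&self, out: &mut W) -> io::Result<()> {
        out.write_all(&self.0)
    }
}
```

With this shape, Display can remain for human-readable output, while anything that must round-trip goes through `write_to`.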

Features

  • Parse free and compressed objects.
  • Log comments in their context.
  • Implement the remaining filters/encoders.
  • Process dictionaries and streams based on their /Type and /Subtype so that the XRefStream implementation of Process becomes a special case for /Type /XRef.
  • Implement object streams.
  • Implement features specific to linearised PDFs.
  • Report object changes, like being freed, overwritten, or reused in incremental updates.
  • Allow the user to specify the content of the PDF summary.
  • Ensure a tolerant parser and a more restrictive validator. For example, the validator should flag all HACKs allowed in the parser as errors.
  • Parse streams with data stored in an external file.
  • The validator should take into account the version for each incremental update.
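One possible shape for the tolerant-parser and strict-validator split mentioned above: the parser accepts a deviation but records it, and the validator rejects any input for which something was recorded. This is only a sketch. `Hack`, `Parsed`, `parse_header`, and `validate` are hypothetical names, and tolerating whitespace before the header is an assumed example of a HACK, not a description of what the parser currently allows.

```rust
/// Hypothetical deviation that the tolerant parser accepts but records.
#[derive(Debug, PartialEq)]
enum Hack {
    LeadingWhitespaceBeforeHeader,
}

/// Parse result that carries the list of tolerated deviations.
#[derive(Debug, PartialEq)]
struct Parsed {
    version: (u8, u8),
    hacks: Vec<Hack>,
}

/// Tolerant parser: accepts `%PDF-M.N`, optionally preceded by whitespace.
fn parse_header(input: &str) -> Option<Parsed> {
    let mut hacks = Vec::new();
    let trimmed = input.trim_start();
    if trimmed.len() != input.len() {
        hacks.push(Hack::LeadingWhitespaceBeforeHeader);
    }
    let rest = trimmed.strip_prefix("%PDF-")?;
    let (major, minor) = rest.trim_end().split_once('.')?;
    Some(Parsed {
        version: (major.parse().ok()?, minor.parse().ok()?),
        hacks,
    })
}

/// Strict validator: every recorded hack counts as an error.
fn validate(parsed: &Parsed) -> Result<(), &[Hack]> {
    if parsed.hacks.is_empty() {
        Ok(())
    } else {
        Err(&parsed.hacks)
    }
}
```

Keeping the hacks as data rather than log lines lets the same parse feed both a lenient summary and a strict report.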

Documentation

  • Go through the standard again and document the code accordingly, taking care to note the supported versions for each feature.

Tests

  • When extracting test cases containing Stream data from PDF files, be careful to preserve the line endings (DOS/Unix) and file encoding (UTF-8/UTF-16/Latin-1/...), as changing either of these can change the stream's data.
  • Include a submodule of PDF files released to the public domain and use it for testing.
  • Double-check test coverage of all types to cover edge cases.
  • Replace panics and unwraps in tests with assert_eq.
  • Remove redundant tests.
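As an illustration of replacing unwraps with assert_eq in tests, here is a small sketch. `parse_integer` and `ParseError` are hypothetical and only serve as something to test. Comparing the whole `Result` means a failing test prints both the expected and the actual value rather than a bare unwrap panic, and error paths can be asserted just as directly.

```rust
/// Hypothetical error type; deriving Debug and PartialEq is what makes
/// `assert_eq!` usable on the whole `Result`.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    InvalidDigit(u8),
    Overflow,
}

/// Hypothetical parser for an unsigned decimal integer.
fn parse_integer(bytes: &[u8]) -> Result<u32, ParseError> {
    if bytes.is_empty() {
        return Err(ParseError::Empty);
    }
    let mut value: u32 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return Err(ParseError::InvalidDigit(b));
        }
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(u32::from(b - b'0')))
            .ok_or(ParseError::Overflow)?;
    }
    Ok(value)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn integer_valid() {
        // Instead of: let v = parse_integer(b"42").unwrap();
        assert_eq!(parse_integer(b"42"), Ok(42));
    }

    #[test]
    fn integer_invalid() {
        // Instead of matching on the error and calling panic! otherwise.
        assert_eq!(parse_integer(b"4x"), Err(ParseError::InvalidDigit(b'x')));
    }
}
```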

Refactor

  • Use num_traits to refactor the num module.
  • Replace println! and eprintln! with log calls.
  • Replace flate2 with a library that allows restricting the output size.
  • For comparing files, it might be better to implement a PartialEq trait for Stream using its decoded data.
  • Consider viewing Escape for Name and LiteralString as a Filter, as is the case for Hexadecimal.
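The PartialEq idea above could look roughly like the following sketch. The `Stream` and `Filter` types here are hypothetical simplifications, not the crate's definitions, and only an identity filter and a simplified ASCIIHexDecode are modelled. Two streams compare equal when their decoded bytes match, regardless of how each is encoded.

```rust
/// Hypothetical, reduced set of filters.
#[derive(Debug, Clone, Copy)]
enum Filter {
    Identity,
    AsciiHex,
}

/// Hypothetical stream: encoded data plus the filter that encoded it.
#[derive(Debug)]
struct Stream {
    filter: Filter,
    data: Vec<u8>,
}

impl Stream {
    /// Returns the decoded bytes, or None if the data is malformed.
    fn decoded(&self) -> Option<Vec<u8>> {
        match self.filter {
            Filter::Identity => Some(self.data.clone()),
            Filter::AsciiHex => {
                let mut digits = Vec::new();
                for &b in &self.data {
                    if b == b'>' {
                        break; // end-of-data marker
                    }
                    if b.is_ascii_whitespace() {
                        continue; // simplified: ASCII whitespace only
                    }
                    digits.push((b as char).to_digit(16)? as u8);
                }
                // An odd number of digits is padded with a trailing 0.
                if digits.len() % 2 == 1 {
                    digits.push(0);
                }
                Some(digits.chunks(2).map(|p| p[0] << 4 | p[1]).collect())
            }
        }
    }
}

impl PartialEq for Stream {
    /// Equality on decoded data; undecodable streams never compare equal.
    fn eq(&self, other: &Self) -> bool {
        match (self.decoded(), other.decoded()) {
            (Some(a), Some(b)) => a == b,
            _ => false,
        }
    }
}
```

One design consequence worth noting: because an undecodable stream is not equal even to itself, this relation fits PartialEq but not Eq.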

Future Work

  • Compare PDF files:

    • Index PDF objects for comparison.
    • Implement a cmp feature that allows the package to connect to a database (generate one if needed) to store and query object hashes from different files.
  • Implement a PDF viewer:

    • Minimal, in a way similar to Zathura.
    • Yet with better support for annotations, hyperlinks, and text/image selection and extraction.