Releases: HazyResearch/pdftotree
v0.5.0
0.5.0 - 2020-10-13
Added
- Support for Python 3.8. (#86, @HiromuHota)
Changed
- Switch the output format from "HTML-like" to hOCR. (#62, @HiromuHota)
- Loosen Keras' version restriction, which is now unnecessarily strict. (#68, @HiromuHota)
- Greedily extract contents from PDF even if it looks scanned. (#71, @HiromuHota)
- Upgrade Keras to 2.4.0 or later (and TensorFlow 2.2 or later). (#86, @HiromuHota)
Removed
- Remove "favor_figures" option and extract everything. (#77, @HiromuHota)
- Remove "dry_run" option. (#89, @HiromuHota)
Fixed
- Fix a bug where the HTML file was not created at the given path. (#64, @HiromuHota)
- Extract LTChar objects even if they are not children of an LTTextLine. (#79, @HiromuHota)
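Because this release switches the output to hOCR, downstream code can read word positions from the `title` attribute of each element. The sketch below is a minimal illustration using only the Python standard library. The `SAMPLE_HOCR` fragment, the `parse_bbox` and `extract_words` helpers, and the `HocrWordParser` class are all hypothetical names made up for this example, and the fragment is hand-written rather than actual pdftotree output, so the classes and properties that pdftotree emits may differ.

```python
from html.parser import HTMLParser

# Hand-written hOCR fragment for illustration only (an assumption,
# not real pdftotree output).
SAMPLE_HOCR = """
<div class="ocr_page" title="bbox 0 0 612 792">
  <span class="ocr_line" title="bbox 72 90 300 104">
    <span class="ocrx_word" title="bbox 72 90 110 104">Hello</span>
    <span class="ocrx_word" title="bbox 116 90 160 104; x_wconf 95">world</span>
  </span>
</div>
"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    # hOCR properties in the title attribute are separated by semicolons.
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for elements with class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element currently open

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "ocrx_word" in classes:
            self._bbox = parse_bbox(attr_map.get("title") or "")

    def handle_data(self, data):
        text = data.strip()
        if self._bbox is not None and text:
            self.words.append((text, self._bbox))

    def handle_endtag(self, tag):
        # Leaving any element ends the current word, if one was open.
        self._bbox = None


def extract_words(hocr):
    """Parse an hOCR string and return a list of (text, bbox) tuples."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE_HOCR):
        print(text, bbox)
```

The same approach extends to `ocr_line` or `ocr_page` elements by checking for those class names instead.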
v0.4.1
This release marks the end of development for the v0.4.x series of pdftotree. Starting with v0.5, we plan to change pdftotree's output to conform to hOCR. For this process, we welcome @HiromuHota as a new maintainer.
If you would like to give feedback on this refactor, we invite you to comment in #62.