As the name suggests, one of the main goals of OCR4all is to allow virtually any user to independently perform OCR on a wide variety of historical printings and to obtain high-quality results with reasonable time expenditure. OCR4all is therefore explicitly designed to be usable even by users with no technical background.
This repository contains two guides. The first covers the installation process; the second gives a brief overview of the functionality, using two historical books (also available in this repository) as hands-on examples.
To get started, we recommend downloading the entire repository ("Clone or download" -> "Download ZIP"), installing OCR4all by following the setup guide, and then getting some hands-on experience by working through the examples covered in the short user guide.
Both guides will be continuously improved and refined. Therefore, user feedback is always welcome.
OCR4all is under active development, so frequent releases containing bug fixes and further functionality can be expected. To stay up to date, we highly recommend subscribing to our mailing list, where we announce all notable enhancements.
Plans for the (very) near future:
- Enabling a second project management approach based solely on PageXML, allowing for a more flexible workflow.
- Integrating Tesseract for recognition.
- Many minor bug fixes and improvements.