tesseract-trainer

This is a set of two tools for generating OCR training files for Tesseract. It is designed in particular for image files that contain a small number of characters. It helps you create box files, assuming the name of each image file reflects the text contained in the image.
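As a rough illustration of how a file name can be turned into a box file, here is a minimal sketch. It relies on the Tesseract box file layout of one line per character (`symbol left bottom right top page`, with the origin at the bottom-left corner of the image). The function name `evenly_spaced_boxes` and the even split of the image width are assumptions made for this example, not necessarily what the tools in this repository do.

```python
def evenly_spaced_boxes(text, width, height, page=0):
    """Return box-file lines that split the image width evenly among the characters.

    Hypothetical helper: a real image rarely has perfectly even spacing,
    so the resulting boxes are only a starting point for manual correction.
    """
    if not text:
        return []
    lines = []
    for i, ch in enumerate(text):
        # Integer arithmetic keeps the boxes contiguous with no gaps.
        left = i * width // len(text)
        right = (i + 1) * width // len(text)
        # Box format: symbol left bottom right top page
        lines.append(f"{ch} {left} 0 {right} {height} {page}")
    return lines


if __name__ == "__main__":
    # An 80x20 pixel image named asdf.png is assumed to contain the text "asdf".
    print("\n".join(evenly_spaced_boxes("asdf", 80, 20)))
```

The width and height are passed in directly so the sketch has no imaging dependency; in practice they would be read from the image file.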

To run the tesseract trainer, point it at a directory containing a set of image files and a set of box files with corresponding file names. For example, you might have a directory containing:

  • asdf.png
  • asdf.box
  • qwerty.png
  • qwerty.box

Here the file names correspond to the characters that each image contains.
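Before training, it can be useful to check that every image has a matching box file. The sketch below walks a directory laid out as described above and reports the pairs it finds, along with any images that lack a box file. The function name `find_training_pairs` and the restriction to `.png` images are assumptions for this example.

```python
from pathlib import Path

# Assumption: training images are PNG files, as in the listing above.
IMAGE_SUFFIXES = {".png"}


def find_training_pairs(directory):
    """Return (pairs, missing) for a training directory.

    pairs   -- list of (expected_text, image_name, box_name) tuples
    missing -- list of image names that have no matching .box file
    """
    pairs = []
    missing = []
    for image in sorted(Path(directory).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        box = image.with_suffix(".box")
        if box.exists():
            # The file name (without extension) is the text in the image.
            pairs.append((image.stem, image.name, box.name))
        else:
            missing.append(image.name)
    return pairs, missing


if __name__ == "__main__":
    pairs, missing = find_training_pairs(".")
    for text, image_name, box_name in pairs:
        print(f"{image_name} + {box_name}: expected text {text!r}")
    for image_name in missing:
        print(f"{image_name}: no box file found")
```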

Running the trainer produces a trained font file "traineddata.cap" (if you're using the default font name 'cap').

Put this file in /usr/local/share/tessdata to make the font available.
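The copy step can also be scripted. This is a minimal sketch that assumes the trained font file is in the current directory; the helper name `install_font` is made up for this example, and the default destination is the path given above.

```python
import shutil
from pathlib import Path


def install_font(font_file, tessdata_dir="/usr/local/share/tessdata"):
    """Copy a trained font file into the tessdata directory and return its new path."""
    dest = Path(tessdata_dir)
    if not dest.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {dest}")
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(font_file, dest))


if __name__ == "__main__":
    # File name as produced with the default font name 'cap'.
    print(install_font("traineddata.cap"))
```

Writing to /usr/local/share usually requires elevated permissions, so the script may need to be run accordingly.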