Using free OCR in Ubuntu
Henryk Paluch edited this page Sep 17, 2016 · 1 revision
The problem: you have an image containing text (called input-image.png in this example) and you want to extract that text into an ordinary plain-text file using OCR.
Tested on Ubuntu 16.04.1 LTS.
Install the following packages (support for the English and Czech languages):
sudo apt-get install tesseract-ocr-ces tesseract-ocr tesseract-ocr-eng
Verify the list of recognized languages:
tesseract --list-langs
List of available languages (4):
equ
ces
eng
osd
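The same check can be scripted. The following is a minimal sketch, not part of the original page: `has_lang` is a hypothetical helper name, and the script assumes that the language codes appear one per line, as in the listing above. Some Tesseract versions print the list to stderr, so both streams are merged.

```shell
#!/bin/sh
# has_lang LIST CODE: succeed if CODE appears as a whole line in LIST.
# (hypothetical helper, introduced only for this illustration)
has_lang() {
    printf '%s\n' "$1" | grep -qx -- "$2"
}

# Only query Tesseract when it is actually installed.
if command -v tesseract >/dev/null 2>&1; then
    # Merge stderr into stdout because the list may be printed to either one.
    langs=$(tesseract --list-langs 2>&1)
    for code in ces eng; do
        if has_lang "$langs" "$code"; then
            echo "$code: installed"
        else
            echo "$code: missing"
        fi
    done
fi
```

If a language is reported as missing, installing the corresponding tesseract-ocr-<code> package should add it.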
Use this example to process input-image.png, which contains Czech characters, and write the result to /tmp/output.txt (standard UTF-8 encoding). Tesseract appends .txt to the output name itself, so pass /tmp/output without the extension:
tesseract input-image.png /tmp/output -l ces
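For more than one image, the command can be wrapped in a small script. This is a sketch under stated assumptions rather than a tested recipe from the original page: `out_base` and `ocr_image` are hypothetical helper names, the images are assumed to be PNG files in the current directory, and the default `ces+eng` uses Tesseract's `+` syntax to combine Czech and English. Tesseract treats its second argument as an output base and appends `.txt` to it, so the helper strips the image extension instead of adding one.

```shell
#!/bin/sh
# out_base IMAGE: print a /tmp output base named after IMAGE, without extension.
# (hypothetical helper, introduced only for this illustration)
out_base() {
    name=$(basename "$1")
    printf '/tmp/%s\n' "${name%.*}"
}

# ocr_image IMAGE [LANGS]: run OCR on IMAGE; LANGS defaults to Czech plus English.
ocr_image() {
    tesseract "$1" "$(out_base "$1")" -l "${2:-ces+eng}"
}

# Process every PNG in the current directory, if Tesseract is installed.
if command -v tesseract >/dev/null 2>&1; then
    for img in ./*.png; do
        [ -e "$img" ] || continue   # skip the unexpanded pattern when no PNG exists
        ocr_image "$img" && echo "wrote $(out_base "$img").txt"
    done
fi
```

Each recognized text ends up next to the others in /tmp, for example /tmp/input-image.txt for input-image.png.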
Copyright © Henryk Paluch. All rights reserved.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License