Update for beta state
aryaminus committed Feb 18, 2018
1 parent 730dc5e commit 04cd175
Showing 3 changed files with 8 additions and 5 deletions.
9 changes: 6 additions & 3 deletions README.md
@@ -1,12 +1,15 @@
# Saram - Image/PDF OCR conversion
Get OCR in txt form from an image or pdf extension supporting multiple files from directory using `pytesseract` with support for rotation in case of wrong orientation along.

-**Currently in alpha state**
+**Currently in beta state**

[![Saram features](https://i.imgur.com/M9dAwPq.gif)](https://i.imgur.com/M9dAwPq.gif)

**Note:**
-Make sure you have a OCR tool like `tesseract` and certain data value for comparing OCR, eg `tesseract-data-eng` along with `Pillow` and `Wand` for image conversion and loading which will be fetched during pip install
+Make sure you have a OCR tool like `tesseract` and certain data value for comparing OCR, eg `tesseract-data-eng` along with `Pillow` and `Wand` for image conversion and loading which will be fetched during pip install.

+**For using in python**:
+Refer to the <a href="https://github.com/aryaminus/saram/tree/py-module" target="_blank">py-module</a> branch

## Installation

@@ -26,7 +29,7 @@ $ python main.py <dirname>
```

## Todo
-- [x] Add support for PDF by PDF -> image -> txt with converted image deletion after processing
+- [x] Add support for PDF by PDF -> Image -> Txt with converted image deletion after processing
- [x] Double check for orientation in case of image and PDF
- [x] Make a PIP package
- [ ] Add NLP to process the most repeated frequent characters to filer content
2 changes: 1 addition & 1 deletion README.rst
@@ -3,4 +3,4 @@ To use (with caution), simply do::
$ pip install saram
$ saram <dirname>

-Mkae sure you have a OCR tool like tesseract and certain data value for comparing OCR
+Make sure you have a OCR tool like `tesseract` and certain data value for comparing OCR, eg `tesseract-data-eng` along with `Pillow` and `Wand` for image conversion and loading which will be fetched during pip install.
2 changes: 1 addition & 1 deletion setup.py
@@ -8,7 +8,7 @@ def readme():
setup(
name = 'saram',
packages = ['saram'], # this must be the same as the name above
-version = '0.8.2',
+version = '1.0.1',
description = 'A library to fetch images from a directory and get OCR and store in txt with orientation rotation check and pdf support.',
long_description = readme(),
author = 'Sunim Acharya',
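
For readers unfamiliar with the orientation check the README mentions, here is a minimal sketch of how such a check could work with `pytesseract`. It is not taken from the saram source: the helper names `rotation_from_osd` and `ocr_with_rotation` are hypothetical, and the sketch assumes that the `Rotate:` field in Tesseract's orientation and script detection (OSD) output is the clockwise correction to apply.

```python
import re


def rotation_from_osd(osd_text):
    """Return the 'Rotate' angle (degrees) found in Tesseract OSD text, or 0 if absent."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else 0


def ocr_with_rotation(image_path):
    """OCR an image, rotating it first if OSD reports a wrong orientation (hypothetical helper)."""
    # Imported here so the parsing helper above works without these packages installed.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    angle = rotation_from_osd(pytesseract.image_to_osd(image))
    if angle:
        # Assumption: 'Rotate' is a clockwise correction. PIL's rotate() turns
        # counter-clockwise for positive angles, hence the negation.
        image = image.rotate(-angle, expand=True)
    return pytesseract.image_to_string(image)
```

OSD can fail on images with very little text, so a real tool would probably wrap the `image_to_osd` call in error handling and fall back to the unrotated image.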
