Make a web version of OCR4WikiSource #89

Open
tshrinivasan opened this issue Sep 14, 2016 · 3 comments

Comments

@tshrinivasan
Owner

OCR4Wikisource is a Python script that runs only on GNU/Linux, from the command line.
Many new users find it hard to set up and run.

A web version of the same tool is needed, so that any new user can use it easily in a browser.

Requirements

  1. The user logs in with wiki credentials.
  2. The user gives the URL of a PDF file or uploads a PDF file.
  3. The user selects the Wikisource language.
  4. The user gives an email address for notification.
  5. These details are stored in a queue.
  6. OCR4Wikisource should read the queue, OCR it and paste in wikisource
  7. Once done, notify the user.
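
The queue part of the requirements above (steps 5 to 7) could be sketched roughly as follows. This is only an illustration, not existing OCR4Wikisource code: `OcrJob`, `process_queue`, and the three callables `ocr_book`, `upload_pages`, and `notify` are hypothetical names, the in-memory `queue.Queue` stands in for whatever persistent queue a real web version would use, and login handling is not modeled at all.

```python
import queue
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OcrJob:
    """One queued request (hypothetical structure, steps 1-4)."""
    wiki_user: str    # step 1: the logged-in wiki user
    pdf_source: str   # step 2: PDF URL or path of the uploaded file
    language: str     # step 3: Wikisource language code, e.g. "bn"
    email: str        # step 4: address to notify when done


def process_queue(
    jobs: "queue.Queue[OcrJob]",
    ocr_book: Callable[[str, str], List[str]],
    upload_pages: Callable[[OcrJob, List[str]], None],
    notify: Callable[[str, str], None],
) -> List[OcrJob]:
    """Drain the queue: OCR each book, upload its pages, notify the user."""
    finished = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        # Step 6: OCR the whole book, then add the text to Wikisource.
        pages = ocr_book(job.pdf_source, job.language)
        upload_pages(job, pages)
        # Step 7: tell the user the job is complete.
        notify(job.email, f"OCR finished for {job.pdf_source}")
        finished.append(job)
        jobs.task_done()
    return finished
```

Passing the OCR, upload, and notification steps in as callables keeps the sketch independent of how the existing script does each of them; the web front end would only need to put `OcrJob` entries on the queue.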

Can anyone volunteer to create a web version?

@samwilson

Can you elaborate on step 6 "OCR4Wikisource should read the queue, OCR it and paste in wikisource" — does this mean the tool itself would add the text to the relevant page on Wikisource? Or the user would copy and paste the text there?

What differences in workflow or features are there with respect to the proofreadpage system of proofreading a page at a time within wikisource?

I'm wondering if the ws-google-ocr tool could be modified to selectively either use the Vision API or the Drive system of OCR.

@bodhisattwawiki
Contributor

  1. Yes, the script itself adds the text to the relevant pages. Users don't have to do it manually.

  2. This script also OCRs a whole book at a time, in contrast to the existing OCR systems (Phe or ws-google-ocr), where a single page is OCRed at a time.

@bodhisattwawiki
Contributor

bodhisattwawiki commented Sep 14, 2016

@samwilson, we have a test file for Bengali Wikisource. Please feel free to test it with the OCR4Wikisource script.
