Make a web version of OCR4WikiSource #89

tshrinivasan · 2016-09-14T08:51:59Z

OCR4Wikisource is a Python script that runs only on GNU/Linux, from the command line.
Many new users find it difficult to set up and run.

A web version of the tool is needed, so that any new user can use it easily from a browser.

Requirements

1. User logs in with wiki credentials.
2. Give the URL of a PDF file or upload a PDF file.
3. Select the Wikisource language.
4. Give an email address for notification.
5. These details are stored in a queue.
6. OCR4Wikisource should read the queue, OCR it and paste in wikisource
7. Once done, notify the user.
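
To make the queue idea concrete, here is a rough sketch of how the requirements above could fit together. It is only an illustration, not the OCR4Wikisource code: the `OcrJob` fields, `process_queue`, and the `ocr_and_upload` and `notify` callables are all hypothetical names, and a real web version would need a persistent queue and a safe way to handle wiki credentials (the sketch stores only the username).

```python
import queue
from dataclasses import dataclass


@dataclass
class OcrJob:
    """One request from the web form (field names are assumptions)."""
    wiki_username: str   # who submitted the job; no password is kept here
    pdf_source: str      # URL of the PDF, or path of the uploaded file
    language: str        # Wikisource language code, e.g. "bn"
    email: str           # address to notify when the job is done


def process_queue(jobs, ocr_and_upload, notify):
    """Drain the queue: OCR each job, then notify its user.

    `ocr_and_upload` stands in for the existing script's work and
    `notify` for an email sender; both are supplied by the caller.
    Returns the number of jobs processed.
    """
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        ocr_and_upload(job)
        notify(job.email, f"OCR finished for {job.pdf_source}")
        processed += 1
```

The web form would only put `OcrJob` entries on the queue. A separate worker would call `process_queue`, so the OCR itself could keep running on the GNU/Linux machine where the script already works.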

Can anyone volunteer for creating a web version?

samwilson · 2016-09-14T09:01:43Z

Can you elaborate on step 6 "OCR4Wikisource should read the queue, OCR it and paste in wikisource" — does this mean the tool itself would add the text to the relevant page on Wikisource? Or the user would copy and paste the text there?

What differences in workflow or features are there with respect to the proofreadpage system of proofreading a page at a time within wikisource?

I'm wondering if the ws-google-ocr tool could be modified to selectively either use the Vision API or the Drive system of OCR.

bodhisattwawiki · 2016-09-14T09:09:55Z

Yes, the script itself adds the text to the relevant pages. Users don't have to do it manually.
This script also OCRs a whole book at a time, in contrast to the existing OCR tools (Phe or ws-google-ocr), which OCR a single page at a time.
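
As a rough illustration of the book-at-a-time idea (not the actual script), the loop below collects text for every page of a file in one run. `ocr_book` and the `ocr_page` callable are hypothetical names, and the `Page:<file name>/<page number>` titles assume the usual ProofreadPage naming on Wikisource.

```python
from typing import Callable, Dict


def ocr_book(file_name: str, page_count: int,
             ocr_page: Callable[[int], str]) -> Dict[str, str]:
    """Run OCR over every page of one book.

    `ocr_page` is a placeholder for whatever OCR backend is used; it
    takes a 1-based page number and returns that page's text.
    The result maps each assumed wiki page title to its OCR text.
    """
    texts = {}
    for number in range(1, page_count + 1):
        texts[f"Page:{file_name}/{number}"] = ocr_page(number)
    return texts
```

A per-page tool would do the body of this loop once per user click. Here the whole book is handled in a single job.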

bodhisattwawiki · 2016-09-14T11:40:41Z

@samwilson, we have a test file for Bengali Wikisource. Please feel free to test with it using the OCR4Wikisource script.

tshrinivasan mentioned this issue Sep 14, 2016

Proposal- run in Toolserver http://tools.wmflabs.org #7

Open
