Purge the index file after OCR is completed #74

Open
bodhisattwawiki opened this issue Feb 21, 2016 · 13 comments

Comments

@bodhisattwawiki
Contributor

It would be great if the script could purge the index file after OCR is completed. Users often forget to purge it because they are not doing the OCR manually. Purging is needed to update the list of index pages.

@tshrinivasan
Owner

Please give more details with examples.

What do you mean by purging the index page?

Why do we have to do that?

Regards,
T.Shrinivasan

@bodhisattwawiki
Contributor Author

Purging is needed to update the status of the index file. All Wikisources have a list of Index pages, where we can see the updated status of each Index page. (For example, on Bengali Wikisource: https://bn.wikisource.org/w/index.php?title=%E0%A6%AC%E0%A6%BF%E0%A6%B6%E0%A7%87%E0%A6%B7:IndexPages&limit=500&offset=0&key=&order= ) If we don't purge the Index page after OCR, it remains white instead of turning red, so there is a chance that the same OCR will be done twice by two users. #56

@tshrinivasan
Owner

Can anyone give an example of this using a Tamil or English Wikisource index page?

@ravidreams

Example Index:

https://bn.wikisource.org/wiki/%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A7%8D%E0%A6%98%E0%A6%A3%E0%A7%8D%E0%A6%9F:%E0%A6%AA%E0%A6%B2%E0%A7%8D%E0%A6%B2%E0%A7%80-%E0%A6%B8%E0%A6%AE%E0%A6%BE%E0%A6%9C.djvu

Example purge URL:

https://commons.wikimedia.org/wiki/File:%E0%A6%AA%E0%A6%B2%E0%A7%8D%E0%A6%B2%E0%A7%80-%E0%A6%B8%E0%A6%AE%E0%A6%BE%E0%A6%9C.djvu?action=purge

If you visit the index page, there are three icons in the top right corner. The second icon is for purging. We just need to add ?action=purge to the Index URL and ping it.

But please note that in many other languages, including Tamil, we are creating index files from scratch. Since we had already decided to limit this tool to OCR-related functions, I didn't want to keep adding features such as creating index files. I hope this purge ping will work without the need to create index files first.
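
A minimal sketch of the purge ping described above, assuming the script already knows the URL of the index page. The helper name `purge_url` is hypothetical and not part of the tool; it only builds the URL and sends nothing over the network. Depending on the wiki's configuration, MediaWiki may ask for confirmation when a purge is requested with a plain GET, so a simple ping may not be enough on its own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def purge_url(index_url: str) -> str:
    """Return index_url with action=purge set in its query string.

    Hypothetical helper for illustration only: it does not contact the wiki.
    """
    parts = urlsplit(index_url)
    # Keep existing query parameters (e.g. title=...) but drop any old action.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action"
    ]
    query.append(("action", "purge"))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


if __name__ == "__main__":
    # example.org is a placeholder host, not a real Wikisource.
    print(purge_url("https://example.org/wiki/Index:Book.djvu"))
```

The percent-encoded path of a real index URL is left untouched, so the Bengali example above could be passed in as it is.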

@bodhisattwawiki
Contributor Author

That's why I said it is better to purge after OCR is completed. By then, you will already have created the index pages.

@ravidreams

//That's why I said it is better to purge after OCR is completed. By then, you will already have created the index pages.//

We sometimes create index pages in batches, after many files have been OCRed and their pages uploaded, and not necessarily during the page upload process.

@bodhisattwawiki
Contributor Author

@ravidreams, that's unconventional. I don't know of any other community doing it this way. ;-) Other Wikisource communities, including Bengali, create the index page first and then go for OCR.

@ravidreams

@BodhisattwaMandal Well, that is because we haven't had a coordinated effort for Tamil Wikisource (taws) so far. People have been uploading classic texts available on the web that were already proofread. Not a single book has been proofread so far :) You may have noticed that very few Tamil PDF books had been uploaded before this tool came along.

@bodhisattwawiki
Contributor Author

OK, purging won't create new indexes. It only purges index pages that already exist.

@tshrinivasan
Owner

Do we still need this purge option, @ravidreams?

Are all other Wikisource communities purging after OCR is done?

@bodhisattwawiki
Contributor Author

All the other big Wikisource communities have dedicated bots to purge the index pages. Besides, their OCR method is different from ours. Our method is unique, and it requires purging after OCR.
By the way, we do have JS for soft and hard purging. It might make this easier for you.
https://bn.wikisource.org/s/805

@tshrinivasan
Owner

Hmm. I still do not understand what purging is or how to do it programmatically.

I will explore this and comment here later.

@jayantanth
Contributor

This is my personal opinion: this issue is not directly related to this script. There are many bots running on the Tool Server where we can schedule this purge action every 1 or 2 hours. User:Wikitanvir already runs this from the Tool Server. So I think this issue can be closed.
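
For a bot-style purge like the one mentioned above, here is a rough sketch of how the request parameters could be prepared with the MediaWiki Action API's `action=purge` module. The `titles` and `forcelinkupdate` parameters belong to that module; the function name `build_purge_payloads`, the batch size of 50 and the example titles are assumptions for illustration. This is not how User:Wikitanvir's bot is known to work. Each payload would be sent as a POST to the wiki's `api.php`; sending and scheduling are left out on purpose.

```python
from typing import Dict, Iterable, List


def build_purge_payloads(
    titles: Iterable[str], batch_size: int = 50
) -> List[Dict[str, str]]:
    """Split page titles into batches and build one purge payload per batch.

    Hypothetical helper: it only prepares POST parameters, it sends nothing.
    """
    all_titles = list(titles)
    payloads = []
    for start in range(0, len(all_titles), batch_size):
        batch = all_titles[start:start + batch_size]
        payloads.append(
            {
                "action": "purge",
                # The Action API separates multiple titles with "|".
                "titles": "|".join(batch),
                # Also refresh the link tables for the purged pages.
                "forcelinkupdate": "1",
                "format": "json",
            }
        )
    return payloads


if __name__ == "__main__":
    # Placeholder titles, not real index pages.
    for payload in build_purge_payloads(["Index:A.djvu", "Index:B.djvu"]):
        print(payload)
```

A scheduler such as cron could run a script like this every hour or two, matching the interval suggested above.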
