Purge the index file after OCR is completed #74
Comments
Please give more details, with examples. What do you mean by purging the index page? Why do we have to do that?
Purging is needed to update the status of the index file. Every Wikisource has a list of Index pages where the current status of each Index page is shown. (For example, in Bengali Wikisource, https://bn.wikisource.org/w/index.php?title=%E0%A6%AC%E0%A6%BF%E0%A6%B6%E0%A7%87%E0%A6%B7:IndexPages&limit=500&offset=0&key=&order= ) If we don't purge the Index page after OCR, it stays white instead of turning red, so there is a chance that two users will run the same OCR twice. #56
Can anyone give an example of this using Tamil or English Wikisource index pages?
Example Index: Example purge URL: If you visit the index page, there are three icons in the top right corner. The second icon is for purge. We just need to add ?action=purge to the Index URL and ping it. But please note that in many other languages, including Tamil, we are creating index files from scratch. As we had already decided to limit this tool to OCR-related functions, I didn't want to keep adding features (like creating index files). But I hope this purge ping will work without the need to create index files first.
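The "add ?action=purge and ping it" step described above could be sketched in Python roughly as follows. This is only an illustration, not the script's actual code: the function names `build_purge_url` and `purge_index`, the User-Agent string, and the example.org URLs are all hypothetical. The sketch sends a POST on the assumption that the wiki may show a confirmation form for a plain GET purge; whether that applies depends on the wiki's MediaWiki version and configuration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def build_purge_url(index_url: str) -> str:
    """Return index_url with action=purge set in its query string."""
    parts = urlsplit(index_url)
    # Keep existing query parameters (e.g. title=...), dropping any old action.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action"
    ]
    query.append(("action", "purge"))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


def purge_index(index_url: str, timeout: float = 30.0) -> int:
    """Ping the purge URL and return the HTTP status code.

    Assumption: the wiki accepts an empty POST for action=purge.
    """
    request = Request(
        build_purge_url(index_url),
        data=b"",  # an empty body makes this a POST request
        method="POST",
        headers={"User-Agent": "purge-sketch/0.1 (hypothetical)"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder URL; replace with a real Index page before calling purge_index().
    print(build_purge_url("https://example.org/wiki/Index:Book.pdf"))
```

As noted later in the thread, a purge only refreshes an Index page that already exists, so a call like `purge_index(...)` would have to run after the index page has been created.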
That's why I said it is better to purge after OCR is completed. By then, you will already have created the index pages.
//That's why I said it is better to purge after OCR is completed. By then, you will already have created the index pages.// We sometimes create index pages in batches, after many files have been OCRed and their pages uploaded, and not necessarily during the page upload process.
@ravidreams, that's unconventional. I don't know of any other community doing it this way. ;-) Other Wikisource communities, including Bengali, create the index page first and then go for OCR.
@BodhisattwaMandal Well, that is because we haven't had a coordinated effort for taws so far. People have been uploading classic texts available on the web that were already proofread. Not a single book has been proofread so far :) You may have noticed that very few Tamil PDF books were uploaded before this tool came along.
OK, purging won't create new indexes. It only purges index pages that already exist.
Do we still need this purge option? @ravidreams, are all other Wikisource communities purging after OCR is done?
All other big Wikisource communities have dedicated bots to purge the index pages. Besides, their OCR method is different from ours. Our method is unique and requires purging after OCR.
Hmm. I still don't understand what purging is or how to do it programmatically. I will look into it and comment here later.
This is my personal opinion on this issue, which is not directly related to this script. There are many bots running on the Tool Server, where this purge action can be scheduled to run every 1 or 2 hours. User:Wikitanvir already runs this from the Tool Server. So I think this issue can be closed.
It would be great if the script could purge the index file after OCR is completed. Users often forget to purge it because they are not doing the OCR manually. Purging is needed to update the list of index pages.