An offline utility for searching 'keywords' in a Wikipedia archive (dump) instead of calling an online Wikipedia API.
- when you need to search for many 'keywords' in Wikipedia. Online API clients, such as the Wikipedia package, may slow down after a few dozen calls.
- if your internet connection is slow, this helps because the search runs entirely offline.
- uses minimal onboard resources.
- tested on Python 3.11
- Wikipedia
- FuzzyWuzzy
- BeautifulSoup
- tqdm
- joblib
- at least 25 GB of free storage space
Alternatively, you can install the Python packages with: pip install -r "./requirements.txt"
You also need to download one dump (backup) from the Wikipedia archive page.
Download:
- enwiki-{date}-pages-articles-multistream.xml.bz2 (~23 GB)
- enwiki-{date}-pages-articles-multistream-index.txt.bz2 (~250 MB)
  - Extract this file. It will contain enwiki-{date}-pages-articles-multistream-index.txt (~1.2 GB)
The paths to these files are required when initializing the offline wiki class.
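To illustrate why both files are needed, here is a minimal sketch of how a multistream dump can be read without decompressing the whole archive. It assumes the standard multistream layout, where each index line has the form offset:page_id:title and offset is the byte position of a bz2 stream inside the .xml.bz2 file. The function names (parse_index_line, find_offset, read_stream) are hypothetical and are not taken from this repository, so the offline wiki class may work differently.

```python
import bz2


def parse_index_line(line):
    """Split one index line of the assumed form 'offset:page_id:title'."""
    # Titles can contain ':' themselves, so split at most twice.
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def find_offset(index_path, wanted_title):
    """Linearly scan the extracted index for an exact title match."""
    with open(index_path, encoding="utf-8") as index_file:
        for line in index_file:
            offset, _page_id, title = parse_index_line(line)
            if title == wanted_title:
                return offset
    return None  # title not present in the index


def read_stream(dump_path, offset, chunk_size=64 * 1024):
    """Decompress only the single bz2 stream that starts at `offset`."""
    decompressor = bz2.BZ2Decompressor()
    parts = []
    with open(dump_path, "rb") as dump_file:
        dump_file.seek(offset)
        # Stop as soon as this stream ends; later streams are left untouched.
        while not decompressor.eof:
            chunk = dump_file.read(chunk_size)
            if not chunk:
                break
            parts.append(decompressor.decompress(chunk))
    return b"".join(parts).decode("utf-8")
```

With the two files from the download step, find_offset would be given the extracted index .txt file and read_stream the .xml.bz2 dump. The returned text is an XML fragment holding a group of pages, which would still need to be parsed (for example with BeautifulSoup) to isolate the wanted article.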
See testing.ipynb