This script parses sci-fi books from tululu.org. For each book it extracts the title, author, genres, cover image, comments, and text, when they are present.
Python 3 and Git must already be installed.
- Clone the repository with the command:
git clone https://github.com/balancy/parse_library
- Go into the cloned repository and create a virtual environment with the command:
python -m venv env
- Activate the virtual environment. On Linux-based systems:
source env/bin/activate
On Windows:
env\scripts\activate
- Install dependencies:
pip install -r requirements.txt
The general way to run the script is with the command:
python main.py
Complete list of script arguments:
--start_page start
where start is the page to start downloading books from. The default value is 1.
--end_page end
where end is the page to finish downloading books at. The default value is the last page in the category.
--books_folder folder
where folder is the folder in which to save the text versions of books. By default, the folder is 'books/'.
--imgs_folder folder
where folder is the folder in which to save the cover images of books. By default, the folder is 'images/'.
--json_path folder
where folder is the folder in which to save all downloaded library info in JSON format. By default, the folder is the root folder.
--skip_images
If this flag is given, the script skips downloading book cover images.
--skip_txt
If this flag is given, the script skips downloading the text versions of books.
You can always see the help on how to use the script with the command:
python main.py -h
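For readers who want to see how the arguments listed above fit together, here is a minimal sketch of how such a command line could be declared with Python's standard argparse module. It is an illustration only, not the actual contents of main.py: the function name build_parser, the help strings, the use of None to stand for "the last page in the category", and "." for the root folder are all assumptions made for this example.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the documented CLI (not the real main.py)."""
    parser = argparse.ArgumentParser(
        description="Download sci-fi books from tululu.org."
    )
    parser.add_argument("--start_page", type=int, default=1,
                        help="page to start downloading books from")
    # Assumption: None stands in for "the last page in the category",
    # which a real script would have to look up on the site.
    parser.add_argument("--end_page", type=int, default=None,
                        help="page to finish downloading books at")
    parser.add_argument("--books_folder", default="books/",
                        help="folder for the text versions of books")
    parser.add_argument("--imgs_folder", default="images/",
                        help="folder for the cover images of books")
    # Assumption: "." represents the root folder mentioned above.
    parser.add_argument("--json_path", default=".",
                        help="folder for the library info in JSON format")
    # store_true flags are False unless given on the command line.
    parser.add_argument("--skip_images", action="store_true",
                        help="skip downloading cover images")
    parser.add_argument("--skip_txt", action="store_true",
                        help="skip downloading text versions")
    return parser


# Equivalent to: python main.py --start_page 2 --end_page 5 --skip_images
example_args = build_parser().parse_args(
    ["--start_page", "2", "--end_page", "5", "--skip_images"]
)
print(example_args.start_page, example_args.end_page, example_args.skip_images)
# prints: 2 5 True
```

With argparse, the -h option is generated automatically from these declarations, so a parser written this way would not need separate code for the help message.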