novels

This repo scrapes fiction from certain websites and compiles it into EPUB files, using Selenium for scraping and Calibre for EPUB packaging.

Requirements

Selenium

sudo -H pip install selenium

Gecko Driver (Firefox)

wget https://github.com/mozilla/geckodriver/releases/download/v0.19.0/geckodriver-v0.19.0-linux64.tar.gz
tar -xzvf geckodriver-v0.19.0-linux64.tar.gz
rm -rf geckodriver-v0.19.0-linux64.tar.gz
sudo ln -sf "$(pwd)/geckodriver" /usr/bin/geckodriver

PhantomJS Driver

sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev
sudo apt-get install libfreetype6 libfreetype6-dev
sudo apt-get install libfontconfig1 libfontconfig1-dev
export PHANTOM_JS="phantomjs-2.5.0-beta-linux-ubuntu-xenial-x86_64"
wget https://bitbucket.org/ariya/phantomjs/downloads/$PHANTOM_JS.tar.gz
sudo tar xvzf $PHANTOM_JS.tar.gz
rm -rf $PHANTOM_JS.tar.gz
sudo mv $PHANTOM_JS /usr/local/share
sudo ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin
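
After the driver steps above, it can help to confirm that both executables are actually reachable before running a scrape. The following is a minimal sketch, not part of this repo: the helper name `missing_tools` is hypothetical, and it only assumes the executables are named `geckodriver` and `phantomjs` as in the commands above.

```python
import shutil

# Executable names taken from the install steps above.
DRIVERS = ("geckodriver", "phantomjs")


def missing_tools(names=DRIVERS):
    """Return the names that cannot be found on the current PATH.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All drivers found on PATH.")
```

An empty result means both symlinks resolve to an executable file; a non-empty one lists what still needs to be linked.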

Calibre

sudo -v && wget -nv -O- https://download.calibre-ebook.com/linux-installer.py | sudo python -c "import sys; main=lambda:sys.stderr.write('Download failed\n'); exec(sys.stdin.read()); main()"
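
Installing Calibre also provides its `ebook-convert` command-line tool, which can turn an HTML file into an EPUB. How this repo calls Calibre is not documented here, so the following is only a sketch under assumptions: the function name `build_convert_command` and the file names are hypothetical, while `--title` and `--authors` are standard `ebook-convert` metadata options.

```python
import shlex


def build_convert_command(html_path, epub_path, title, author=None):
    """Build an ebook-convert argument list (hypothetical helper).

    The output format is inferred by Calibre from the .epub extension.
    """
    cmd = ["ebook-convert", html_path, epub_path, "--title", title]
    if author:
        cmd += ["--authors", author]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = build_convert_command("fictionName.html", "fictionName.epub", "fictionName")
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps titles with spaces or quotes intact when it is later passed to `subprocess.run`.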

Requirements for Spark

Java

sudo apt-get update
sudo apt-get install default-jdk

Spark

sudo -H pip install pyspark

How to run

Regular Version

./driver.py urlOfFirstChapter fictionName

Spark Version

spark-submit spark_driver.py urlOfLatestChapter fictionName

Spark vs Regular

On a 4-core computer, the Spark version took 40% less time than the regular version in a 50-chapter test and 60% less time over 1300 chapters. Note that the number of partitions should be increased or decreased to suit the number of cores available.
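
One way to pick a partition count from the core count is sketched below. This is an illustration rather than the logic used in `spark_driver.py`: the helper name `suggest_partitions` and the default of two partitions per core are assumptions (the Spark tuning guide suggests 2-3 tasks per CPU core as a starting point), so treat the result as a first guess to measure against.

```python
import os


def suggest_partitions(num_chapters, cores=None, per_core=2):
    """Suggest a partition count for a scrape (hypothetical helper).

    Uses per_core partitions for every core, but never more partitions
    than there are chapters and never fewer than one.
    """
    if cores is None:
        # os.cpu_count() may return None on some platforms.
        cores = os.cpu_count() or 1
    return max(1, min(num_chapters, cores * per_core))


if __name__ == "__main__":
    for chapters in (50, 1300):
        print(chapters, "chapters on 4 cores ->", suggest_partitions(chapters, cores=4), "partitions")
    # With PySpark, the value could be passed as numSlices, for example:
    #   sc.parallelize(chapter_urls, numSlices=suggest_partitions(len(chapter_urls)))
    # where chapter_urls is a hypothetical list of chapter URLs.
```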

rhett-g/novels
