- Download and install Python 3.8 or newer.
- Clone this repository and change directory to the repository folder:

  ```bash
  git clone git@github.com:DataKind-UK/FiresoulsDataCorps.git
  cd FiresoulsDataCorps
  ```
- If the proxy service will be used, duplicate the `sample.env` file and update it with the correct key value, which can be obtained from ScraperAPI. Rename the duplicated file to `.env`.
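  For illustration only, the resulting `.env` might look like the sketch below; the real variable names are the ones defined in `sample.env`, and `SCRAPER_API_KEY` is a hypothetical name, not necessarily the one the project expects.

  ```
  # Hypothetical .env layout -- copy the actual variable names from sample.env.
  SCRAPER_API_KEY=your-scraperapi-key
  ```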
- Add credentials to connect to the database. The required credentials are the same as those shown in the `sample.env` file.
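  As a rough sketch of how those credentials might be used, `pyodbc` (the database connector listed below) can build a connection string from environment variables; the variable names and ODBC driver here are assumptions, so check `sample.env` for the real ones. Note that `pipenv shell` and `pipenv run` load a `.env` file in the project root automatically, making the values available via `os.environ`.

  ```python
  import os

  import pyodbc

  # Hypothetical variable names and driver -- check sample.env for the real ones.
  conn_str = (
      "DRIVER={ODBC Driver 17 for SQL Server};"
      f"SERVER={os.environ['DB_SERVER']};"
      f"DATABASE={os.environ['DB_NAME']};"
      f"UID={os.environ['DB_USER']};"
      f"PWD={os.environ['DB_PASSWORD']};"
  )
  conn = pyodbc.connect(conn_str)
  ```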
- Install `pipenv`.
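  One common way to install it is with pip, for example:

  ```bash
  pip install --user pipenv
  ```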
- Install the packages using `pipenv`. From the terminal, type:

  ```bash
  pipenv shell
  pipenv install
  ```
  If development packages are required, also run:

  ```bash
  pipenv install --dev
  ```

  This should install all the necessary Python packages for working with this project; a quick sanity check is shown below.
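  As an optional check (a sketch, not part of the original instructions), try importing the core packages inside the environment; `bs4` is the import name for `beautifulsoup4`:

  ```bash
  pipenv run python -c "import requests, bs4, pyodbc, pandas, typer"
  ```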
- The environment is now installed and ready for use, development, and testing.
The following commands were used to create the environment in the first place.

NOTE: Do not run these commands if step 6 was successful.
```bash
pipenv shell
pipenv install requests beautifulsoup4 scrapy pyodbc pandas selenium typer
pipenv install black pytest pytest-cov deon mypy autoflake --dev --pre
```
An ethics checklist was generated for this project using `deon`. Ideally we can use DataKindUK's ethics checklist and include it in the project; in the meantime we can use DrivenData's ethics checklist.
```bash
deon -o ETHICS.md
```
The following Python packages are installed and ready to use from the environment:
- `pipenv`: to set up the dev environment.
- `requests`: to fetch the websites' HTML.
- `beautifulsoup4`: to parse the content of the web pages.
- `scrapy`: just in case we need it.
- `selenium`: just in case we need it.
- `pyodbc`: to connect to databases.
- `pandas`: for data analysis and the `pd.read_html()` function.
- `typer`: to create the command-line interface and run the scrapers.
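As a hedged illustration of how the core packages fit together, the sketch below wires `requests`, `beautifulsoup4`, and `typer` into a minimal scraper command. The command name and logic are hypothetical, not this project's actual code:

```python
import requests
import typer
from bs4 import BeautifulSoup

app = typer.Typer()


@app.command()
def scrape(url: str) -> None:
    """Fetch a page and print its <title> tag (illustrative only)."""
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.title.string if soup.title else "<no title>"
    typer.echo(f"{url}: {title}")


if __name__ == "__main__":
    app()
```

Saved as, say, `scraper.py` (a hypothetical filename), it could be run with `pipenv run python scraper.py https://example.com`.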