Environment Setup

  1. Download and install Python 3.8 or newer.
  2. Clone this repository and change directory to the repository folder:
git clone git@github.com:DataKind-UK/FiresoulsDataCorps.git
cd FiresoulsDataCorps
  3. If the proxy service will be used, duplicate the sample.env file and update it with the correct key value, which can be obtained from ScraperAPI. Rename the duplicate to .env.

  4. Add the credentials to connect to the database. The required credentials are the same as those shown in the sample.env file. A hypothetical example follows.
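As a purely illustrative sketch (the variable names below are hypothetical; use the ones actually defined in sample.env), a completed .env might look like this. Note that pipenv loads a .env file automatically when you run pipenv shell or pipenv run:

# Hypothetical variable names; copy the real ones from sample.env
SCRAPER_API_KEY=your-scraperapi-key
DB_SERVER=your-database-host
DB_USER=your-database-user
DB_PASSWORD=your-database-password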

  5. Install pipenv.
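The original instructions do not say how to install pipenv; one common route, assuming pip is available on your PATH, is:

pip install --user pipenv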

  6. Install the packages using pipenv. From the terminal, type:

pipenv shell
pipenv install

If development packages are required, also run:

pipenv install --dev

This should install all the Python packages needed to work on this project.
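As an optional sanity check (not part of the original steps), you can inspect the dependency tree and confirm the core packages import cleanly:

pipenv graph
pipenv run python -c "import requests, bs4, pandas, pyodbc, typer"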

  7. The environment is installed and ready for use, development, and testing.

To recreate the environment from scratch

The following commands were used to create the environment in the first place.

NOTE: Do not run these commands if step 6 above was successful.

pipenv shell
pipenv install requests beautifulsoup4 scrapy pyodbc pandas selenium typer
pipenv install black pytest pytest-cov deon mypy autoflake --dev --pre

Ethics checklist

An ethics checklist was generated for this project using deon. Ideally we would use DataKindUK's ethics checklist and include it in the project. In the meantime we use DrivenData's default ethics checklist, the one deon ships with.

deon -o ETHICS.md

Python packages

The following Python packages are installed and ready to use in the environment:

  • pipenv: to set up the development environment.
  • requests: to fetch the HTML of the websites.
  • beautifulsoup4: to parse the content of the web pages.
  • scrapy (just in case we need it)
  • selenium (just in case we need it)
  • pyodbc: to connect to databases.
  • pandas: for data analysis and its pd.read_html() function.
  • typer: to create the command-line interface and run the scrapers.
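To show how the scraping packages fit together, here is a minimal, hypothetical sketch; the URL, function name, and CLI behaviour are illustrative and not part of this project:

import requests
import typer
from bs4 import BeautifulSoup

app = typer.Typer()

@app.command()
def scrape(url: str = "https://example.com") -> None:
    """Fetch a page and print its title (illustrative only)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.text, "html.parser")
    typer.echo(f"Page title: {soup.title.string if soup.title else 'n/a'}")

if __name__ == "__main__":
    app()

Running pipenv run python sketch.py --url https://example.com would fetch the page and print its title.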

For development:

  • pytest: for testing the code.
  • pytest-cov: for code coverage metrics.
  • black: for code formatting.
  • mypy: for static type checking.
  • autoflake: to remove unused imports.
  • deon: to generate the project's ethics checklist.
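For reference, these development tools are typically invoked through pipenv; the following are standard invocations, not project-specific configuration:

pipenv run black .
pipenv run pytest --cov
pipenv run mypy .
pipenv run autoflake --remove-all-unused-imports --recursive .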