Scraping coronavirus data from worldometers.info

bxff/covid-19

covid-19

Scraping covid-19 data from worldometers.info

Info

Goal

This repo scrapes covid-19 data from worldometers and is public for everyone to view and build upon.

Data Viewing

This code stores data from worldometers.info in the data directory of this repo. Each snapshot is saved in both JSON and CSV format, with file names following the pattern data-D-M-Y.FORMAT, where D is the day, M the month, Y the year, and FORMAT the file format (json or csv). To view the data on GitHub itself, I recommend the CSV files, since the website renders them as a clean table.
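As an illustration of the naming pattern, here is a minimal sketch of how such a file name could be built. The helper name data_filename is hypothetical and not part of this repo, and the zero-padding of day and month is an assumption based on the 29-01-2020 date example used elsewhere in this README.

```python
from datetime import date


def data_filename(day: date, fmt: str) -> str:
    """Build a name following the data-D-M-Y.FORMAT pattern (hypothetical helper)."""
    # Assumption: day and month are zero-padded, as in the 29-01-2020 example.
    return f"data-{day:%d-%m-%Y}.{fmt}"


print(data_filename(date(2020, 1, 29), "csv"))   # data-29-01-2020.csv
print(data_filename(date(2020, 1, 29), "json"))  # data-29-01-2020.json
```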

What does this code do

It does two things:

  • It scrapes the important data (the table) from worldometers every day, 10 minutes before midnight, using GitHub Actions.
  • It can scrape the last Wayback Machine snapshot of every day for which a snapshot exists and store the data.
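To make the second point more concrete, the sketch below shows one way to address a day's snapshot on the Wayback Machine. It is not taken from Scrap_wayback.py: the function name wayback_url and the exact target page are assumptions. Wayback Machine URLs embed a YYYYMMDDhhmmss timestamp, and the service serves the capture closest to it, so a timestamp at the very end of the day tends to land on that day's last snapshot.

```python
from datetime import date

# Assumption: the archived page; the README only names worldometers.info.
TARGET = "https://www.worldometers.info/coronavirus/"


def wayback_url(day: date, last: bool = True, target: str = TARGET) -> str:
    """Return a Wayback Machine URL aimed at the last or first capture of a day."""
    # End of day for the last snapshot, start of day for the first one.
    clock = "235959" if last else "000000"
    return f"https://web.archive.org/web/{day:%Y%m%d}{clock}/{target}"


print(wayback_url(date(2020, 1, 29)))
```

Because the service picks the closest capture, the page it returns may belong to a neighbouring day when no snapshot exists for the requested one, so the date of the result would still need to be checked.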

Installation

Use the package manager pip to install the requirements.

python3 -m pip install -r requirements.txt

Usage

Run yesterday's scrape locally:

python3 Scrap.py

Scrape the data for all dates from worldometers via the Wayback Machine:

Note: Running Scrap_wayback.py creates a scrap_error.txt file if one does not already exist and appends the affected raw_url to it. The raw_url can then be used in case of a 1040 Database Error from the Wayback Machine.

python3 Scrap_wayback.py
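The error file described in the note can be sketched as follows. This is an illustration of the append-or-create behaviour only, not the code in Scrap_wayback.py; the function names log_failed_url and read_failed_urls are hypothetical, and one URL per line is an assumed layout.

```python
from pathlib import Path


def log_failed_url(raw_url: str, log_path: str = "scrap_error.txt") -> None:
    """Append a raw_url to the error file, creating the file if needed."""
    # Mode "a" creates the file when it is missing and appends otherwise.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(raw_url + "\n")


def read_failed_urls(log_path: str = "scrap_error.txt") -> list:
    """Return the logged raw_urls, or an empty list if nothing was logged."""
    path = Path(log_path)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Each URL read back this way could then be passed to the --raw-url option shown further down.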

Scrape data for a particular date via the Wayback Machine:

python3 Scrap_wayback.py [<--date>|<-d>] <last-snapshot|first-snapshot> <date-like-29-01-2020>

Scrape data from a particular raw_url via the Wayback Machine:

python3 Scrap_wayback.py [<--raw-url>|<-r>] <raw_url>
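For reference, the two option forms above could be parsed with the standard argparse module roughly as below. This is a sketch under assumptions, not how Scrap_wayback.py currently reads its arguments: the function name parse_args and the metavar names are hypothetical, and the date is assumed to be in day-month-year order as in 29-01-2020.

```python
import argparse
from datetime import datetime


def parse_args(argv):
    """Parse --date/-d and --raw-url/-r as described in the usage lines."""
    parser = argparse.ArgumentParser(prog="Scrap_wayback.py")
    # The two modes are alternatives, so they are made mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-d", "--date", nargs=2, metavar=("SNAPSHOT", "DATE"))
    group.add_argument("-r", "--raw-url", metavar="RAW_URL")
    args = parser.parse_args(argv)
    if args.date:
        which, text = args.date
        if which not in ("last-snapshot", "first-snapshot"):
            parser.error("SNAPSHOT must be last-snapshot or first-snapshot")
        try:
            # Assumption: dates are written day-month-year, e.g. 29-01-2020.
            day = datetime.strptime(text, "%d-%m-%Y").date()
        except ValueError:
            parser.error("DATE must look like 29-01-2020")
        args.date = (which, day)
    return args


args = parse_args(["--date", "last-snapshot", "29-01-2020"])
print(args.date)
```

With no options given, both args.date and args.raw_url are None, which would correspond to the scrape-everything mode.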

TODO

  • Work on making the code more usable
    • Make Scrap_wayback.py use multiprocessing for faster processing
    • Make a cleaner Python application to clean the data from statistics/curve/get_data.py
    • Make a better way of parsing argv arguments
    • Make colored logs...
  • Make a PyPI library from it
    • Write proper documentation on how to use the library
  • Work on a better README
  • List it in other repos