This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

pmbaumgartner/nyc-test-tracker


NYC COVID-19 Testing Wait Times PDF Scraper

This started as a request to create a scraper for NYC testing wait times.

Challenge accepted!

This repo automatically runs the scraper every 15 minutes via a cron job set up through GitHub Actions. Thanks to Jason Etcovitch for his wonderful blog post, from which I pulled most of the automation setup for the workflow.

The CSV filenames are structured as {two-hour time window}-{scrape timestamp}.csv.
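As a rough illustration of that naming scheme, here is a minimal sketch in Python. The function name `build_csv_name`, the way the time window is normalized, and the timestamp format are all assumptions made for this example; the filenames the scraper actually produces may be formatted differently.

```python
from datetime import datetime, timezone


def build_csv_name(time_window: str, scraped_at: datetime) -> str:
    """Return '{time window}-{scrape timestamp}.csv' for one scrape.

    Hypothetical helper: the exact window and timestamp formats used by
    the real scraper are not documented here.
    """
    # Assumption: drop spaces and colons so the name is filesystem-safe.
    window = time_window.strip().replace(" ", "").replace(":", "")
    # Assumption: a compact UTC timestamp; the real format may differ.
    stamp = scraped_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{window}-{stamp}.csv"


if __name__ == "__main__":
    when = datetime(2020, 11, 30, 21, 7, tzinfo=timezone.utc)
    print(build_csv_name("8 AM - 10 AM", when))
```

Putting the time window first keeps scrapes of the same window adjacent when the files are sorted by name.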

Changelog

  • 2020-11-30 21:07: Data now includes the time window as a column as well as the scrape time. The latest data is also stored in latest.csv. Corrected some data cleaning issues with characters being parsed incorrectly and newlines being included in the wait times column.
  • 2020-11-30 21:14: Data moved to the data folder, copy of latest.csv included in root dir.
  • 2020-12-01 08:28: Added md5 hash of PDF content to check if PDF is a new file. If not a new file, don't parse or add a new csv.
  • 2020-12-01 11:20: Made scraper more robust to changes in PDF structure.
  • 2020-12-05 03:00: Updated to account for the new PDF structure, which now includes update times and has an increased resolution. Added playwright to take a screenshot of the scraped PDF and include it in docs for debugging.
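
The md5 check from the 2020-12-01 08:28 entry can be sketched roughly as follows. This is only an illustration of the idea (hash the PDF bytes, compare against the last stored hash, skip parsing when they match); the helper names `pdf_digest` and `is_new_pdf` and the choice to keep the previous hash in a small text file are assumptions, not necessarily how this repo does it.

```python
import hashlib
from pathlib import Path


def pdf_digest(pdf_bytes: bytes) -> str:
    """Return the md5 hex digest of the raw PDF content."""
    return hashlib.md5(pdf_bytes).hexdigest()


def is_new_pdf(pdf_bytes: bytes, hash_file: Path) -> bool:
    """Return True if the PDF differs from the last one seen.

    Assumption: the previous digest is stored as text in `hash_file`
    (a hypothetical location). The file is updated only when the
    content has changed.
    """
    digest = pdf_digest(pdf_bytes)
    previous = hash_file.read_text().strip() if hash_file.exists() else None
    if digest == previous:
        # Same PDF as last run: the caller should skip parsing and
        # should not write a new CSV.
        return False
    hash_file.write_text(digest)
    return True
```

A scheduled run would call `is_new_pdf(downloaded_bytes, Path("last_hash.txt"))` and parse only when it returns `True`. md5 is adequate here because the hash is used for change detection, not for security.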
