Justwatch

This project scrapes information about all the content providers and content titles available in the US from the website www.justwatch.com/us. It was completed in two parts:

  • First, all the information was extracted from the JSON data available on the home page
  • Next, each individual title page was opened via its URL to get more detailed information (more work pending)

Caveat: The data had to be pulled in smaller chunks, because the JSON data returned for each provider-genre pair is capped at 1,500.
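The chunking described in the caveat can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `scrape_in_chunks`, the injected `fetch_pair` callable, and the `"id"` record key are all hypothetical names, and no real JustWatch endpoint is assumed. The sketch requests one provider-genre pair at a time, merges the results by title ID (a title may appear under several genres), and records any pair that reaches the 1,500 cap, since such a pair may have been truncated.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

CAP = 1500  # per provider-genre limit mentioned in the caveat


def scrape_in_chunks(
    fetch_pair: Callable[[str, str], List[dict]],  # hypothetical fetcher
    providers: Iterable[str],
    genres: Iterable[str],
    cap: int = CAP,
) -> Tuple[Dict[str, dict], List[Tuple[str, str]]]:
    """Fetch titles one provider-genre pair at a time and merge them.

    Returns the merged titles keyed by ID, plus the pairs whose result
    size reached the cap and may therefore be incomplete.
    """
    titles: Dict[str, dict] = {}
    capped_pairs: List[Tuple[str, str]] = []
    for provider, genre in product(providers, genres):
        records = fetch_pair(provider, genre)
        if len(records) >= cap:
            capped_pairs.append((provider, genre))
        for record in records:
            # Assumes each record carries a unique "id" field (hypothetical).
            titles.setdefault(record["id"], record)
    return titles, capped_pairs


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without network access.
    def fake_fetch(provider: str, genre: str) -> List[dict]:
        return [{"id": f"{provider}-{n}", "genre": genre} for n in range(3)]

    merged, capped = scrape_in_chunks(fake_fetch, ["nfx", "hlu"], ["act", "cmy"])
    print(len(merged), capped)
```

Passing the fetcher in as an argument keeps the chunking logic independent of how the site is queried, so it can be exercised with a stand-in like `fake_fetch`.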

Issue: There is one known issue so far: the scraper does not pull in all the titles for each provider (as of Sep 2018, it was pulling in data for ~64k titles, as opposed to the ~69k mentioned on the website).