The idea behind this project is to scrape the top 1000 Shopify stores (by traffic) and build an interactive Jupyter notebook that lets users explore the data. I'll be scraping data points such as traffic estimates, social media followers, SEO data, products (price, SKUs, categories, etc.) and technology data. I'm using tools like:
- Selenium, asyncio and BS4 to scrape the data
- Pandas and SQLite to transform and store the data
- Pandas, Jupyter, ipywidgets, plotly and bqplot to build the user-facing dashboard
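
To give a rough feel for the scrape-and-store steps listed above, here is a minimal sketch and not the project's actual code. It uses only the standard library (`asyncio` and `sqlite3`), and the network fetch is stubbed out with placeholder data so it runs offline. The names `FAKE_PAGES`, `scrape_store`, `scrape_all`, `save_rows` and the `stores` table layout are all made up for illustration.

```python
import asyncio
import sqlite3

# Hypothetical placeholder data standing in for scraped pages.
# A real scraper would fetch and parse live store pages instead.
FAKE_PAGES = {
    "store-a.example": {"followers": 1200, "products": 35},
    "store-b.example": {"followers": 560, "products": 12},
    "store-c.example": {"followers": 9800, "products": 240},
}


async def scrape_store(domain, limit):
    """Return one (domain, followers, products) row for a store."""
    async with limit:  # cap how many stores are scraped at once
        await asyncio.sleep(0)  # stands in for the real network request
        data = FAKE_PAGES[domain]
        return (domain, data["followers"], data["products"])


async def scrape_all(domains, max_concurrent=2):
    """Scrape all domains concurrently, at most max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(scrape_store(d, limit) for d in domains))


def save_rows(conn, rows):
    """Write the scraped rows into a (hypothetical) stores table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stores ("
        "domain TEXT PRIMARY KEY, followers INTEGER, products INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO stores VALUES (?, ?, ?)", rows)
    conn.commit()


def main():
    rows = asyncio.run(scrape_all(list(FAKE_PAGES)))
    conn = sqlite3.connect(":memory:")  # a file path would persist the data
    save_rows(conn, rows)
    return conn
```

Calling `main()` returns a connection whose `stores` table holds one row per store. From there, `pandas.read_sql_query("SELECT * FROM stores", conn)` would load the table into a DataFrame for the notebook.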
This is a very ugly work in progress at the moment!