The Department of Planning, Transport and Infrastructure (DPTI), South Australia, collects data on road crashes for analysis, with the aim of improving road safety. The dataset grows over time, and the rising number of vehicles on the road adds to its volume. Extended across multiple states, it becomes a rather large dataset. This project uses Spark to run a range of operations on the dataset and answer different queries about it.
This was an individual assignment, for which I received the highest grade.
- Clone this repository onto your system
git clone https://github.com/akale1994/Big-Data-Exploratory-Data-Analysis.git
- Make sure your Apache Spark cluster is running
- Open and run the notebook
Root directory > Assignment 1.ipynb
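
If you are unsure whether Spark is up before running the notebook, a quick reachability check can help. The sketch below is not part of this repository. It assumes a standalone Spark master on the local machine using the default ports (7077 for the master, 8080 for its web UI), and the helper name `is_port_open` is made up for illustration. If the notebook creates a local-mode Spark session instead, nothing will be listening on these ports and the check does not apply.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unknown host)
        # when nothing is reachable at the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: standalone Spark master on localhost with default ports.
    # Change the host and ports to match your own setup.
    for name, port in [("master", 7077), ("master web UI", 8080)]:
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"Spark {name} on localhost:{port} is {state}")
```

A reachable port only shows that something is listening there, so it is worth opening the master web UI in a browser as well to confirm that it is Spark.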