This project is the culmination of several projects and assignments completed during my course 'Data Analysis with Python',
offered by the University of Pennsylvania. It centres on analyzing movie data taken from IMDB.
The project applies a range of data analysis methods and techniques, drawing on a variety of Python libraries, across
the series of assignments required by the course. These include techniques for inspecting, querying, and analyzing Excel
files (parts 1 & 2), grouping and statistically summarizing data (part 3), and visualizing data (part 4).
Included here are an Excel file, 'imdb.xlsx', with the data to be analyzed, and two versions of the project. The first is a Python script
with all the code and commentary. The second is a Jupyter Notebook version, which breaks the code from the Python script into separate sections
and cells. Each cell contains a single block of code that performs a specific analytic task, together with its output, rendered and ready for viewing.
The data analyzed here are movie data scraped from imdb.com. The dataset consists of a list of movies and a set of attributes
for each movie, such as the release year, the director, and the gross earnings, as collected and arranged by IMDB.
As mentioned, all the data are contained in the file 'imdb.xlsx', which consists of 3 sheets (with the following columns):
- imdb: 'movie_title', 'director_id', 'country_id', 'content_rating', 'title_year', 'imdb_score', 'gross', 'duration'
- countries: 'id', 'country'
- directors: 'id', 'director_name'
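
As a rough illustration of how the three sheets fit together, the sketch below reads the workbook with pandas and joins the lookup sheets onto the movie rows. It is a minimal sketch, not the code from the project itself: the function names `join_sheets` and `load_movies` are hypothetical, and it assumes (from the column names only) that 'director_id' refers to 'id' in the directors sheet and 'country_id' refers to 'id' in the countries sheet.

```python
import pandas as pd


def join_sheets(imdb, countries, directors):
    """Attach country and director names to the movie rows.

    Assumption: imdb['country_id'] matches countries['id'] and
    imdb['director_id'] matches directors['id'].
    """
    merged = imdb.merge(
        countries.rename(columns={"id": "country_id"}),
        on="country_id",
        how="left",  # keep every movie, even if its country is missing
    )
    merged = merged.merge(
        directors.rename(columns={"id": "director_id"}),
        on="director_id",
        how="left",  # keep every movie, even if its director is missing
    )
    return merged


def load_movies(path="imdb.xlsx"):
    """Read all sheets of the workbook and return one joined DataFrame."""
    # sheet_name=None makes read_excel return a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(path, sheet_name=None)
    return join_sheets(sheets["imdb"], sheets["countries"], sheets["directors"])
```

Calling `load_movies()` from the folder that contains 'imdb.xlsx' should return one table with the 'country' and 'director_name' columns next to the movie attributes. Reading .xlsx files with pandas typically requires the openpyxl package.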
The aim of this project is to demonstrate my ability to inspect and analyze data, filtering large amounts of data to extract
specific pieces of information or to reveal general trends in the dataset. To that end, it showcases a variety of data analysis approaches,
techniques, and tricks.
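
To give a flavour of the filtering and summarizing described above, here is a small pandas sketch. It is illustrative only and not taken from the project: the function names and the thresholds (a minimum score of 8.0, movies from 2000 onwards) are hypothetical, and it assumes a DataFrame with the columns of the 'imdb' sheet.

```python
import pandas as pd


def filter_movies(movies, min_score=8.0, since_year=2000):
    """Keep movies scoring at least min_score, released in or after since_year.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    mask = (movies["imdb_score"] >= min_score) & (movies["title_year"] >= since_year)
    return movies.loc[mask]


def summarize_by_rating(movies):
    """Per content rating: movie count, mean IMDB score, and median gross."""
    return (
        movies.groupby("content_rating")
        .agg(
            n_movies=("movie_title", "count"),
            mean_score=("imdb_score", "mean"),
            median_gross=("gross", "median"),
        )
        .sort_values("mean_score", ascending=False)  # best-rated group first
    )
```

The first function extracts a specific slice of the data, while the second condenses the whole table into one row per content rating, which is one way to look for general trends.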
You can quickly access the project from either of the links below. Both links open the Jupyter Notebook version of the project.
The first link, however, only lets you view the code and its output, not interact with it. The second link lets you both
view the notebook and interact with it: it makes the code executable, so you can run it and reproduce the results yourself.
To view the project only, click on the link below:
https://nbviewer.org/github/Mo-Khalifa96/Movie-Data-Analysis/blob/7419c4b020ae59648a3ea5704ee73751b77572c9/Movie%20Data%20Analysis%20%28Jupyter%20version%29.ipynb
To view the project and interact with its code, click on the link below:
https://mybinder.org/v2/gh/Mo-Khalifa96/Movie-Data-Analysis/main?labpath=Movie%20Data%20Analysis%20(Jupyter%20version).ipynb