Movie Data Analysis

About The Project

This project is the culmination of several projects and assignments I completed while taking the course 'Data Analysis with Python', offered by the University of Pennsylvania. It centres on analyzing movie data taken from IMDB.

The project applies a range of data analysis methods and techniques, using a variety of Python libraries, across the series of assignments required by the course. These include techniques for inspecting, querying, and analyzing Excel files (parts 1 & 2), grouping and statistically summarizing data (part 3), and visualizing data (part 4).

I have included an Excel file, 'imdb.xlsx', containing the data to be analyzed, along with two versions of the project. The first is a Python script with all the code and commentary. The second is a Jupyter Notebook that breaks the script's code into separate sections and cells; each cell contains a single block of code that performs a specific analytic task, together with its rendered output.

Database

The data analyzed here are movie data scraped from imdb.com. The dataset comprises a list of movies and a set of attributes for each movie, such as the release date, the director, and the gross profit earned, as collected and arranged by IMDB.
As mentioned, all the data are contained in the file 'imdb.xlsx', which consists of 3 sheets (with the following columns):

  • imdb: 'movie_title', 'director_id', 'country_id', 'content_rating', 'title_year', 'imdb_score', 'gross', 'duration'
  • countries: 'id', 'country'
  • directors: 'id', 'director_name'
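
To illustrate how the three sheets relate, here is a minimal sketch that joins them on their id columns and then summarizes the result per director. It assumes pandas is installed. The helper name `join_sheets` and the rows in the sample frames are placeholders made up for illustration; they are not taken from the actual dataset or from the project's own code.

```python
import pandas as pd


def join_sheets(imdb, countries, directors):
    """Attach country and director names to the movie rows via the id columns."""
    merged = imdb.merge(
        countries.rename(columns={"id": "country_id"}), on="country_id", how="left"
    ).merge(
        directors.rename(columns={"id": "director_id"}), on="director_id", how="left"
    )
    # The numeric ids are no longer needed once the names are attached.
    return merged.drop(columns=["country_id", "director_id"])


# Placeholder rows (NOT real IMDB data) that mirror the three sheets' columns.
# With the real file, the sheets could instead be loaded as a dict of frames
# using pd.read_excel("imdb.xlsx", sheet_name=None).
imdb = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie C"],
    "director_id": [1, 2, 1],
    "country_id": [10, 10, 20],
    "content_rating": ["PG", "R", "PG-13"],
    "title_year": [2001, 2005, 2010],
    "imdb_score": [7.0, 8.0, 6.0],
    "gross": [100.0, 250.0, 50.0],
    "duration": [90, 120, 105],
})
countries = pd.DataFrame({"id": [10, 20], "country": ["Country X", "Country Y"]})
directors = pd.DataFrame({"id": [1, 2], "director_name": ["Director P", "Director Q"]})

movies = join_sheets(imdb, countries, directors)

# Example summary: mean score and total gross per director.
summary = movies.groupby("director_name").agg(
    mean_score=("imdb_score", "mean"),
    total_gross=("gross", "sum"),
)
print(summary)
```

A left join is used so that a movie with a missing or unmatched id would still appear in the result, with an empty name, rather than being dropped silently.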

Aim

The aim of this project is to demonstrate my capacity for inspecting and analyzing data, and for filtering large chunks of data to extract specific pieces of information or elucidate general trends in the dataset. As such, it showcases a variety of data analysis approaches, techniques, and tricks.

Quick Access

You can quickly access the project from either of the links below. Both lead to the Jupyter Notebook version of the project. The first link lets you view the code and its output but not interact with it. The second lets you both view and interact with the notebook: the code is executable, so you can run it and reproduce the results yourself.

To view the project only, click on the link below:
https://nbviewer.org/github/Mo-Khalifa96/Movie-Data-Analysis/blob/7419c4b020ae59648a3ea5704ee73751b77572c9/Movie%20Data%20Analysis%20%28Jupyter%20version%29.ipynb

To view the project and interact with its code, click on the link below:
https://mybinder.org/v2/gh/Mo-Khalifa96/Movie-Data-Analysis/main?labpath=Movie%20Data%20Analysis%20(Jupyter%20version).ipynb