Phase-1-Project-Movies-Data-Analysis

Overview

In this project, we use exploratory data analysis (EDA) on five datasets and use the findings to make recommendations to Microsoft, a company entering the film-making industry.

Note

As mentioned in the project file, I had to copy the zippedData folder into another folder outside the repository, extract the database there, and then use that folder as my working directory. This was necessary because the extracted database exceeds GitHub's size limit, so it could not be pushed.
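The extraction step described above can be sketched with Python's standard `zipfile` module. This is a minimal sketch, not the exact procedure used in the notebook: the helper name `extract_database` and the paths in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_database(archive_path, work_dir):
    """Extract every member of a zip archive into work_dir.

    Returns the paths of the extracted members. Both arguments are
    illustrative; point them at your own archive and working folder.
    """
    work_dir = Path(work_dir)
    # Create the working folder (outside the repository) if it is missing.
    work_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(work_dir)
        names = archive.namelist()
    return [work_dir / name for name in names]
```

For example, `extract_database("zippedData/im.db.zip", "../movies_workdir")` would unpack the archive into a folder next to the repository, assuming the archive has that name and location.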

Business Problem

Microsoft has decided to create a new movie studio, but it is new to the film-making business. We explore which types of films perform best at the box office, then use those findings to give the studio insights and help it decide what types of films to create. The data questions I plan to answer to solve this problem include:

1. What movies have been the highest-grossing each year?
2. What are the trends in box office revenue over time?
3. How are different movie genres performing?
4. How do gender preferences vary across demographic groups?
5. How do film ratings correlate with box office performance?
6. Who are the major competitors in the film-making industry, and what types of films do they produce?
7. What is happening with the highly popular streaming platforms?

In this project, we analyze five datasets, which contain information on:

1. Movies
2. Movie genres
3. Movie ratings
4. Target audiences
5. Box office performance

The datasets are:

1. Box Office Mojo
2. IMDB
3. Rotten Tomatoes
4. The MovieDB
5. The Numbers

Methodology

The following Python libraries were used to analyze the data:

1. Pandas: A Python library used for data manipulation.
2. NumPy: A Python library used for numerical operations.
3. Matplotlib: A plotting library for Python.
4. Seaborn: A data visualization library built on Matplotlib.
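As a rough illustration of how Pandas and NumPy fit into the cleaning work (the commit history lists duplicate removal and empty-cell cleaning as steps), here is a minimal sketch. The helper `clean_frame`, the column names, and the sample rows are all made up for illustration and are not taken from the project's datasets.

```python
import numpy as np
import pandas as pd


def clean_frame(df, required_columns):
    """Drop exact duplicate rows, then rows missing any required column."""
    deduped = df.drop_duplicates()
    cleaned = deduped.dropna(subset=required_columns)
    return cleaned.reset_index(drop=True)


# Made-up rows for illustration only; not the project's data.
sample = pd.DataFrame(
    {
        "title": ["A", "A", "B", "C"],
        "domestic_gross": [100.0, 100.0, np.nan, 250.0],
    }
)

# The duplicate "A" row and the "B" row with a missing gross are removed.
cleaned = clean_frame(sample, ["domestic_gross"])
print(cleaned)
```

Dropping rows is only one way to handle empty cells; filling them with a median or a placeholder may be more appropriate depending on the column.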

Visualizations

Some of the visualizations used in this project include:

1. Heatmaps: Show the pairwise correlations between numerical variables.
2. Histograms: Show the frequency distribution of a numerical variable.
3. Bar charts: Compare a value, such as a count, across two or more categories.

Recommendations

Based on the results of the data exploration and the visualizations I created, I would make the following recommendations to Microsoft:

1. Invest heavily in the production budget. The heatmaps show that there is a positive correlation between production budget and domestic gross.
2. Focus on making documentaries and dramas. The histogram shows that these two genres are the most popular.
3. Hire 'Omar Pasha' as your writer and director. He has contributed to the highest number of films in this dataset.
4. Avoid hiring 'Raphaelle Ayach' as your writer and director, who has contributed to the lowest number of films in this dataset.
5. Make movies that are between 90 and 110 minutes long. The average runtime is 100 minutes.
6. Do not make musicals and fantasies. They are the least popular genres.
7. Target movie ratings of 6.5 or higher. The average movie rating is 6.5.
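The kind of summary statistics behind these recommendations (a budget/gross correlation and an average runtime) can be computed with Pandas as sketched below. The DataFrame `films`, its column names, and its values are invented for illustration; they are not the project's results.

```python
import pandas as pd

# Made-up values for illustration only; not the project's data.
films = pd.DataFrame(
    {
        "production_budget": [10.0, 20.0, 30.0, 40.0],
        "domestic_gross": [15.0, 22.0, 41.0, 50.0],
        "runtime_minutes": [90, 100, 110, 100],
    }
)

# Pearson correlation between production budget and domestic gross.
budget_gross_corr = films["production_budget"].corr(films["domestic_gross"])

# Average runtime, and which films fall inside the 90 to 110 minute band
# (between() includes both endpoints by default).
mean_runtime = films["runtime_minutes"].mean()
in_band = films["runtime_minutes"].between(90, 110)

print(f"budget/gross correlation: {budget_gross_corr:.2f}")
print(f"mean runtime: {mean_runtime:.0f} minutes")
print(f"films in the 90-110 minute band: {in_band.sum()}")
```

A positive coefficient means the two columns tend to rise together; on its own it does not show that a larger budget causes a higher gross.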

Commit History

Author: Frank [email protected]
Date: Fri Mar 22 13:25:26 2024 +0300

Sub Commit

commit f0ef20095eb634a147558e84c04703db6a3b13c7
Author: Frank [email protected]
Date: Thu Mar 21 17:05:34 2024 +0300

Add Presentation Slides

commit 6f73be4d58d30cb368fd1234a617bf16d17353ba
Author: Frank [email protected]
Date: Thu Mar 21 15:57:14 2024 +0300

Final Commit

commit 6544d832d64450e8f6cb663cb020025cc8d8cef8
Author: Frank [email protected]
Date: Thu Mar 21 15:39:26 2024 +0300

Add Recommendations

commit ca766c58dc458e6f4c769e7778a490a395df2e44
Author: Frank [email protected]
Date: Thu Mar 21 14:42:08 2024 +0300

Create Bar Chart

commit cf029b1a2b6f5c5b0a7454a1c1327e9b3465dce6
Author: Frank [email protected]
Date: Thu Mar 21 13:49:38 2024 +0300

Create Histogram

commit 7682992453549ec79ad033126217e2752c9840c0
Author: Frank [email protected]
Date: Thu Mar 21 12:29:41 2024 +0300

Create Heatmaps

commit 33cb37005a853a83688c7d61b68249925916312f
Author: Frank [email protected]
Date: Wed Mar 20 23:13:34 2024 +0300

Clean Empty Cells

commit 6d003082569e812cebe91e3874b677e7a650eded
Author: Frank [email protected]
Date: Wed Mar 20 15:42:43 2024 +0300

Remove Duplicates

commit a61b7f518567ca5ac09bc6bec6f7f3a924d65dac
Author: Frank [email protected]
Date: Tue Mar 19 23:03:04 2024 +0300

Perform Data Exploration

commit ff2bb872df05fe899b12e99d24f2ffdf33515b2c
Author: Frank [email protected]
Date: Tue Mar 19 20:33:18 2024 +0300

Open csv files

commit f60a828d649461c93fe71f9f4bd5e282a6ff7c99
Author: Frank [email protected]
Date: Tue Mar 19 20:25:53 2024 +0300

Open Database

commit 71f55118987c3f887c890a012d78353638d5bb3a
Author: Frank [email protected]
Date: Tue Mar 19 20:17:42 2024 +0300

Import Libaries

commit 1d124abbd19305ad5112d8e7b78415be95a05231
Author: Frank [email protected]
Date: Tue Mar 19 20:03:16 2024 +0300

Initial Commit

commit e5b23dd2c66ce2bcc6f613cb3541b978533c8e3d
Author: FrankOyugi [email protected]
Date: Tue Mar 19 19:55:22 2024 +0300

Initial commit
