In this project, we use Exploratory Data Analysis (EDA) on five datasets and turn the findings into recommendations for Microsoft, a company trying to get into the film-making industry.
As noted in the project file, I copied the zippeddata folder into another folder outside the repository, extracted the database there, and made that folder my working directory. This was necessary because the extracted database exceeds GitHub's file size limit and could not be pushed.
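The copy-and-extract step described above can be sketched with the standard library. This is a minimal sketch, not the exact procedure used in the project: the function name `prepare_working_copy` and both path arguments are hypothetical, and it assumes the archives in the data folder are `.zip` files.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_working_copy(repo_data_dir, work_dir):
    """Copy the zipped data folder outside the repo and unzip every archive in it.

    Both arguments are hypothetical paths chosen by the caller.
    """
    repo_data_dir = Path(repo_data_dir)
    target = Path(work_dir) / repo_data_dir.name
    # Copy the whole folder so the copy inside the repository stays untouched.
    shutil.copytree(repo_data_dir, target, dirs_exist_ok=True)
    # Extract each .zip archive into the copied folder.
    for archive in sorted(target.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return target
```

Calling something like `prepare_working_copy("zippeddata", "../film-analysis-work")` (illustrative paths) would leave the extracted database outside the repository, so it is never staged for a push.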
Microsoft has decided to create a new movie studio, but it is new to the film-making business. We explore which types of films are doing best at the box office, then use those findings to help the studio decide what types of films to create. The data questions I plan to answer include:
1. What movies have been the highest-grossing each year?
2. What are the trends in box office revenue over time?
3. How are different movie genres performing?
4. How do gender preferences vary across demographic groups?
5. How do film ratings correlate with box office performance?
6. Who are the major competitors in the film-making industry, and what types of films do they produce?
7. What is happening with the highly popular streaming platforms?
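As an illustration of how questions 1 and 2 might be approached in pandas, here is a minimal sketch. The column names `title`, `year`, and `domestic_gross` are assumptions, and the rows are invented toy values rather than figures from the project datasets.

```python
import pandas as pd

# Invented toy rows, for illustration only; not values from the project data.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [2018, 2018, 2019, 2019],
    "domestic_gross": [100.0, 250.0, 300.0, 80.0],
})

# Question 1: the highest-grossing title in each year.
# idxmax returns the row label of the maximum within each year group.
top_per_year = movies.loc[movies.groupby("year")["domestic_gross"].idxmax()]

# Question 2: total box office revenue per year, to inspect the trend over time.
revenue_by_year = movies.groupby("year")["domestic_gross"].sum()
```

The same grouping pattern extends to question 3 by grouping on a genre column instead of the year.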
In this project, we analyze five datasets that cover the following information:
1. Movies
2. The movie genres
3. The movie ratings
4. The target audiences
5. Box office performance
The datasets come from the following sources:
1. Box Office Mojo
2. IMDB
3. Rotten Tomatoes
4. The MovieDB
5. The Numbers
The following libraries were used to analyze this data:
1. Pandas: A Python library used for data manipulation.
2. NumPy: A Python library used for numerical operations.
3. Matplotlib: A plotting library for Python.
4. Seaborn: A data visualization library based on Matplotlib.
Some of the visualizations used in this project include:
1. Heatmaps: Show the correlation between pairs of numerical variables.
2. Histograms: Show the frequency distribution of a numerical variable.
3. Bar charts: Compare a value across two or more categories.
Based on the data exploration and the visualizations I created, I would make the following recommendations to Microsoft:
1. Invest heavily in the production budget. The heatmaps show that there is a positive correlation between production budget and domestic gross.
2. Focus on making documentaries and dramas. The histogram shows that these two genres are the most popular ones.
3. Hire 'Omar Pasha' as your writer and director, who has contributed to the highest number of films in this dataset.
4. Avoid hiring 'Raphaelle Ayach' as your writer and director, who has contributed to the lowest number of films in this dataset.
5. Make movies that are between 90 and 110 minutes long. The average runtime is 100 minutes.
6. Do not make musicals and fantasies. They are the least popular genres.
7. Target movie ratings of 6.5 or higher. 6.5 is the average movie rating.
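The figures behind recommendations 1 and 5 can be checked with a few pandas calls. The sketch below is illustrative only: the column names are assumptions and the rows are invented toy values, so the numbers it produces are not the project's results. A positive coefficient describes an association between budget and gross, not a guaranteed return.

```python
import pandas as pd

# Invented toy rows, for illustration only; not values from the project data.
films = pd.DataFrame({
    "production_budget": [10.0, 20.0, 30.0, 40.0],
    "domestic_gross": [12.0, 18.0, 41.0, 50.0],
    "runtime_minutes": [85, 100, 115, 100],
})

# Recommendation 1: Pearson correlation between budget and domestic gross.
budget_gross_corr = films["production_budget"].corr(films["domestic_gross"])

# Recommendation 5: average runtime, and the share of films inside the
# 90 to 110 minute band (between() is inclusive at both ends by default).
mean_runtime = films["runtime_minutes"].mean()
share_in_band = films["runtime_minutes"].between(90, 110).mean()
```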
Author: Frank [email protected] Date: Fri Mar 22 13:25:26 2024 +0300
Sub Commit
commit f0ef20095eb634a147558e84c04703db6a3b13c7 Author: Frank [email protected] Date: Thu Mar 21 17:05:34 2024 +0300
Add Presentation Slides
commit 6f73be4d58d30cb368fd1234a617bf16d17353ba Author: Frank [email protected] Date: Thu Mar 21 15:57:14 2024 +0300
Final Commit
commit 6544d832d64450e8f6cb663cb020025cc8d8cef8 Author: Frank [email protected] Date: Thu Mar 21 15:39:26 2024 +0300
Add Recommendations
commit ca766c58dc458e6f4c769e7778a490a395df2e44 Author: Frank [email protected] Date: Thu Mar 21 14:42:08 2024 +0300
Create Bar Chart
commit cf029b1a2b6f5c5b0a7454a1c1327e9b3465dce6 Author: Frank [email protected] Date: Thu Mar 21 13:49:38 2024 +0300
Create Histogram
commit 7682992453549ec79ad033126217e2752c9840c0 Author: Frank [email protected] Date: Thu Mar 21 12:29:41 2024 +0300
Create Heatmaps
commit 33cb37005a853a83688c7d61b68249925916312f Author: Frank [email protected] Date: Wed Mar 20 23:13:34 2024 +0300
Clean Empty Cells
commit 6d003082569e812cebe91e3874b677e7a650eded Author: Frank [email protected] Date: Wed Mar 20 15:42:43 2024 +0300
Remove Duplicates
commit a61b7f518567ca5ac09bc6bec6f7f3a924d65dac Author: Frank [email protected] Date: Tue Mar 19 23:03:04 2024 +0300
Perform Data Exploration
commit ff2bb872df05fe899b12e99d24f2ffdf33515b2c Author: Frank [email protected] Date: Tue Mar 19 20:33:18 2024 +0300
Open csv files
commit f60a828d649461c93fe71f9f4bd5e282a6ff7c99 Author: Frank [email protected] Date: Tue Mar 19 20:25:53 2024 +0300
Open Database
commit 71f55118987c3f887c890a012d78353638d5bb3a Author: Frank [email protected] Date: Tue Mar 19 20:17:42 2024 +0300
Import Libaries
commit 1d124abbd19305ad5112d8e7b78415be95a05231 Author: Frank [email protected] Date: Tue Mar 19 20:03:16 2024 +0300
Initial Commit
commit e5b23dd2c66ce2bcc6f613cb3541b978533c8e3d Author: FrankOyugi [email protected] Date: Tue Mar 19 19:55:22 2024 +0300
Initial commit