Movies-ETL

Project Overview

The purpose of this project is to create an Extract, Transform, Load (ETL) data pipeline for multiple large movie-related datasets. First, the data is scraped from the web. Next, it is cleaned and transformed into a more user-friendly format and structure. Lastly, the datasets are loaded into a SQL database where they can be queried easily.

One example of the end product of this ETL pipeline is a clean, concise dataframe of movie ratings. It was built from two large datasets that were scraped from the open web and cleaned through the ETL process. Once transformed, the two datasets were merged to create the dataframe displayed here:
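As a rough illustration of the merge step, here is a minimal sketch using pandas. It is not the project's actual code: the dataframe names (`wiki_movies`, `ratings`), the column names (`imdb_id`, `title`, `rating`), and the sample rows are all hypothetical stand-ins for the two cleaned datasets.

```python
import pandas as pd

# Hypothetical, tiny stand-ins for the two cleaned datasets.
wiki_movies = pd.DataFrame({
    "imdb_id": ["tt0000001", "tt0000002", "tt0000003"],
    "title": ["Movie A", "Movie B", "Movie C"],
})
ratings = pd.DataFrame({
    "imdb_id": ["tt0000001", "tt0000001", "tt0000002"],
    "rating": [4.0, 5.0, 3.0],
})

# Collapse the individual ratings to one summary row per movie.
rating_summary = (
    ratings.groupby("imdb_id")["rating"]
    .agg(rating_count="count", rating_mean="mean")
    .reset_index()
)

# A left merge keeps every movie, including those with no ratings,
# whose summary columns are then filled with NaN.
movies_with_ratings = wiki_movies.merge(rating_summary, on="imdb_id", how="left")

print(movies_with_ratings)
```

A left merge is one reasonable choice here because it keeps movies that have no ratings yet. An inner merge would drop them instead.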

For the last step of the ETL process, the data was loaded into a SQL database, as shown here:
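The load step could look something like the sketch below. It assumes an in-memory SQLite database from Python's standard library purely so the example is self-contained. The project's actual database engine, table name (`movies`), schema, and the `load_movies` helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical cleaned rows, one dictionary per movie.
movies = [
    {"imdb_id": "tt0000001", "title": "Movie A", "rating_mean": 4.5},
    {"imdb_id": "tt0000002", "title": "Movie B", "rating_mean": 3.0},
    {"imdb_id": "tt0000003", "title": "Movie C", "rating_mean": None},
]


def load_movies(conn, rows):
    """Create the (assumed) movies table if needed and insert the rows."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS movies (
            imdb_id     TEXT PRIMARY KEY,
            title       TEXT NOT NULL,
            rating_mean REAL
        )
        """
    )
    # Named placeholders map each dictionary key to a column value;
    # INSERT OR REPLACE lets the load be re-run without duplicate-key errors.
    conn.executemany(
        "INSERT OR REPLACE INTO movies (imdb_id, title, rating_mean) "
        "VALUES (:imdb_id, :title, :rating_mean)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_movies(conn, movies)

# Once loaded, the data can be queried with ordinary SQL.
rated = conn.execute(
    "SELECT title, rating_mean FROM movies "
    "WHERE rating_mean IS NOT NULL ORDER BY rating_mean DESC"
).fetchall()
print(rated)
```

Swapping SQLite for a server database would mainly change how the connection is created. The placeholder syntax also differs between database drivers.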

