MovieLens-Data-Engineer-Analytics-Project

This project builds a data pipeline over the MovieLens 25M dataset, using Hive and Python, and presents the resulting analysis in a Power BI dashboard.

The objectives of this project are:

Develop insights from a large dataset in a Cloudera VM
Build and optimize HiveQL queries that run on large CSV files
Enrich the base MovieLens dataset with data from TMDB, retrieved through the TMDB API with Python
Analyse and draw conclusions about the cinema industry
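
As a rough illustration of the TMDB enrichment step, the sketch below joins MovieLens ids to TMDB ids through the `links.csv` file that ships with MovieLens and requests movie details from the TMDB v3 endpoint. It is not the code in the "tmdb_script" folder: the function names, the selected fields (`budget`, `revenue`, `runtime`) and the injectable `fetch` parameter are assumptions made for this example, and a real run needs your own TMDB API key.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

TMDB_BASE_URL = "https://api.themoviedb.org/3/movie"


def read_links(csv_text):
    """Map MovieLens movieId -> tmdbId from links.csv content.

    Rows with an empty tmdbId are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(r["movieId"]): int(r["tmdbId"]) for r in reader if r["tmdbId"]}


def build_movie_url(tmdb_id, api_key):
    """Build the TMDB v3 movie-details URL for one tmdbId."""
    return f"{TMDB_BASE_URL}/{tmdb_id}?{urlencode({'api_key': api_key})}"


def fetch_json(url):
    """Download a URL and decode the JSON body (performs a network call)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def enrich(links, api_key, fetch=fetch_json, fields=("budget", "revenue", "runtime")):
    """Return one dict per movie with the chosen TMDB fields attached.

    `fetch` is injectable so the logic can be exercised without the network.
    `fields` is a hypothetical selection; missing keys become None.
    """
    rows = []
    for movie_id, tmdb_id in links.items():
        details = fetch(build_movie_url(tmdb_id, api_key))
        row = {"movieId": movie_id, "tmdbId": tmdb_id}
        row.update({name: details.get(name) for name in fields})
        rows.append(row)
    return rows
```

In practice the returned rows would be written to a CSV file and loaded into Hive next to the MovieLens tables. For the full 25M dataset, some pacing between requests and error handling for failed lookups would likely be needed as well.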

All the HiveQL queries are in the "Data Ingestion and Queries" folder, and the final PowerBI dashboard used for visualization is in the "PowerBI Dashboard" folder.
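
The project's actual queries live in that folder. To give a sense of what ingesting a large CSV file into Hive can look like, here is a minimal Python sketch that generates a HiveQL `CREATE EXTERNAL TABLE` statement for the MovieLens `ratings.csv` file. The helper name, the table and column names, and the HDFS directory are all hypothetical. The `timestamp` column is renamed to `rating_ts` here to avoid clashing with the Hive type name.

```python
def external_csv_table_ddl(table, columns, hdfs_dir):
    """Return HiveQL that exposes a directory of comma-separated files as a table.

    `columns` is a list of (name, hive_type) pairs; the header row of each
    file is skipped through the 'skip.header.line.count' table property.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# ratings.csv has the columns userId, movieId, rating, timestamp.
RATINGS_COLUMNS = [
    ("user_id", "INT"),
    ("movie_id", "INT"),
    ("rating", "DOUBLE"),
    ("rating_ts", "BIGINT"),
]

# Hypothetical HDFS location, chosen only for this example.
ddl = external_csv_table_ddl("ratings", RATINGS_COLUMNS, "/user/cloudera/movielens/ratings")
print(ddl)
```

A plain comma delimiter is enough for `ratings.csv` because every field is numeric. Files with quoted text fields, such as movie titles that contain commas, would more likely need a CSV-aware SerDe instead.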

miguel617/MovieLens-Data-Engineer-Analytics-Project
