miguel617/MovieLens-Data-Engineer-Analytics-Project

MovieLens-Data-Engineer-Analytics-Project

This project builds a data pipeline over the MovieLens 25M dataset using Hive and Python, and presents the results for analysis in Power BI.

The goals of this project are:

  • Develop insights from a large dataset in a Cloudera VM
  • Build and optimize HiveQL queries that run on large CSV files
  • Enrich the base MovieLens dataset with data from TMDB, retrieved through the TMDB API with Python
  • Analyse the data and draw conclusions about the cinema industry
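
The TMDB enrichment step could be sketched roughly as follows. This is a minimal illustration, not the code used in this repository: it assumes the TMDB v3 movie-details endpoint, an API key you supply yourself, and the `tmdbId` column from the MovieLens `links.csv` file as the lookup key. The function names and the choice of fields in `ENRICH_FIELDS` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# TMDB v3 movie-details endpoint (assumed; check the TMDB docs for your setup).
TMDB_BASE_URL = "https://api.themoviedb.org/3/movie"

# Hypothetical selection of TMDB fields to copy onto each MovieLens row.
ENRICH_FIELDS = ("budget", "revenue", "runtime", "release_date")


def build_tmdb_url(tmdb_id, api_key):
    """Return the details URL for one movie, keyed by its tmdbId."""
    query = urllib.parse.urlencode({"api_key": api_key})
    return f"{TMDB_BASE_URL}/{int(tmdb_id)}?{query}"


def fetch_tmdb_details(tmdb_id, api_key, timeout=10):
    """Call the TMDB API and return the decoded JSON payload (needs network)."""
    url = build_tmdb_url(tmdb_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def enrich_movie(movie_row, tmdb_details):
    """Return a copy of a MovieLens row with selected TMDB fields added.

    Fields missing from the TMDB payload are stored as None, so every
    enriched row ends up with the same set of columns.
    """
    enriched = dict(movie_row)
    for field in ENRICH_FIELDS:
        enriched[f"tmdb_{field}"] = tmdb_details.get(field)
    return enriched
```

Keeping the network call (`fetch_tmdb_details`) separate from the merge (`enrich_movie`) makes the merge easy to test offline and lets you cache or rate-limit the API calls independently.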

All HiveQL queries are in the "Data Ingestion and Queries" folder, and the final Power BI dashboard is in the "PowerBI Dashboard" folder.
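
For readers who want a feel for the ingestion step before opening that folder, here is a rough sketch of how an external Hive table over a MovieLens CSV file might be declared, with the HiveQL text generated from Python. It is not taken from the repository: the helper name, the snake_case column names, the HDFS path, and the example query are all assumptions. The column order follows `ratings.csv` in MovieLens 25M (`userId`, `movieId`, `rating`, `timestamp`).

```python
def external_csv_table_ddl(table, columns, hdfs_dir):
    """Render HiveQL that exposes a directory of CSV files as an external table.

    `columns` is a list of (name, hive_type) pairs in file order.
    The header row of each CSV file is skipped via a table property.
    """
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Assumed column names; the timestamp column is renamed to avoid the
# HiveQL type keyword.
RATINGS_COLUMNS = [
    ("user_id", "INT"),
    ("movie_id", "INT"),
    ("rating", "DOUBLE"),
    ("rating_ts", "BIGINT"),
]

# Illustrative aggregation: best-rated movies with at least 1000 ratings.
TOP_RATED_QUERY = """
SELECT movie_id, COUNT(*) AS n_ratings, AVG(rating) AS avg_rating
FROM ratings
GROUP BY movie_id
HAVING COUNT(*) >= 1000
ORDER BY avg_rating DESC
LIMIT 20
"""

if __name__ == "__main__":
    # Hypothetical HDFS directory holding ratings.csv.
    print(external_csv_table_ddl("ratings", RATINGS_COLUMNS,
                                 "/user/cloudera/movielens/ratings"))
```

A plain comma delimiter works for `ratings.csv`, but `movies.csv` contains quoted titles with embedded commas, so a table over that file would likely need a CSV-aware SerDe instead.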
