
# Udacity Data Engineer Nanodegree Projects

You will need to define fact and dimension tables for a star schema for a particular analytic focus, and write an ETL pipeline that transfers data from files in two local directories into these tables in Postgres using Python and SQL.
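
To make the setup concrete, here is a minimal sketch of what a fact/dimension pair and the psycopg2 connection might look like. The database, table, and column names are illustrative assumptions, not the project's actual schema.

```python
import psycopg2

# Connection parameters are placeholders for a local Postgres instance.
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

# Dimension table: one row per user.
cur.execute("""
    CREATE TABLE IF NOT EXISTS users (
        user_id    INT PRIMARY KEY,
        first_name VARCHAR,
        last_name  VARCHAR
    );
""")

# Fact table referencing the dimension, one row per event.
cur.execute("""
    CREATE TABLE IF NOT EXISTS songplays (
        songplay_id SERIAL PRIMARY KEY,
        user_id     INT REFERENCES users (user_id),
        start_time  TIMESTAMP
    );
""")
conn.commit()
```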

You will need to model fact and dimension tables for a star schema for a particular analytic focus, and write an ETL pipeline that transfers data from CSV files in two local directories into Apache Cassandra tables using Python and CQL.
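
A minimal sketch of the Cassandra side, assuming the DataStax cassandra-driver package; the keyspace, table, and CSV layout are illustrative assumptions. In Cassandra the primary key is chosen to match the query, so each table effectively denormalizes one analytic question.

```python
import csv
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

# Partition by session, cluster by item, to answer "what played in session X?"
session.execute("""
    CREATE TABLE IF NOT EXISTS song_plays (
        session_id      INT,
        item_in_session INT,
        song            TEXT,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

with open("event_data.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        session.execute(
            "INSERT INTO song_plays (session_id, item_in_session, song) VALUES (%s, %s, %s)",
            (int(row[0]), int(row[1]), row[2]),
        )
```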

You will need to model fact and dimension tables for a star schema for a particular analytic focus, and write an ETL pipeline that transfers data from JSON files in two S3 buckets into Amazon Redshift tables using Python, SQL, and boto3.
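
A rough sketch of the load step: boto3 is typically used to provision or describe the cluster, while the bulk load itself is a SQL COPY issued over an ordinary Postgres connection. The cluster name, bucket, IAM role, and table are placeholders.

```python
import boto3
import psycopg2

# Look up the cluster endpoint with boto3 (the cluster could also be created here).
redshift = boto3.client("redshift", region_name="us-west-2")
endpoint = redshift.describe_clusters(ClusterIdentifier="my-cluster")["Clusters"][0]["Endpoint"]

conn = psycopg2.connect(
    host=endpoint["Address"], port=endpoint["Port"],
    dbname="dev", user="awsuser", password="replace-me",
)
cur = conn.cursor()

# COPY bulk-loads the JSON files from S3 into a staging table in parallel.
cur.execute("""
    COPY staging_events
    FROM 's3://my-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS JSON 'auto';
""")
conn.commit()
```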

In this project, you need to build an ETL pipeline for a data lake hosted on S3. You will need to load data from S3, process the data into analytics tables using Spark, and load them back into S3. Deploying this Spark process on a cluster using AWS EMR is also part of the project.
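
A minimal PySpark sketch of the S3-in, S3-out pattern; the bucket paths and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-etl").getOrCreate()

# Read the raw JSON files from the input bucket.
songs = spark.read.json("s3a://my-input-bucket/song_data/*/*/*/*.json")

# Build an analytics table and write it back to S3 as partitioned Parquet.
songs_table = (
    songs.select("song_id", "title", "artist_id", "year")
         .dropDuplicates(["song_id"])
)
songs_table.write.mode("overwrite").partitionBy("year").parquet(
    "s3a://my-output-bucket/songs/"
)
```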

You will need to create your own custom operators to perform tasks such as staging the data, filling the data warehouse, and running data quality checks as the final step. You will work with Airflow, S3, and a Redshift cluster.
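
A skeletal custom operator, assuming Airflow 2.x with the Postgres provider installed; the table and S3 parameters are placeholders for whatever the staging step actually needs.

```python
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class StageToRedshiftOperator(BaseOperator):
    """Copies JSON files from S3 into a Redshift staging table."""

    def __init__(self, redshift_conn_id, table, s3_path, iam_role, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.table = table
        self.s3_path = s3_path
        self.iam_role = iam_role

    def execute(self, context):
        # The hook resolves the Airflow connection and runs the COPY.
        hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        hook.run(f"""
            COPY {self.table}
            FROM '{self.s3_path}'
            IAM_ROLE '{self.iam_role}'
            FORMAT AS JSON 'auto';
        """)
```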

In this project I decided to build a climate analysis dashboard based on this Kaggle dataset using Docker, InfluxDB, and Grafana.
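
A toy sketch of how climate points might be written to InfluxDB from Python using the 1.x `influxdb` client; the database, measurement, and field names are assumptions, and the dashboard itself is configured in Grafana against this database.

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086)
client.create_database("climate")
client.switch_database("climate")

# Each point carries a measurement, tags for grouping, a timestamp, and fields.
client.write_points([{
    "measurement": "temperature",
    "tags": {"city": "Berlin"},
    "time": "2013-01-01T00:00:00Z",
    "fields": {"avg_temp": 3.2},
}])
```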