
Part 3 of Udacity's Data Engineering With AWS Nano-Degree

DecisioNaut/sparkling_lakes

Udacity's Data Engineering With AWS Nano-Degree - Project: "STEDI Human Balance Analytics"

This is a fictional project for lesson 3 of Udacity's Data Engineering with AWS nano-degree, submitted for review by Udacity.

I consider both the data for this project and the rubric problematic (see questions 972495 and 972946 in Udacity's Knowledge Center as well as my sense_check.ipynb), and after more than two weeks Udacity has not been able to provide answers, let alone solutions. Nevertheless, I have done my best to complete this project.

The overall task is to use the data provided in the project starter folder to build a toy data lake with

  • Landing,
  • Trusted, and
  • Curated zones

using AWS services

  • S3,
  • Glue, and
  • Athena

and to document the individual steps by saving the Glue job scripts and Athena table definitions and by taking some screenshots.
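One way to picture the three zones is as separate S3 prefixes per dataset, each exposed to Athena as its own table. The snippet below is a minimal sketch in plain Python: the bucket name `example-stedi-lake`, the `<dataset>/<zone>/` prefix convention and the `<dataset>_<zone>` table naming are hypothetical assumptions for illustration, not necessarily how this repository organizes its data.

```python
# Minimal sketch of one possible S3 layout for the landing/trusted/curated zones.
# ASSUMPTION: bucket name, prefix convention and table naming are hypothetical.
from enum import Enum


class Zone(str, Enum):
    """Zones of the toy data lake, ordered from raw to most refined."""
    LANDING = "landing"
    TRUSTED = "trusted"
    CURATED = "curated"


BUCKET = "example-stedi-lake"  # hypothetical bucket name


def s3_prefix(dataset: str, zone: Zone, bucket: str = BUCKET) -> str:
    """Return the S3 prefix under which a dataset is stored in a given zone."""
    if not dataset or "/" in dataset:
        raise ValueError(f"invalid dataset name: {dataset!r}")
    return f"s3://{bucket}/{dataset}/{zone.value}/"


def athena_table_name(dataset: str, zone: Zone) -> str:
    """Name the Athena table after dataset and zone, e.g. customer_trusted."""
    return f"{dataset}_{zone.value}"


if __name__ == "__main__":
    for zone in Zone:
        print(athena_table_name("customer", zone), "->", s3_prefix("customer", zone))
```

Keeping one prefix per dataset and zone means each Glue job reads from one zone's prefix and writes to the next, and each Athena table definition points at exactly one prefix.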

Please find the documentation here:

  1. Landing zone
  2. Trusted zone
  3. Curated zone

As the previous reviewer apparently did not notice that the customer data in the landing and trusted zones differ (the sharewithresearchasofdate column contains nulls in the landing zone and no nulls in the trusted zone), I have added them more explicitly here:
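The landing-to-trusted difference for customer records comes down to a single rule: keep only rows whose sharewithresearchasofdate is set. The sketch below expresses that rule in plain Python over lists of dicts, together with the null count used as a sense check. It is a stand-in for what a Glue/PySpark job would do on a DataFrame (roughly `df.filter(F.col("sharewithresearchasofdate").isNotNull())`), not the actual job script of this repository; the sample records are made up.

```python
# Minimal sketch of the landing -> trusted rule for customer records:
# keep only rows whose sharewithresearchasofdate is present and not null.
# Plain-Python stand-in for a Glue/PySpark filter; sample data is made up.
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

CONSENT_COLUMN = "sharewithresearchasofdate"


def to_trusted(landing: Iterable[Record], column: str = CONSENT_COLUMN) -> List[Record]:
    """Return only the records where `column` exists and is not None."""
    return [row for row in landing if row.get(column) is not None]


def null_count(records: Iterable[Record], column: str = CONSENT_COLUMN) -> int:
    """Count records where `column` is missing or None."""
    return sum(1 for row in records if row.get(column) is None)


if __name__ == "__main__":
    landing = [
        {"email": "a@example.com", CONSENT_COLUMN: 1700000000000},  # made-up timestamp
        {"email": "b@example.com", CONSENT_COLUMN: None},
        {"email": "c@example.com"},  # column absent entirely
    ]
    trusted = to_trusted(landing)
    print(null_count(landing), null_count(trusted), len(trusted))  # 2 0 1
```

The same null count, run as an Athena query against the landing and trusted tables, is what should show nulls in the first and none in the second.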
