Fourth project provided by Udacity in the "Data Engineering with AWS" course. Its main purpose is to practice the core concepts of Apache Airflow.

An example DAG is provided for the project. The template contains a set of tasks that need to be linked to achieve a coherent and sensible data flow within the pipeline, as well as four empty operators that need to be implemented as functional pieces of the data pipeline (see the sketch below).
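As a rough illustration, here is a minimal sketch of how the tasks might be wired together. It only shows the dependency layout: `EmptyOperator` stands in for the four custom operators the template asks you to implement (`StageToRedshiftOperator`, `LoadFactOperator`, `LoadDimensionOperator`, `DataQualityOperator`), and the task ids are assumptions based on the template, so adjust them to match your own version.

```python
import pendulum
from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="final_project",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="@hourly",  # Airflow 2.4+ parameter name; older 2.x versions use schedule_interval
    catchup=False,
) as dag:
    # Placeholders only; replace each with the implemented custom operator.
    begin = EmptyOperator(task_id="Begin_execution")
    stage_events = EmptyOperator(task_id="Stage_events")                 # StageToRedshiftOperator
    stage_songs = EmptyOperator(task_id="Stage_songs")                   # StageToRedshiftOperator
    load_songplays = EmptyOperator(task_id="Load_songplays_fact_table")  # LoadFactOperator
    load_users = EmptyOperator(task_id="Load_user_dim_table")            # LoadDimensionOperator
    load_songs = EmptyOperator(task_id="Load_song_dim_table")            # LoadDimensionOperator
    load_artists = EmptyOperator(task_id="Load_artist_dim_table")        # LoadDimensionOperator
    load_time = EmptyOperator(task_id="Load_time_dim_table")             # LoadDimensionOperator
    quality_checks = EmptyOperator(task_id="Run_data_quality_checks")    # DataQualityOperator
    end = EmptyOperator(task_id="End_execution")

    # Staging runs in parallel, then the fact table, then the dimension
    # tables in parallel, then the data quality checks.
    begin >> [stage_events, stage_songs] >> load_songplays
    load_songplays >> [load_users, load_songs, load_artists, load_time] >> quality_checks
    quality_checks >> end
```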
Dataset locations for the project:
- Log data: s3://udacity-dend/log-data
- Song data: s3://udacity-dend/song-data
Copy the data from the Udacity bucket to the home CloudShell directory:
aws s3 cp s3://udacity-dend/log-data/ ~/log-data/ --recursive
aws s3 cp s3://udacity-dend/song-data/ ~/song-data/ --recursive
aws s3 cp s3://udacity-dend/log_json_path.json ~/

Copy the data from the home CloudShell directory to your own bucket -- this is only an example:
aws s3 cp ~/log-data/ s3://kgolovko-data-pipelines/log-data/ --recursive
aws s3 cp ~/song-data/ s3://kgolovko-data-pipelines/song-data/ --recursive
aws s3 cp ~/log_json_path.json s3://kgolovko-data-pipelines/
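If you prefer to script this step instead of using the AWS CLI, here is a minimal Python sketch with boto3. It is not part of the project template: it copies the objects bucket-to-bucket rather than through the CloudShell home directory, it assumes your credentials can read the Udacity bucket and write to yours, and the destination bucket name is just the example used above.

```python
import boto3

s3 = boto3.resource("s3")
source_bucket = "udacity-dend"
dest_bucket = s3.Bucket("kgolovko-data-pipelines")  # example bucket; use your own

# Mirror the recursive `aws s3 cp` commands above by copying every object
# under each prefix from the source bucket into the destination bucket.
for prefix in ("log-data/", "song-data/", "log_json_path.json"):
    for obj in s3.Bucket(source_bucket).objects.filter(Prefix=prefix):
        dest_bucket.copy({"Bucket": source_bucket, "Key": obj.key}, obj.key)
```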
List the data in your own bucket to be sure it copied over:

aws s3 ls s3://kgolovko-data-pipelines/log-data/

To check for issues during the loading process in Redshift, run the query below in the Redshift query editor (connect using the user name and password set when the Redshift cluster was created):
SELECT *
FROM SYS_LOAD_ERROR_DETAIL
ORDER BY start_time DESC
LIMIT 10;
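The same check can also be run from an Airflow task while debugging the staging operators. A minimal sketch, assuming the `apache-airflow-providers-postgres` package is installed and a Redshift connection is registered in Airflow under the id `redshift` (an assumed connection id, not part of the template):

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

def print_recent_load_errors():
    """Print the ten most recent Redshift load errors to the task log."""
    hook = PostgresHook(postgres_conn_id="redshift")  # assumed Airflow connection id
    rows = hook.get_records(
        """
        SELECT *
        FROM SYS_LOAD_ERROR_DETAIL
        ORDER BY start_time DESC
        LIMIT 10;
        """
    )
    for row in rows:
        print(row)
```

Calling this function from a PythonOperator (or running it ad hoc) surfaces COPY failures directly in the task logs instead of requiring a trip to the Redshift query editor.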