
Brief project description

The goal of this data pipeline is to transform data stored as JSON files into a relational database, so that the data analytics team can analyze and query the data with SQL. There are two datasets: a song dataset and a log dataset. Using these datasets, we analyze songs and user activity and build the Sparkify database. Both datasets consist of JSON files, which we open, process, and load into the appropriate tables. The script create_tables.py creates all the tables, and etl.py inserts the data into them.
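As a rough illustration of the "open and process" step, here is a minimal Python sketch that turns raw JSON into row tuples ready for insertion. The field names (song_id, artist_name, userId, page, and so on), the row layouts, the one-event-per-line log format, and the rule that only "NextSong" events count as song plays are all assumptions about the datasets, not taken from this repository; etl.py may do this differently.

```python
import json


def parse_song_record(text):
    """Split one song-dataset JSON object into a songs row and an artists row.

    All field names below are assumed, not confirmed by the repository.
    """
    record = json.loads(text)
    song_row = (record["song_id"], record["title"], record["artist_id"],
                record["year"], record["duration"])
    artist_row = (record["artist_id"], record["artist_name"],
                  record["artist_location"], record["artist_latitude"],
                  record["artist_longitude"])
    return song_row, artist_row


def parse_log_lines(text):
    """Collect users rows from a log file holding one JSON event per line.

    Assumption: only events whose "page" is "NextSong" represent song plays.
    """
    users = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if event.get("page") != "NextSong":
            continue  # skip navigation events such as "Home"
        users.append((int(event["userId"]), event["firstName"],
                      event["lastName"], event["gender"], event["level"]))
    return users
```

Each tuple would then be passed to a parameterized INSERT statement for its target table.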

HOW TO USE

Step 1: Create the tables.
USAGE: python create_tables.py

Step 2: Load the JSON data and insert it into the database.
USAGE: python etl.py
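If you prefer a single command, the two steps can be chained from Python. This is a hypothetical helper (run_pipeline is not part of the repository); it assumes both scripts sit in the current working directory and that a non-zero exit code means the step failed.

```python
import subprocess
import sys

# Order matters: the tables must exist before etl.py inserts into them.
PIPELINE_STEPS = ["create_tables.py", "etl.py"]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so etl.py never runs if create_tables.py fails.
        runner([sys.executable, script], check=True)


if __name__ == "__main__":
    run_pipeline()
```

Using sys.executable runs both scripts with the same interpreter (and virtual environment) that launched the helper; the runner parameter only exists so the step order can be checked without a database.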

DUPLICATIONS

Duplicate records are ignored. The one exception is the users table: when a user already exists, their level is updated.
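A minimal sketch of how this rule can be expressed with upserts. The repository name suggests PostgreSQL, whose INSERT ... ON CONFLICT clause supports both behaviours; the demo below runs equivalent statements on Python's built-in sqlite3 (SQLite 3.24 and later accept the same clause) so it works without a database server. The table and column names are assumptions, and with psycopg2 the ? placeholders would be written as %s.

```python
import sqlite3

# Hypothetical schemas; the real definitions live in sql_queries.py.
USERS_CREATE = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    gender     TEXT,
    level      TEXT
)
"""
SONGS_CREATE = """
CREATE TABLE IF NOT EXISTS songs (
    song_id   TEXT PRIMARY KEY,
    title     TEXT,
    artist_id TEXT,
    year      INTEGER,
    duration  REAL
)
"""

# Duplicate user_id: keep the existing row but take the incoming level
# (e.g. a user who upgrades from "free" to "paid").
USERS_INSERT = """
INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (user_id) DO UPDATE SET level = excluded.level
"""

# Duplicate song_id: ignore the new row entirely.
SONGS_INSERT = """
INSERT INTO songs (song_id, title, artist_id, year, duration)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (song_id) DO NOTHING
"""


def demo():
    """Insert duplicates into an in-memory database and return what was kept."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(USERS_CREATE)
    cur.execute(SONGS_CREATE)
    cur.execute(USERS_INSERT, (10, "Ann", "Lee", "F", "free"))
    cur.execute(USERS_INSERT, (10, "Ann", "Lee", "F", "paid"))
    cur.execute(SONGS_INSERT, ("S1", "First title", "A1", 2000, 200.5))
    cur.execute(SONGS_INSERT, ("S1", "Second title", "A1", 2000, 200.5))
    users = cur.execute("SELECT user_id, level FROM users").fetchall()
    songs = cur.execute("SELECT song_id, title FROM songs").fetchall()
    conn.close()
    return users, songs
```

Calling demo() returns ([(10, 'paid')], [('S1', 'First title')]): the user's level was updated in place, while the duplicate song was skipped.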

Repository: Gekol/Udacity_PostgreSQL

Folders and files

Latest commit

History

Repository files navigation

Brief project description

HOW TO USE

DUPLICATIONS

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages