End-to-End SQL Data Warehouse & Analytics Project

Welcome to the Data Warehouse and Analytics Project repository! 🚀

This project demonstrates a complete data warehousing solution, from raw data ingestion to generating actionable business insights. It is designed to showcase industry best practices in data engineering, data modeling, and analytics.

This repository provides a step-by-step approach to building a scalable and efficient data warehouse, covering:

  • ✅ ETL Pipelines (Extract, Transform, Load)
  • ✅ Data Modeling (Star Schema)
  • ✅ SQL-based Reporting & Analytics

Table of Contents

  1. Data Architecture
  2. ETL Process
  3. Data Flow & Lineage
  4. Data Integration & Relationships
  5. Data Model: Star Schema
  6. Project Scope & Objectives
  7. Technology Stack & Tools
  8. Repository Structure
  9. Setup & Installation
  10. About Me
  11. License

πŸ—οΈ Data Architecture

The project follows the industry-standard Medallion Architecture, logically organizing data into three distinct layers.

  • 🥉 Bronze Layer (Raw Data): Stores raw, unaltered data ingested directly from the source CSV files into SQL Server.
  • 🥈 Silver Layer (Cleansed & Transformed Data): Holds cleansed, standardized, and integrated data prepared for analysis.
  • 🥇 Gold Layer (Business-Ready Data): The final presentation layer, optimized for analytics and reporting using a star schema.
(Diagram: data_architecture)

➡️ For a complete breakdown, see the Detailed Data Architecture Documentation.
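
As a minimal sketch of how these layers can be laid out in SQL Server (the bronze, silver, and gold schema names match the objects referenced throughout this README; the database name and statements below are illustrative, not the project's actual script):

    -- Illustrative layout only: one database with a schema per Medallion layer
    -- (see scripts/init_database.sql for the project's own setup script)
    CREATE DATABASE DataWarehouse;
    GO
    USE DataWarehouse;
    GO
    CREATE SCHEMA bronze;   -- raw data, loaded as-is from the source CSVs
    GO
    CREATE SCHEMA silver;   -- cleansed, standardized, integrated data
    GO
    CREATE SCHEMA gold;     -- business-ready star schema
    GO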


βš™οΈ ETL Process

Data is moved and transformed between layers by an ETL (Extract, Transform, Load) process implemented as stored procedures, which handle data cleansing, standardization, and business-logic transformations.

(Animation: etl_animation)

➡️ For a complete breakdown, see the Detailed ETL Process Documentation.
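
To make the pattern concrete, here is a hedged sketch of one Bronze-to-Silver load step; the table and column names are assumptions based on the source file names, and the project's real logic lives in scripts/silver/proc_load_silver.sql:

    -- Hypothetical Bronze-to-Silver load step (object and column names are illustrative)
    CREATE OR ALTER PROCEDURE silver.load_silver AS
    BEGIN
        TRUNCATE TABLE silver.crm_cust_info;

        INSERT INTO silver.crm_cust_info (cst_id, cst_firstname, cst_lastname)
        SELECT
            cst_id,
            TRIM(cst_firstname),      -- standardization: strip stray whitespace
            TRIM(cst_lastname)
        FROM bronze.crm_cust_info
        WHERE cst_id IS NOT NULL;     -- cleansing: drop rows without a key
    END;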


πŸ—ΊοΈ Data Flow & Lineage

The data lineage diagram below shows how data flows from the source systems, through the Bronze and Silver layers, and is finally integrated into the Gold layer's star schema.

(Diagram: data_flow)

➡️ For more details, see the Data Flow & Lineage Documentation.


🔗 Data Integration & Relationships

The data integration diagram below illustrates how tables from the CRM and ERP source systems are related. It details the key relationships used to join disparate tables and create a unified, 360-degree view of customers and products.

(Diagram: data_integration)

➡️ For more details, see the Data Integration Documentation.
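
As an illustration of what such a join looks like, a unified customer view could combine the CRM customer file with the ERP customer and location files; the table, column, and key names below are assumptions derived from the source file names, not the project's exact schema:

    -- Hypothetical CRM + ERP customer join (names are assumptions)
    SELECT
        ci.cst_id,
        ci.cst_firstname,
        ci.cst_lastname,
        ca.bdate AS birthdate,    -- from ERP CUST_AZ12
        la.cntry AS country       -- from ERP LOC_A101
    FROM silver.crm_cust_info      AS ci
    LEFT JOIN silver.erp_cust_az12 AS ca ON ci.cst_id = ca.cid
    LEFT JOIN silver.erp_loc_a101  AS la ON ci.cst_id = la.cid;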


⭐ Data Model: Star Schema

The Gold Layer is modeled as a Sales Data Mart using a Star Schema. This model is optimized for high-performance analytics and consists of a central fact table surrounded by descriptive dimension tables.

  • Fact Table: gold.fact_sales
  • Dimension Tables: gold.dim_customers, gold.dim_products

(Diagram: data_model)

➡️ For column-level details, see the Gold Layer Data Catalog.
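
A typical query against this model joins the fact table to its dimensions on their surrogate keys. The sketch below assumes customer_key, product_key, country, category, and sales_amount columns; the Gold Layer Data Catalog documents the actual columns:

    -- Illustrative star-schema query (column names are assumptions)
    SELECT
        c.country,
        p.category,
        SUM(f.sales_amount) AS total_sales
    FROM gold.fact_sales    AS f
    JOIN gold.dim_customers AS c ON f.customer_key = c.customer_key
    JOIN gold.dim_products  AS p ON f.product_key  = p.product_key
    GROUP BY c.country, p.category
    ORDER BY total_sales DESC;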


🎯 Project Scope & Objectives

This project is designed to showcase expertise in the following areas:

  • SQL Development
  • Data Engineering & ETL Pipelines
  • Data Architecture & Modeling
  • Data Analytics & Reporting

Data Engineering: Building the Warehouse

The primary objective is to develop a modern data warehouse using SQL Server to consolidate sales data from disparate sources.

  • Data Sources: Import and integrate data from ERP & CRM (CSV files).
  • Data Quality: Cleanse data and resolve quality issues before analysis.
  • Data Modeling: Combine sources into a single, user-friendly star schema.
  • Documentation: Provide clear documentation for the data model and architecture.

BI: Analytics & Reporting

The goal is to develop SQL-based analytics to deliver detailed insights into key business metrics.

  • Customer Behavior Analysis: Understand purchasing patterns.
  • Product Performance Metrics: Identify top-performing products and categories.
  • Sales Trend Analysis: Track revenue and sales patterns over time.
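
For instance, a monthly sales-trend query over the Gold layer might look like the following sketch (order_date and sales_amount are assumed column names on gold.fact_sales):

    -- Hypothetical monthly revenue trend (column names are assumptions)
    SELECT
        YEAR(order_date)  AS order_year,
        MONTH(order_date) AS order_month,
        SUM(sales_amount) AS monthly_revenue
    FROM gold.fact_sales
    GROUP BY YEAR(order_date), MONTH(order_date)
    ORDER BY order_year, order_month;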

πŸ› οΈ Technology Stack & Tools

  • Database: SQL Server
  • ETL Processing: Transact-SQL (T-SQL)
  • Data Modeling & Visualization: Draw.io
  • Project Management: Notion
  • Version Control: Git & GitHub

📂 Repository Structure

The project repository is organized into the following key directories, each dedicated to a specific stage of the data warehouse lifecycle, from raw data ingestion to final analysis.

analytical-report/
    ├── 01_gold-layer-dataset/
    │   ├── gold.dim_customers.csv
    │   ├── gold.dim_products.csv
    │   └── gold.fact_sales.csv
    ├── 02_exploratory-data-analysis/
    │   ├── 00_init_database.sql
    │   ├── 01_database_exploration.sql
    │   ├── 02_dimensions_exploration.sql
    │   ├── 03_date_range_exploration.sql
    │   ├── 04_measures_exploration.sql
    │   ├── 05_magnitude_analysis.sql
    │   ├── 06_ranking_analysis.sql
    │   └── README.md
    ├── 03_advanced-analytics/
    │   ├── 07_change_over_time_analysis.sql
    │   ├── 08_cumulative_analysis.sql
    │   ├── 09_performance_analysis.sql
    │   ├── 10_data_segmentation.sql
    │   ├── 11_part_to_whole_analysis.sql
    │   └── README.md
    └── 04_report-generation/
        ├── 12_report_customers.sql
        ├── 13_report_products.sql
        ├── 14_report_analysis_queries.sql
        └── README.md
datasets/
    ├── source_crm/
    │   ├── cust_info.csv
    │   ├── prd_info.csv
    │   └── sales_details.csv
    └── source_erp/
        ├── CUST_AZ12.csv
        ├── LOC_A101.csv
        └── PX_CAT_G1V2.csv
docs/
    ├── data-architecture/
    │   ├── data_architecture.png
    │   ├── data_architecture.svg
    │   └── README.md
    ├── data-flow/
    │   ├── data_flow.png
    │   ├── data_flow.svg
    │   └── README.md
    ├── data-integration/
    │   ├── data_integration.png
    │   ├── data_integration.svg
    │   └── README.md
    ├── data-model/
    │   ├── data_model.png
    │   ├── data_model.svg
    │   └── README.md
    ├── etl/
    │   ├── extraction/
    │   │   ├── exactration.png
    │   │   ├── exactration.svg
    │   │   └── README.md
    │   ├── load/
    │   │   ├── load.png
    │   │   ├── load.svg
    │   │   └── README.md
    │   ├── transformation/
    │   │   ├── README.md
    │   │   ├── transformation.png
    │   │   └── transformation.svg
    │   ├── etl_animation_1.svg
    │   ├── etl_animation.svg
    │   ├── etl_pic_1.png
    │   ├── etl_pic.png
    │   └── README.md
    ├── warehousing-data-catalog/
    │   └── README.md
    ├── warehousing-naming-convention/
    │   └── README.md
    └── warehousing-tables-views-details/
        ├── bronze_layer_tables_views_details.csv
        ├── bronze_layer_tables_views_details.xlsx
        ├── gold_layer_tables_views_details.csv
        ├── gold_layer_tables_views_details.xlsx
        ├── silver_layer_tables_views_details.csv
        └── silver_layer_tables_views_details.xlsx
scripts/
    ├── bronze/
    │   ├── ddl_bronze.sql
    │   ├── proc_load_bronze.sql
    │   └── README.md
    ├── gold/
    │   ├── structured-csv-data/
    │   │   ├── dim_customers.csv
    │   │   ├── dim_products.csv
    │   │   └── fact_sales.csv
    │   ├── ddl_gold.sql
    │   └── README.md
    ├── silver/
    │   ├── ddl_silver.sql
    │   ├── proc_load_silver.sql
    │   └── README.md
    ├── init_database.sql
    └── placeholder
tests/
    ├── placeholder
    ├── quality_checks_bronze.sql
    ├── quality_checks_gold.sql
    └── quality_checks_silver.sql
LICENSE
README.md
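
The tests/ scripts run data-quality checks against each layer. The sketch below shows the kind of check they represent; the table and column names are assumptions, and the real checks live in quality_checks_bronze.sql, quality_checks_silver.sql, and quality_checks_gold.sql:

    -- Illustrative quality check: customer keys in Silver should be unique and non-null
    SELECT
        cst_id,
        COUNT(*) AS occurrences
    FROM silver.crm_cust_info
    GROUP BY cst_id
    HAVING COUNT(*) > 1 OR cst_id IS NULL;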

🚀 Setup & Installation

To deploy and run this project, follow these steps:

Prerequisites:

  • Install SQL Server -> Download Link
  • Install SQL Server Management Studio (SSMS) -> Download Link
  • Clone this repository:
    git clone https://github.com/apurva313/sql-data-warehouse-analytics-project.git

Running the Scripts:

  1. Initialize Database: In SSMS, run scripts/init_database.sql to initialize the database, then run the DDL scripts in the following order to create the warehouse structure:
    • scripts/bronze/ddl_bronze.sql
    • scripts/silver/ddl_silver.sql
    • scripts/gold/ddl_gold.sql
  2. Load Raw Data: Use the SSMS Import/Export Wizard or BULK INSERT (see the sketch below) to load the source CSV data into the Bronze layer tables; scripts/bronze/proc_load_bronze.sql automates this load.
  3. Run ETL Scripts: Execute the Silver load procedure to cleanse and populate the Silver layer:
    • scripts/silver/proc_load_silver.sql
  4. Start Analysis: The Gold layer views are now ready! You can query them directly in SSMS or connect a BI tool for reporting.
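
For step 2, a minimal BULK INSERT sketch is shown below; the file path, table name, and options are assumptions to adapt to your local clone:

    -- Hypothetical example: load one source CSV into a Bronze table
    TRUNCATE TABLE bronze.crm_cust_info;

    BULK INSERT bronze.crm_cust_info
    FROM 'C:\sql-data-warehouse-project\datasets\source_crm\cust_info.csv'  -- adjust to your path
    WITH (
        FIRSTROW = 2,           -- skip the CSV header row
        FIELDTERMINATOR = ',',  -- comma-delimited source files
        TABLOCK
    );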

🚀 About Me

LinkedIn · GitHub


πŸ›‘οΈ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.
