Welcome to the Data Warehouse and Analytics Project repository!
This project demonstrates a complete data warehousing solution, from raw data ingestion to generating actionable business insights. It is designed to showcase industry best practices in data engineering, data modeling, and analytics.
This repository provides a step-by-step approach to building a scalable and efficient data warehouse, covering:
- ETL Pipelines (Extract, Transform, Load)
- Data Modeling (Star Schema)
- SQL-based Reporting & Analytics
Table of Contents:
- Data Architecture
- ETL Process
- Data Flow & Lineage
- Data Integration & Relationships
- Data Model: Star Schema
- Project Scope & Objectives
- Technology Stack & Tools
- Repository Structure
- Setup & Installation
- About Me
- License
The project follows the industry-standard Medallion Architecture, logically organizing data into three distinct layers.
- Bronze Layer (Raw Data): Stores raw, unaltered data ingested directly from the source CSV files into SQL Server.
- Silver Layer (Cleansed & Transformed Data): Holds cleansed, standardized, and integrated data prepared for analysis.
- Gold Layer (Business-Ready Data): The final presentation layer, optimized for analytics and reporting using a star schema.

For a complete breakdown, see the Detailed Data Architecture Documentation.
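As a rough illustration of how the three layers relate, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table names, columns, and sample values are hypothetical, and name prefixes replace the bronze/silver/gold schemas; the project's actual objects are defined by the scripts in the scripts/ folder.

```python
import sqlite3


def build_layers(raw_rows):
    """Load raw rows into bronze, cleanse them into silver, expose a gold view."""
    con = sqlite3.connect(":memory:")

    # Bronze: values are stored exactly as they arrive, all as text.
    con.execute("CREATE TABLE bronze_customers (cust_id TEXT, name TEXT, country TEXT)")
    con.executemany("INSERT INTO bronze_customers VALUES (?, ?, ?)", raw_rows)

    # Silver: trimmed, typed, standardized, and de-duplicated (hypothetical rules).
    con.execute("""
        CREATE TABLE silver_customers AS
        SELECT DISTINCT
            CAST(TRIM(cust_id) AS INTEGER) AS cust_id,
            TRIM(name)                     AS name,
            UPPER(TRIM(country))           AS country
        FROM bronze_customers
        WHERE TRIM(cust_id) <> ''
    """)

    # Gold: a business-friendly view on top of silver.
    con.execute("""
        CREATE VIEW gold_dim_customers AS
        SELECT cust_id AS customer_id, name AS customer_name, country
        FROM silver_customers
    """)
    return con
```

Querying gold_dim_customers then returns only cleansed rows, while bronze_customers still holds every raw row for traceability.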
The data is moved and transformed between layers using an ETL (Extract, Transform, Load) process managed by stored procedures. These procedures cleanse and standardize the data and apply business logic.
For a complete breakdown, see the Detailed ETL Process Documentation.
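The idea of a load procedure that cleanses and standardizes data can be sketched as follows. This is a minimal Python/sqlite3 illustration, not the project's T-SQL: the tables, columns, the full-reload (delete, then insert) pattern, and the gender-code mapping are all assumptions made for the example.

```python
import sqlite3


def load_silver(con):
    """Full reload of the silver table from bronze; safe to run repeatedly."""
    with con:  # one transaction: the whole reload commits, or none of it does
        con.execute("DELETE FROM silver_cust")
        con.execute("""
            INSERT INTO silver_cust (cust_id, first_name, gender)
            SELECT
                CAST(cust_id AS INTEGER),
                TRIM(first_name),
                CASE UPPER(TRIM(gender))      -- hypothetical standardization rule
                    WHEN 'M' THEN 'Male'
                    WHEN 'F' THEN 'Female'
                    ELSE 'n/a'
                END
            FROM bronze_cust
            WHERE cust_id IS NOT NULL
        """)
    return con.execute("SELECT COUNT(*) FROM silver_cust").fetchone()[0]


def demo_load_silver():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bronze_cust (cust_id TEXT, first_name TEXT, gender TEXT)")
    con.execute("CREATE TABLE silver_cust (cust_id INTEGER, first_name TEXT, gender TEXT)")
    con.executemany(
        "INSERT INTO bronze_cust VALUES (?, ?, ?)",
        [("11", "  Jon ", "m"), ("12", "Ana", "F "), (None, "x", "M"), ("13", "Lee", None)],
    )
    load_silver(con)
    load_silver(con)  # a second run must not duplicate rows
    return con.execute(
        "SELECT cust_id, first_name, gender FROM silver_cust ORDER BY cust_id"
    ).fetchall()
```

Because the target is emptied before each load, rerunning the procedure after a fix in the source data yields the same result instead of appending duplicates.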
The data lineage diagram (docs/data-flow/data_flow.png) shows how data flows from the source systems, through the Bronze and Silver layers, and is finally integrated into the Gold layer's star schema.
For more details, see the Data Flow & Lineage Documentation.
The data integration diagram (docs/data-integration/data_integration.png) illustrates how tables from the CRM and ERP source systems are related. It details the key relationships used to join disparate tables and create a unified, 360-degree view of customers and products.
For more details, see the Data Integration Documentation.
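To make the idea of joining CRM and ERP tables concrete, here is a small sketch in Python with sqlite3. The table layouts and the shared cust_key column are hypothetical; the real join keys are the ones shown in the data integration diagram.

```python
import sqlite3


def integrate_customers(crm_rows, erp_rows):
    """Enrich CRM customers with a country taken from an ERP location table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE crm_cust_info (cust_key TEXT, name TEXT)")
    con.execute("CREATE TABLE erp_loc (cust_key TEXT, country TEXT)")
    con.executemany("INSERT INTO crm_cust_info VALUES (?, ?)", crm_rows)
    con.executemany("INSERT INTO erp_loc VALUES (?, ?)", erp_rows)

    # LEFT JOIN keeps every CRM customer, even when ERP has no matching row.
    return con.execute("""
        SELECT c.cust_key, c.name, COALESCE(l.country, 'n/a') AS country
        FROM crm_cust_info AS c
        LEFT JOIN erp_loc AS l ON l.cust_key = c.cust_key
        ORDER BY c.cust_key
    """).fetchall()
```

Treating one system as the master list and left-joining the other avoids silently dropping customers that exist in only one source.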
The Gold Layer is modeled as a Sales Data Mart using a Star Schema. This model is optimized for high-performance analytics and consists of a central fact table surrounded by descriptive dimension tables.
- Fact Table: gold.fact_sales
- Dimension Tables: gold.dim_customers, gold.dim_products
For column-level details, see the Gold Layer Data Catalog.
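A typical star-schema query joins the fact table to each dimension and aggregates a measure. The sketch below shows that shape using Python's sqlite3 module; the table names mirror the Gold layer without the gold schema prefix, while every column name and sample value is an assumption (the Gold Layer Data Catalog lists the real columns).

```python
import sqlite3


def revenue_by_category_and_country(con):
    """Join the fact table to both dimensions and total the sales amount."""
    return con.execute("""
        SELECT p.category, c.country, SUM(f.sales_amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_products  AS p ON p.product_key  = f.product_key
        JOIN dim_customers AS c ON c.customer_key = f.customer_key
        GROUP BY p.category, c.country
        ORDER BY revenue DESC
    """).fetchall()


def demo_star_schema():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_customers (customer_key INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE dim_products  (product_key  INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (
            order_number TEXT,
            customer_key INTEGER,
            product_key  INTEGER,
            sales_amount INTEGER
        );
        INSERT INTO dim_customers VALUES (1, 'Germany'), (2, 'France');
        INSERT INTO dim_products  VALUES (10, 'Bikes'), (20, 'Accessories');
        INSERT INTO fact_sales VALUES
            ('SO1', 1, 10, 1200), ('SO2', 1, 20, 30),
            ('SO3', 2, 10, 900),  ('SO4', 2, 10, 100);
    """)
    return revenue_by_category_and_country(con)
```

The fact table carries only keys and measures, so descriptive attributes such as category or country come from the dimension tables at query time.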
This project is designed to showcase expertise in the following areas:
- SQL Development
- Data Engineering & ETL Pipelines
- Data Architecture & Modeling
- Data Analytics & Reporting
The primary objective is to develop a modern data warehouse using SQL Server to consolidate sales data from disparate sources.
- Data Sources: Import and integrate data from ERP & CRM (CSV files).
- Data Quality: Cleanse data and resolve quality issues before analysis.
- Data Modeling: Combine sources into a single, user-friendly star schema.
- Documentation: Provide clear documentation for the data model and architecture.
The goal is to develop SQL-based analytics to deliver detailed insights into key business metrics.
- Customer Behavior Analysis: Understand purchasing patterns.
- Product Performance Metrics: Identify top-performing products and categories.
- Sales Trend Analysis: Track revenue and sales patterns over time.
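As an example of what a sales trend analysis computes, the sketch below totals revenue per month and adds a running total. It is written in plain Python for illustration only; the project itself uses SQL for this, and the input format (ISO date string plus amount) is an assumption.

```python
from collections import defaultdict
from itertools import accumulate


def monthly_revenue_with_running_total(sales):
    """sales: iterable of ('YYYY-MM-DD', amount) pairs.

    Returns a list of (month, revenue, running_total) tuples in month order.
    """
    by_month = defaultdict(int)
    for order_date, amount in sales:
        by_month[order_date[:7]] += amount  # the 'YYYY-MM' prefix identifies the month

    months = sorted(by_month)
    revenues = [by_month[m] for m in months]
    return list(zip(months, revenues, accumulate(revenues)))
```

For instance, two January orders of 100 and 25 followed by a February order of 50 produce a January row with revenue 125 and a February row with revenue 50 and running total 175.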
The project uses the following technology stack and tools:
- Database: SQL Server
- ETL Processing: Transact-SQL (T-SQL)
- Data Modeling & Visualization: Draw.io
- Project Management: Notion
- Version Control: Git & GitHub
The project repository is organized into the following key directories, each dedicated to a specific stage of the data warehouse lifecycle, from raw data ingestion to final analysis.
analytical-report/
├── 01_gold-layer-dataset/
│   ├── gold.dim_customers.csv
│   ├── gold.dim_products.csv
│   └── gold.fact_sales.csv
├── 02_exploratory-data-analysis/
│   ├── 00_init_database.sql
│   ├── 01_database_exploration.sql
│   ├── 02_dimensions_exploration.sql
│   ├── 03_date_range_exploration.sql
│   ├── 04_measures_exploration.sql
│   ├── 05_magnitude_analysis.sql
│   ├── 06_ranking_analysis.sql
│   └── README.md
├── 03_advanced-analytics/
│   ├── 07_change_over_time_analysis.sql
│   ├── 08_cumulative_analysis.sql
│   ├── 09_performance_analysis.sql
│   ├── 10_data_segmentation.sql
│   ├── 11_part_to_whole_analysis.sql
│   └── README.md
└── 04_report-generation/
    ├── 12_report_customers.sql
    ├── 13_report_products.sql
    ├── 14_report_analysis_queries.sql
    └── README.md
datasets/
├── source_crm/
│   ├── cust_info.csv
│   ├── prd_info.csv
│   └── sales_details.csv
└── source_erp/
    ├── CUST_AZ12.csv
    ├── LOC_A101.csv
    └── PX_CAT_G1V2.csv
docs/
├── data-architecture/
│   ├── data_architecture.png
│   ├── data_architecture.svg
│   └── README.md
├── data-flow/
│   ├── data_flow.png
│   ├── data_flow.svg
│   └── README.md
├── data-integration/
│   ├── data_integration.png
│   ├── data_integration.svg
│   └── README.md
├── data-model/
│   ├── data_model.png
│   ├── data_model.svg
│   └── README.md
├── etl/
│   ├── extraction/
│   │   ├── exactration.png
│   │   ├── exactration.svg
│   │   └── README.md
│   ├── load/
│   │   ├── load.png
│   │   ├── load.svg
│   │   └── README.md
│   ├── transformation/
│   │   ├── README.md
│   │   ├── transformation.png
│   │   └── transformation.svg
│   ├── etl_animation_1.svg
│   ├── etl_animation.svg
│   ├── etl_pic_1.png
│   ├── etl_pic.png
│   └── README.md
├── warehousing-data-catalog/
│   └── README.md
├── warehousing-naming-convention/
│   └── README.md
└── warehousing-tables-views-details/
    ├── bronze_layer_tables_views_details.csv
    ├── bronze_layer_tables_views_details.xlsx
    ├── gold_layer_tables_views_details.csv
    ├── gold_layer_tables_views_details.xlsx
    ├── silver_layer_tables_views_details.csv
    └── silver_layer_tables_views_details.xlsx
scripts/
├── bronze/
│   ├── ddl_bronze.sql
│   ├── proc_load_bronze.sql
│   └── README.md
├── gold/
│   ├── structured-csv-data/
│   │   ├── dim_customers.csv
│   │   ├── dim_products.csv
│   │   └── fact_sales.csv
│   ├── ddl_gold.sql
│   └── README.md
├── silver/
│   ├── ddl_silver.sql
│   ├── proc_load_silver.sql
│   └── README.md
├── init_database.sql
└── placeholder
tests/
├── placeholder
├── quality_checks_bronze.sql
├── quality_checks_gold.sql
└── quality_checks_silver.sql
LICENSE
README.md
To deploy and run this project, follow these steps:
- Install SQL Server.
- Install SQL Server Management Studio (SSMS).
- Clone this repository:
git clone https://github.com/apurva313/sql-data-warehouse-analytics-project.git
- Initialize Database: In SSMS, run the DDL scripts from the scripts/ folder in the following order to create the warehouse structure:
  scripts/bronze/ddl_bronze.sql
  scripts/silver/ddl_silver.sql
  scripts/gold/ddl_gold.sql
- Load Raw Data: Use SSMS Import/Export Wizard or BULK INSERT to load the source CSV data into the Bronze layer tables.
- Run ETL Scripts: Execute the stored procedure defined in scripts/silver/proc_load_silver.sql to populate the Silver layer.
- Start Analysis: The Gold layer views are now ready! You can query them directly in SSMS or connect a BI tool for reporting.
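The "Load Raw Data" step above relies on the SSMS Import/Export Wizard or BULK INSERT. For readers who want to see the logic of such a load spelled out, here is an analogous sketch in Python with sqlite3: it empties a Bronze-style table and inserts every CSV row unchanged. The table name, columns, and sample data are hypothetical, and this is not a replacement for the SQL Server tooling.

```python
import csv
import io
import sqlite3


def load_csv_into_bronze(con, table, csv_file):
    """Empty the table, then insert every CSV row as-is (all values stay text).

    `table` is interpolated into the SQL text, so pass trusted names only.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # the first line holds the column names
    placeholders = ", ".join("?" for _ in header)
    with con:  # commit the delete and the inserts together
        con.execute(f"DELETE FROM {table}")
        con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def demo_load():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bronze_cust_info (cust_id TEXT, name TEXT)")
    sample = io.StringIO("cust_id,name\n1,Ann\n2, Bob \n")
    load_csv_into_bronze(con, "bronze_cust_info", sample)
    return con.execute("SELECT * FROM bronze_cust_info ORDER BY cust_id").fetchall()
```

With a real file, the same function can be called with open(path, newline="", encoding="utf-8") in place of the in-memory sample. Note that the padded value " Bob " is stored untouched, which matches the Bronze layer's role of holding raw data.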
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.