SenthilArun8/time-series

Retail Sales Time Series Analysis

A complete time series analysis pipeline built with Python - from messy data to business insights.

Portfolio project demonstrating data cleaning, forecasting, analysis, and business storytelling for data analyst/scientist positions.


What This Project Does

Takes three years of messy retail sales data and runs it through a pipeline that:

  • Demonstrates an end-to-end data science workflow
  • Shows data cleaning and validation skills
  • Compares multiple forecasting approaches
  • Extracts actionable business insights
  • Creates visualizations

Run This

# Install dependencies
pip install -r requirements.txt

# Generate sample data
python generate_data.py

# Run complete analysis (3-5 minutes)
python run_all.py

Results from This Pipeline

Dataset Analyzed

  • 1,095 daily sales records (2021-2023)
  • $1.38 million total revenue
  • 69,927 customers
  • 3 store locations, 4 product categories

Data Quality Improvements

  • 22 missing values → Interpolated using time-series methods
  • 3 duplicate records → Removed
  • 5 outliers → Capped using IQR method
  • Inconsistent store names → Standardized
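
The four fixes above could be sketched in pandas roughly as follows. This is a minimal illustration, not the code in 01_data_cleaning.py: the column names (`date`, `store`, `sales`), the 1.5 × IQR fences, the `clean_sales` helper, and the toy data are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one gap, one duplicated date, one spike, messy store names.
raw = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03",
        "2021-01-04", "2021-01-05", "2021-01-06",
    ]),
    "store": ["store_a", "Store_A ", "Store_A ", "STORE_A",
              "Store_A", "Store_A", "Store_A"],
    "sales": [100.0, 110.0, 110.0, np.nan, 130.0, 5000.0, 120.0],
})


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, standardize names, interpolate gaps, and cap outliers."""
    # Keep the first record for each date.
    out = df.drop_duplicates(subset="date", keep="first").copy()
    # Normalize whitespace and casing, e.g. "store_a" -> "Store_A".
    out["store"] = out["store"].str.strip().str.title()
    # Time-weighted interpolation needs a sorted DatetimeIndex.
    out = out.set_index("date").sort_index()
    out["sales"] = out["sales"].interpolate(method="time")
    # Cap (rather than drop) values outside the 1.5 * IQR fences.
    q1, q3 = out["sales"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["sales"] = out["sales"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out.reset_index()


cleaned = clean_sales(raw)
print(cleaned)
```

Capping keeps every date in the series, which matters for models that expect an unbroken daily index; dropping outlier rows would reintroduce gaps.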

Forecasting Performance

| Model | MAE ($) | RMSE ($) | MAPE (%) |
|---|---|---|---|
| Moving Average | 187.31 | 214.69 | 14.28 |
| ARIMA | 196.18 | 223.36 | 14.97 |

Moving Average had the lowest error on all three metrics for this dataset (lower is better).
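
For reference, the three error metrics can be computed as in the sketch below. The `forecast_errors` helper and the sample numbers are illustrative assumptions, not taken from 02_modeling_forecasting.py.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual value, so it is undefined when an actual is 0.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical values for illustration only.
metrics = forecast_errors([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(metrics)
```

RMSE penalizes large misses more heavily than MAE, so a model can rank differently on the two; here both models rank the same way on all three metrics.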

Business Insights Discovered

Growth:

  • 14.8% YoY growth (2022)
  • 14.4% YoY growth (2023)
  • Consistent upward trend
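
Year-over-year growth is the percentage change in annual revenue from one year to the next. A minimal sketch follows; the annual totals are made-up placeholders (the README does not list per-year revenue), and `yoy_growth` is a hypothetical helper.

```python
def yoy_growth(annual_totals):
    """Map each year (after the first) to its % change versus the prior year."""
    years = sorted(annual_totals)
    return {
        year: 100 * (annual_totals[year] - annual_totals[prev]) / annual_totals[prev]
        for prev, year in zip(years, years[1:])
    }


# Placeholder totals for illustration only; not the project's actual figures.
example_totals = {2021: 400_000.0, 2022: 460_000.0, 2023: 529_000.0}
growth = yoy_growth(example_totals)
print(growth)
```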

Store Performance:

  • Store_A: $537,027 (38.8% of sales)
  • Store_B: $453,228 (32.8%)
  • Store_C: $393,636 (28.4%)
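
As a quick consistency check, the shares can be recomputed from the three store totals listed above; the totals also sum to roughly $1.38 million, in line with the dataset summary.

```python
# Store totals as reported in this README.
store_sales = {"Store_A": 537_027, "Store_B": 453_228, "Store_C": 393_636}

total = sum(store_sales.values())
shares = {store: round(100 * value / total, 1) for store, value in store_sales.items()}

print(f"Total: ${total:,}")
for store, pct in shares.items():
    print(f"{store}: {pct}%")
```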

Product Categories:

  • Electronics: 30.7% of total sales
  • Clothing: 26.5%
  • Home & Garden: 23.8%
  • Food & Beverage: 19.0%

Temporal Patterns:

  • Best day: Sunday ($1,349 average)
  • Best month: April (highest average sales)
  • Weekend sales significantly higher than weekdays
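
Day-of-week patterns like these can be derived with a simple group-by. The sketch below uses a two-week toy series; the column names and values are assumptions, not the project's data.

```python
import pandas as pd

# Two hypothetical weeks starting on Monday 2023-01-02, with higher weekend sales.
daily = pd.DataFrame({
    "date": pd.date_range("2023-01-02", periods=14, freq="D"),
    "sales": [100, 100, 100, 100, 100, 150, 200] * 2,
})

daily["day_name"] = daily["date"].dt.day_name()
daily["is_weekend"] = daily["date"].dt.dayofweek >= 5  # Saturday=5, Sunday=6

# Average sales per weekday name, and the best-performing day.
by_day = daily.groupby("day_name")["sales"].mean()
best_day = by_day.idxmax()

# Weekend versus weekday averages.
weekend_mean = daily.loc[daily["is_weekend"], "sales"].mean()
weekday_mean = daily.loc[~daily["is_weekend"], "sales"].mean()

print(best_day, weekend_mean, weekday_mean)
```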

Project Structure

time-series/
├── scripts/
│   ├── 01_data_cleaning.py
│   ├── 02_modeling_forecasting.py
│   ├── 03_analysis_insights.py
│   └── 04_executive_summary.py
│
├── data/                 # Generated outputs
│   ├── cleaned_sales_data.csv
│   ├── model_comparison.csv
│   ├── key_insights.txt
│   └── *.png            # 15+ visualizations
│
├── run_all.py
├── generate_data.py
└── requirements.txt

What Each Script Does

01_data_cleaning.py cleans messy retail data:

  • Identifies and fixes missing values (2% of data)
  • Removes duplicate dates
  • Standardizes store names
  • Handles outliers using statistical methods
  • Validates data quality

Outputs: cleaned_sales_data.csv + 2 visualizations
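
The validation step might look something like the checks below. This is a sketch under assumptions: the `validate_daily` helper, the column names, and the specific checks are illustrative and may differ from what the script actually verifies.

```python
import pandas as pd


def validate_daily(df: pd.DataFrame, date_col: str = "date", value_col: str = "sales") -> dict:
    """Run basic quality checks on a daily series and report pass/fail per check."""
    dates = pd.DatetimeIndex(pd.to_datetime(df[date_col]))
    # Every calendar day between the first and last date should be present.
    expected = pd.date_range(dates.min(), dates.max(), freq="D")
    return {
        "no_missing_values": bool(df[value_col].notna().all()),
        "no_duplicate_dates": bool(dates.is_unique),
        "no_calendar_gaps": len(expected.difference(dates)) == 0,
        "no_negative_sales": bool((df[value_col].dropna() >= 0).all()),
    }


# Hypothetical clean sample: five consecutive days, no gaps or duplicates.
sample = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=5, freq="D"),
    "sales": [100.0, 110.0, 120.0, 130.0, 125.0],
})
report = validate_daily(sample)
print(report)
```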


02_modeling_forecasting.py builds forecasting models:

  • Decomposes time series (trend, seasonality, residuals)
  • Tests for stationarity (Augmented Dickey-Fuller)
  • Trains Moving Average and ARIMA models
  • Compares performance using MAE, RMSE, MAPE

Outputs: model_comparison.csv + 4 visualizations
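
To illustrate the simpler of the two models, here is a sketch of a moving-average forecast evaluated on a chronological holdout. The window size, the recursive multi-step scheme, and the `moving_average_forecast` helper are assumptions; the README does not state how the script configures its moving average.

```python
def moving_average_forecast(history, window, horizon):
    """Forecast `horizon` steps ahead.

    Each step is the mean of the last `window` values; forecasts are fed
    back into the series so later steps can be computed.
    """
    if window <= 0 or len(history) < window:
        raise ValueError("window must be positive and no longer than the history")
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(values[-window:]) / window
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts


# Hypothetical series; split chronologically so the model never sees the future.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
train, test = series[:4], series[4:]
preds = moving_average_forecast(train, window=2, horizon=len(test))
print(preds, test)
```

A chronological split (rather than a random one) is the usual choice for time series, since shuffling would leak future values into training.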


03_analysis_insights.py extracts business insights:

  • Calculates KPIs (revenue, transactions, customer metrics)
  • Compares store performance across 3 locations
  • Analyzes 4 product categories
  • Identifies weekly and monthly patterns
  • Performs statistical tests (t-tests for significance)

Outputs: key_insights.txt + 7 visualizations
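
As an illustration of the significance testing, the sketch below computes Welch's t statistic and its approximate degrees of freedom for two independent samples, such as weekend versus weekday sales. The sample values and the `welch_t` helper are hypothetical, and whether the script uses Welch's or the pooled-variance t-test is not stated here. In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` also returns the p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (sample variance / n).
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, dof


# Hypothetical daily sales figures for illustration only.
weekend = [1300.0, 1350.0, 1400.0, 1320.0]
weekday = [1100.0, 1150.0, 1120.0, 1180.0, 1090.0, 1160.0]
t_stat, dof = welch_t(weekend, weekday)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```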


04_executive_summary.py creates an executive report:

  • Business KPI dashboard
  • Key findings summary
  • Model performance visualization
  • Strategic recommendations

Outputs: executive_summary_metrics.csv + 3 visualizations


Sample Visualizations

  • Executive Dashboard
  • Time Series Decomposition
  • Model Forecast Comparison
  • Store Performance
  • Category Distribution
  • Temporal Patterns
  • Monthly Sales Trend
  • Correlation Matrix
  • Data Cleaning: Before vs After (raw data distribution; cleaned data time series)


Skills Demonstrated

Technical Skills

  • Python: pandas, numpy, matplotlib, seaborn, scipy
  • Time Series: statsmodels, ARIMA, decomposition
  • Statistics: hypothesis testing, correlation analysis, outlier detection
  • Data Cleaning: handling missing values, duplicates, outliers
  • Data Visualization: 15+ different chart types

Data Science Workflow

Raw Data → Clean → Explore → Model → Analyze → Report
  1. Data quality assessment
  2. Systematic cleaning and validation
  3. Statistical analysis
  4. Predictive modeling
  5. Business insight extraction
  6. Executive communication

Future Work for This Project

  • Interactive dashboard (Streamlit)
  • Additional models (XGBoost, LSTM)
  • A/B testing framework
  • Anomaly detection
  • REST API for predictions

Created: January 2026
Last Updated: January 2026
