Enterprise supply chain optimization platform featuring advanced ML models (RF/XGBoost/Prophet), intelligent allocation algorithms, Streamlit dashboard, and automated monitoring. Built on Databricks with comprehensive feature engineering and agent-ready APIs.

pratstick/supply-chain-predictive-analytics

Supply Chain Optimization with Databricks

This project demonstrates an enhanced supply chain optimization solution for distribution networks using Databricks, with 3 manufacturing plants delivering 30 product SKUs to 5 distribution centers, which in turn serve 30-60 wholesalers each. The solution leverages Databricks' distributed computing capabilities and modern ML techniques to answer questions such as: How much revenue is at risk if we can't produce the forecasted amount of product autoclave_1?

Enhanced with Advanced Analytics: This solution has been augmented with modern machine learning models, intelligent stock allocation, interactive web dashboards, and automated monitoring capabilities for enterprise-ready supply chain analytics.

Enhanced with ML Models: The enhanced solution provides additional agent capabilities through advanced ML predictions and intelligent allocation recommendations, which support agent questions such as:

  • What's the ML model accuracy for product forecasting?
  • Show me the risk-adjusted allocation for high-priority products
  • Which products have the highest demand volatility?

Architecture

The solution follows this workflow:

  • Demand Forecasting: Generate one-week-ahead forecasts for each product/wholesaler combination using traditional Holt-Winters and advanced ML models (Random Forest, XGBoost, Prophet)
  • Demand Aggregation: Aggregate forecasts at distribution center level with enhanced feature engineering
  • Raw Material Planning: Convert finished product demand into raw material requirements using graph-based BOM analysis
  • Transportation Optimization: Minimize shipping costs between plants and distribution centers using linear programming
  • Intelligent Stock Allocation: Multi-constraint optimization with priority-based allocation logic and risk-adjusted demand planning
  • Interactive Dashboard: Real-time visualization and monitoring with Streamlit web interface
  • Automated Monitoring: Health checks, performance monitoring, and automated model retraining
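
To make the first workflow step concrete, here is a minimal pure-Python sketch of additive Holt-Winters smoothing. It is an illustration only and not the code used in the notebooks, which may rely on a forecasting library. The function name `holt_winters_additive`, the smoothing parameters, and the initialization scheme are all assumptions chosen for readability.

```python
from typing import List


def holt_winters_additive(series: List[float], season_length: int,
                          alpha: float = 0.3, beta: float = 0.1,
                          gamma: float = 0.2, horizon: int = 1) -> List[float]:
    """Additive Holt-Winters smoothing; returns `horizon` forecasts past the series end."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of history")

    # Initial level: mean of the first season.
    level = sum(series[:m]) / m
    # Initial trend: average per-step change between the first two seasons.
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    # Initial seasonal offsets: deviation of each first-season point from the level.
    seasonal = [series[i] - level for i in range(m)]

    # Update level, trend, and seasonal offsets one observation at a time.
    for t in range(m, len(series)):
        y = series[t]
        prev_level = level
        s = seasonal[t % m]
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s

    # Project the final state forward, reusing the matching seasonal offset.
    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % m] for h in range(horizon)]
```

With weekly demand and a yearly pattern, a call might look like `holt_winters_additive(weekly_demand, season_length=52)`, where `weekly_demand` is a hypothetical list of historical values for one product/wholesaler pair.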

Getting Started

  1. Set up a Databricks workspace
    • Ensure that you have permissions to write to a catalog and to create and/or use a cluster
  2. Create and/or start your cluster
    • This solution has been tested on the following Databricks cluster configuration:
      • Cluster Type: Personal Compute
      • Access Mode: Dedicated (formerly: Single user)
      • Databricks Runtime Version: 16.3 ML (includes Apache Spark 3.5.2, Scala 2.12)
      • Node Type: i3.xlarge (30.5 GB Memory, 4 Cores)
  3. Import all notebooks into your workspace
    • In the workspace tab, under your user name, follow these steps (recommended):
      • Create a Git folder
      • Add the Git repository URL
    • Alternatively, right-click and import the notebooks one by one; adding the Git repository URL remains the recommended approach
  4. Run 01_Introduction_And_Setup.py to initialize the environment and generate sample data
    • The notebooks use widgets for configuration:
      • catalog_name: Databricks catalog name (default: "main")
      • db_name: Database name (default: "supply_chain_db")
  5. Set up the enhanced environment (optional)
    • Copy .env.example to .env and update it with your credentials
    • Run ./setup_environment.sh for automated setup
    • Or use ./deploy.sh for a complete deployment, including the dashboard
  6. Choose one of the following options to run the solution
    • Option 1: Run all notebooks at once using magic commands
    • Option 2: Run each notebook manually in numerical order (recommended for learning)
    • Option 3: Run the complete enhanced pipeline using python run_analysis.py

Supply chain data

Supporting resource notebooks:

  • _resources/00-setup.py: Configuration setup
  • _resources/01-data-generator.py: Generate synthetic supply chain data
  • _resources/02-generate-supply.py: Generate supply data

These resource notebooks will create the following tables that we will leverage to build our solution:

  • product_demand_historical: Historical product demand by wholesaler
  • distribution_center_to_wholesaler_mapping: Mapping between distribution centers and wholesalers
  • bom: Bill of materials with material relationships
  • plant_supply: Maximum supply capacity by plant and product
  • transport_cost: Transportation costs between plants and distribution centers
  • list_prices: Price for each product

Enhanced tables created by the advanced solution:

  • demand_features_enhanced: Advanced feature engineered dataset with time-based, lag, and market features
  • ml_demand_predictions: Predictions from Random Forest, XGBoost, and Prophet models
  • intelligent_stock_allocation: Optimized stock allocation with multi-constraint optimization
  • model_performance_metrics: Performance tracking and comparison across models
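
As a rough illustration of the lag and rolling features mentioned for demand_features_enhanced, the sketch below derives them from a plain list of demand values. The function name `build_lag_features`, the feature names, and the default lags and window are hypothetical; the actual table is built in the notebooks and may use different definitions.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def build_lag_features(demand: List[float], lags: Sequence[int] = (1, 2),
                       window: int = 3) -> List[Dict[str, Optional[float]]]:
    """Return one feature row per time step; features are None until enough history exists."""
    rows: List[Dict[str, Optional[float]]] = []
    for t, value in enumerate(demand):
        row: Dict[str, Optional[float]] = {"demand": value}
        for lag in lags:
            # Demand observed `lag` steps earlier.
            row[f"lag_{lag}"] = demand[t - lag] if t >= lag else None
        # Rolling mean over the previous `window` values only, so the
        # current value never leaks into its own feature.
        row[f"rolling_mean_{window}"] = mean(demand[t - window:t]) if t >= window else None
        rows.append(row)
    return rows
```

Excluding the current observation from the rolling window is a deliberate choice here: it keeps the feature usable at prediction time, when the current demand is not yet known.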

Notebooks

The solution consists of multiple Databricks notebooks:

Core Solution

  1. 01_Introduction_And_Setup.py:
    • Project overview and data setup.
    • Run this notebook first, then run the remaining notebooks in numerical order.
  2. 02_Fine_Grained_Demand_Forecasting.py:
    • Time series forecasting that generates one-week-ahead SKU demand for every wholesaler and distribution center with a Holt-Winters seasonal model.
    • The output is the table product_demand_forecasted, with aggregate forecasts at the distribution center level.
  3. 03_Derive_Raw_Material_Demand.py:
    • Processes product demand forecasts to determine raw material requirements using a graph-based approach (the BOM data is transformed into graph edges).
    • The outputs are two tables: raw_material_demand and raw_material_supply
  4. 04_Optimize_Transportation.py:
    • Linear programming to minimize transportation costs
    • The output is the table shipment_recommendations
  5. 05_Data_Analysis_&_Functions.py:
    • Additional analysis and utility functions: the notebook identifies critical materials with supply shortages by comparing demand with supply data, and analyzes hierarchical relationships between materials and products
    • As output, you get the following custom SQL functions:
      • product_from_raw: Maps a raw material to all downstream products
      • raw_from_product: Maps a product to all upstream raw materials
      • revenue_risk: Calculates potential revenue impact from raw material shortages
  6. 06_Vector_Search.py:
    • Generates supply-chain manager e-mails (unstructured data) and stores them in a vector index, enabling semantic queries that surface delay and risk signals
  7. 07_More_Functions.py:
    • Extended functionality and utilities
    • As output, you get the following custom SQL functions:
      • lookup_product_demand: Retrieves historical demand data for specific products and wholesalers
      • query_unstructured_emails: Searches emails using vector search for relevant supply chain information
      • execute_code_sandbox: Enables dynamic code execution for custom analysis
      • _genie_query: Core function that interfaces with the Databricks Genie API
      • ask_genie_pharma_gsc: Natural language interface to query the supply chain dataset
    • Make sure to customize the Genie integration:
      • Update the Databricks host URL and token
      • Specify your Genie Space ID
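
The graph-based BOM analysis behind product_from_raw and raw_from_product can be pictured with a small pure-Python traversal. This is a sketch under stated assumptions: the edge list `BOM_EDGES` and its material names are made up, and the Python functions only mirror the idea of the SQL functions above, not their actual implementation or signatures.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical BOM edges: (input material, output material or product).
BOM_EDGES: List[Tuple[str, str]] = [
    ("RAW_A", "SUB_1"),
    ("RAW_B", "SUB_1"),
    ("SUB_1", "product_x"),
    ("RAW_C", "product_x"),
    ("RAW_C", "product_y"),
]


def _walk(start: str, adjacency: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search returning every node reachable from `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def raw_from_product(product: str, edges: List[Tuple[str, str]]) -> Set[str]:
    """Upstream materials of `product` that have no inputs of their own."""
    upstream: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        upstream[dst].append(src)
    return {n for n in _walk(product, upstream) if n not in upstream}


def product_from_raw(raw: str, edges: List[Tuple[str, str]]) -> Set[str]:
    """Downstream items of `raw` that are not consumed by anything else."""
    downstream: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
    return {n for n in _walk(raw, downstream) if n not in downstream}
```

In this toy graph, intermediate items such as SUB_1 are filtered out because they appear both as an input and as an output.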

Advanced Analytics

  1. 08_Enhanced_Feature_Engineering.py:
    • Advanced feature engineering pipeline with 20+ sophisticated features
    • Time-based features (seasonality, trends, holidays), lag features, rolling statistics
    • Product lifecycle and market dynamics features
    • The output is an enhanced table demand_features_enhanced ready for ML models
  2. 09_Advanced_ML_Models.py:
    • Modern machine learning models for demand prediction including Random Forest, XGBoost, and Facebook Prophet
    • Ensemble methods combining multiple models for improved accuracy
    • Model performance comparison and validation with comprehensive metrics
    • The output is a table ml_demand_predictions with model predictions and confidence intervals
  3. 10_Intelligent_Stock_Allocation.py:
    • Multi-constraint optimization for intelligent stock allocation
    • Priority-based allocation logic with risk-adjusted demand planning
    • Performance analytics and KPIs for allocation effectiveness
    • The output is a table intelligent_stock_allocation with optimized allocation recommendations
  4. 11_Complete_Solution_Guide.py:
    • End-to-end implementation guide integrating all enhanced components
    • Solution summary with performance comparisons between traditional and enhanced approaches
    • Deployment instructions and monitoring setup guidance
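
To give a feel for priority-based allocation with risk-adjusted demand, the sketch below uses a simple greedy rule rather than the multi-constraint optimization described above. Everything in it is an assumption for illustration: the `Request` fields, the `risk_buffer` parameter, and the adjustment formula (forecast inflated in proportion to volatility) are not taken from the notebook.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    customer: str
    forecast: float    # forecast demand in units
    priority: float    # higher values are served first
    volatility: float  # relative forecast uncertainty, e.g. 0.2 for 20%


def allocate(requests: List[Request], supply: float,
             risk_buffer: float = 0.5) -> Dict[str, float]:
    """Greedy allocation: serve requests by descending priority until supply runs out."""
    remaining = supply
    plan: Dict[str, float] = {}
    for req in sorted(requests, key=lambda r: r.priority, reverse=True):
        # Assumed risk adjustment: add a buffer proportional to volatility.
        adjusted = req.forecast * (1 + risk_buffer * req.volatility)
        granted = min(adjusted, remaining)
        plan[req.customer] = granted
        remaining -= granted
    return plan
```

A greedy pass like this is easy to reason about but cannot trade off several constraints at once, which is where a linear or mixed-integer program would take over.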

Interactive Web Dashboard

The enhanced solution includes a comprehensive Streamlit web dashboard that provides real-time visualization and monitoring capabilities:

Dashboard Features

  • Executive Overview: Key performance metrics, demand trends, and regional performance heatmaps
  • Demand Forecasting: Model comparison, error analysis, feature importance, and future predictions
  • Stock Allocation: Optimization results, plant utilization, cost analysis, and detailed allocation plans
  • Performance Analytics: KPI dashboard with MAE, inventory turnover, service levels, and trend analysis
  • Risk Management: Risk alerts, supply chain resilience scoring, and volatility analysis

Accessing the Dashboard

  1. Quick Start: Run ./deploy.sh for automated deployment
  2. Manual Setup:
    cd dashboard
    pip install -r requirements.txt
    streamlit run streamlit_app.py
  3. Access: Open browser to http://localhost:8501

Dashboard Configuration

  • Update the .env file with your Databricks credentials
  • Configure dashboard settings in dashboard/config.py
  • Customize data connections in dashboard/data_utils.py

Automation & Monitoring

The enhanced solution includes automated monitoring and maintenance capabilities:

Available Scripts

  • run_analysis.py: Complete pipeline orchestration with health checks and validation
  • monitor_system.py: System health monitoring with data freshness and performance validation
  • retrain_models.py: Automated model retraining based on performance degradation thresholds
  • setup_environment.sh: Environment setup and dependency installation
  • deploy.sh: Complete deployment script for the enhanced solution

Monitoring Features

  • Data Quality Checks: Freshness, completeness, and consistency validation
  • Model Performance Tracking: Accuracy monitoring and drift detection
  • Allocation Performance: Demand fulfillment and cost efficiency metrics
  • Automated Alerts: Email notifications for system issues and performance degradation
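
One way to picture retraining based on performance degradation thresholds is a check that compares the current mean absolute error (MAE) with a stored baseline. The sketch below is hypothetical: the function names, the `tolerance` parameter, and the rule itself are assumptions and do not describe what retrain_models.py actually does.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_retraining(actual: Sequence[float], predicted: Sequence[float],
                     baseline_mae: float, tolerance: float = 0.2) -> bool:
    """Flag retraining when the current MAE exceeds the baseline by more than `tolerance`."""
    current = mean_absolute_error(actual, predicted)
    return current > baseline_mae * (1 + tolerance)
```

A relative tolerance keeps the rule independent of demand scale, so the same setting can be reused across products with very different volumes.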

Creating Your Own Agent

  • To create a supply chain agent, you can leverage these functions in any AI agent orchestrator as shown in this demo video: https://www.youtube.com/watch?v=cz-x2B31Ga8

  • Test your Agent with the following questions:

    • What products are dependent on L6HUK material?
    • How much revenue is at risk if we can’t produce the forecasted amount of product autoclave_1?
    • Which products have delays right now?
    • Are there any delays with syringe_1?
    • What raw materials are required for syringe_1?
    • Are there any shortages with one of the following raw materials: O4GRQ, Q5U3A, OAIFB or 58RJD?
    • What are the delays associated with wholesaler 9?
  • You can also use SQL functions to query your supply chain:

    SELECT query_unstructured_emails(
      'What delivery delays are affecting Distribution Center 3?'
    );
  • 🆕 Enhanced Query Capabilities: Access ML predictions and intelligent allocation data:

    -- Query ML model predictions
    SELECT * FROM ml_demand_predictions 
    WHERE product_id = 'autoclave_1' AND prediction_date >= current_date();
    
    -- Check intelligent allocation results
    SELECT * FROM intelligent_stock_allocation 
    WHERE risk_level = 'High' AND allocation_priority >= 0.8;
