Supply Chain Optimization with Databricks

This project demonstrates an enhanced supply chain optimization solution for distribution networks using Databricks, with 3 manufacturing plants delivering 30 product SKUs to 5 distribution centers, which in turn serve 30-60 wholesalers each. The solution leverages Databricks' distributed computing capabilities and modern ML techniques to answer questions such as: How much revenue is at risk if we can't produce the forecasted amount of product autoclave_1?

Enhanced with Advanced Analytics: This solution has been augmented with modern machine learning models, intelligent stock allocation, interactive web dashboards, and automated monitoring capabilities for enterprise-ready supply chain analytics.
The solution follows this workflow:
- Demand Forecasting: Generate one-week-ahead forecasts for each product/wholesaler combination using traditional Holt-Winters and advanced ML models (Random Forest, XGBoost, Prophet)
- Demand Aggregation: Aggregate forecasts at distribution center level with enhanced feature engineering
- Raw Material Planning: Convert finished product demand into raw material requirements using graph-based BOM analysis
- Transportation Optimization: Minimize shipping costs between plants and distribution centers using linear programming
- Intelligent Stock Allocation: Multi-constraint optimization with priority-based allocation logic and risk-adjusted demand planning
- Interactive Dashboard: Real-time visualization and monitoring with Streamlit web interface
- Automated Monitoring: Health checks, performance monitoring, and automated model retraining
- Set up a Databricks workspace
- Ensure that you have permissions to write to a catalog and create and/or use a cluster
- Create and/or start your cluster
- This solution has been tested on the following Databricks cluster configuration:
- Cluster Type: Personal Compute
- Access Mode: Dedicated (formerly: Single user)
- Databricks Runtime Version: 16.3 ML (includes Apache Spark 3.5.2, Scala 2.12)
- Node Type: i3.xlarge (30.5 GB Memory, 4 Cores)
- Import all notebooks into your workspace
- In the Workspace tab, under your user name, follow these steps (recommended):
- Create a Git folder
- Add the Git repository URL
- Alternatively, you can right-click and import the notebooks one by one; however, adding the Git repository URL is recommended instead.
- Run `01_Introduction_And_Setup.py` to initialize the environment and generate sample data
- The notebooks use widgets for configuration:
  - `catalog_name`: Databricks catalog name (default: "main")
  - `db_name`: Database name (default: "supply_chain_db")
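As a sketch of how these two configuration values can be resolved with safe defaults (the helper name is hypothetical; inside a Databricks notebook, the overrides would come from `dbutils.widgets.get(...)` calls):

```python
# Hypothetical helper: merge widget overrides onto the documented defaults.
# On Databricks, `widgets` would be a dict built from dbutils.widgets.get(...).
DEFAULTS = {"catalog_name": "main", "db_name": "supply_chain_db"}

def resolve_config(widgets=None):
    """Return the effective configuration, falling back to defaults."""
    config = dict(DEFAULTS)
    config.update(widgets or {})
    return config
```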
- Set up the enhanced environment (optional)
  - Copy `.env.example` to `.env` and update it with your credentials
  - Run `./setup_environment.sh` for automated setup
  - Or use `./deploy.sh` for complete deployment, including the dashboard
- You have multiple options to run the solution
- Option 1: Run all notebooks at once using magic commands
- Option 2: Run each notebook manually in numerical order (recommended for learning)
- Option 3: Run the complete enhanced pipeline using `python run_analysis.py`
Supporting resource notebooks:
- `_resources/00-setup.py`: Configuration setup
- `_resources/01-data-generator.py`: Generate synthetic supply chain data
- `_resources/02-generate-supply.py`: Generate supply data
These resource notebooks will create the following tables that we will leverage to build our solution:
- `product_demand_historical`: Historical product demand by wholesaler
- `distribution_center_to_wholesaler_mapping`: Mapping between distribution centers and wholesalers
- `bom`: Bill of materials with material relationships
- `plant_supply`: Maximum supply capacity by plant and product
- `transport_cost`: Transportation costs between plants and distribution centers
- `list_prices`: Price for each product
Enhanced tables created by the advanced solution:
- `demand_features_enhanced`: Advanced feature-engineered dataset with time-based, lag, and market features
- `ml_demand_predictions`: Predictions from Random Forest, XGBoost, and Prophet models
- `intelligent_stock_allocation`: Optimized stock allocation with multi-constraint optimization
- `model_performance_metrics`: Performance tracking and comparison across models
The solution consists of multiple Databricks notebooks:
`01_Introduction_And_Setup.py`:
- Project overview and data setup.
- Make sure to run it first, and then run each notebook sequentially.
`02_Fine_Grained_Demand_Forecasting.py`:
- Time series forecasting generating one-week-ahead SKU demand for every wholesaler and distribution center with a Holt-Winters seasonal model.
- The output is a table `product_demand_forecasted` with aggregate forecasts at the distribution center level.
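The Holt-Winters step can be illustrated with a minimal additive-seasonal implementation. This is a simplified sketch, not the notebook's actual code (which would typically use a library such as statsmodels), and the smoothing parameters are illustrative:

```python
def holt_winters_additive(series, season_len, alpha=0.3, beta=0.1,
                          gamma=0.2, horizon=1):
    """One-step-ahead (or multi-step) additive Holt-Winters forecast.

    Requires at least two full seasons of history to initialize the
    level, trend, and seasonal components.
    """
    first_season_mean = sum(series[:season_len]) / season_len
    seasonals = [series[i] - first_season_mean for i in range(season_len)]
    level = first_season_mean
    trend = (sum(series[season_len:2 * season_len])
             - sum(series[:season_len])) / season_len ** 2
    for i in range(season_len, len(series)):
        s = seasonals[i % season_len]
        last_level = level
        level = alpha * (series[i] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[i % season_len] = gamma * (series[i] - level) + (1 - gamma) * s
    return [level + (h + 1) * trend + seasonals[(len(series) + h) % season_len]
            for h in range(horizon)]
```

For weekly data with yearly seasonality, `season_len` would be 52; the forecast horizon of 1 matches the one-week-ahead setup described above.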
`03_Derive_Raw_Material_Demand.py`:
- In this notebook, we process product demand forecasts to determine raw material requirements using a graph-based approach (by transforming the BOM data into graph edges).
- The outputs are two tables: `raw_material_demand` and `raw_material_supply`.
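The graph traversal at the heart of this step can be sketched as a breadth-first expansion over BOM edges (the material names in the example are hypothetical):

```python
from collections import deque

def raw_materials_for(product, bom_edges):
    """Expand a finished product into all upstream raw materials.

    bom_edges maps each item to the list of components it is made from,
    i.e. a flattened view of the BOM table's material relationships.
    """
    needed, queue = set(), deque([product])
    while queue:
        item = queue.popleft()
        for component in bom_edges.get(item, []):
            if component not in needed:   # avoid revisiting shared components
                needed.add(component)
                queue.append(component)
    return needed
```

Running the same traversal in the opposite edge direction gives the product-from-material lookup used later for revenue-at-risk questions.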
`04_Optimize_Transportation.py`:
- Linear programming to minimize transportation costs
- The output is the table `shipment_recommendations`.
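A toy version of the transportation LP, assuming SciPy is available; the costs, capacities, and demands below are illustrative, not the project's actual data:

```python
from scipy.optimize import linprog

# Decision variables: x = [p1->dc1, p1->dc2, p2->dc1, p2->dc2]
cost = [4, 6, 5, 3]            # shipping cost per unit on each lane
A_ub = [
    [1, 1, 0, 0],              # plant 1 capacity
    [0, 0, 1, 1],              # plant 2 capacity
    [-1, 0, -1, 0],            # DC 1 demand (>= 60, negated for <= form)
    [0, -1, 0, -1],            # DC 2 demand (>= 70, negated for <= form)
]
b_ub = [80, 80, -60, -70]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
```

The optimal plan here routes each distribution center's demand through its cheapest lane subject to plant capacity; the real notebook solves the same structure over all plants and distribution centers.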
`05_Data_Analysis_&_Functions.py`:
- Additional analysis and utility functions: the notebook identifies critical materials with supply shortages by comparing demand vs. supply data and analyzes hierarchical relationships between materials and products
- As output, you get the following custom SQL functions:
  - `product_from_raw`: Maps a raw material to all downstream products
  - `raw_from_product`: Maps a product to all upstream raw materials
  - `revenue_risk`: Calculates potential revenue impact from raw material shortages
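The idea behind `revenue_risk` can be sketched in Python (function and argument names are hypothetical, and this simplification charges the full shortage to every downstream product rather than splitting it across them):

```python
def revenue_at_risk(material, shortage_qty, products_using,
                    units_per_item, list_price):
    """Estimate revenue lost if `shortage_qty` units of a material are missing.

    A shortage caps how many units of each downstream product can be built;
    the unbuildable units, priced at list, are the revenue at risk.
    """
    risk = 0.0
    for product in products_using.get(material, []):
        unbuildable_items = shortage_qty / units_per_item[(material, product)]
        risk += unbuildable_items * list_price[product]
    return risk
```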
`06_Vector_Search.py`:
- Generates supply-chain manager e-mails (unstructured data) and stores the data in a vector index, enabling semantic queries that surface delay and risk signals
`07_More_Functions.py`:
- Extended functionality and utilities
- As output, you get the following custom SQL functions:
  - `lookup_product_demand`: Retrieves historical demand data for specific products and wholesalers
  - `query_unstructured_emails`: Searches emails using vector search for relevant supply chain information
  - `execute_code_sandbox`: Enables dynamic code execution for custom analysis
  - `_genie_query`: Core function that interfaces with the Databricks Genie API
  - `ask_genie_pharma_gsc`: Natural language interface to query the supply chain dataset
- Make sure to customize the Genie integration:
- Update the Databricks host URL and token
- Specify your Genie Space ID
`08_Enhanced_Feature_Engineering.py`:
- Advanced feature engineering pipeline with 20+ sophisticated features
- Time-based features (seasonality, trends, holidays), lag features, rolling statistics
- Product lifecycle and market dynamics features
- The output is an enhanced table `demand_features_enhanced` ready for ML models
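A few of the lag and rolling-window features can be sketched with pandas; the column names (`product`, `week`, `demand`) and window sizes are illustrative:

```python
import pandas as pd

def add_demand_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add simple lag, rolling, and seasonality features per product."""
    df = df.sort_values(["product", "week"]).copy()
    grouped = df.groupby("product")["demand"]
    df["demand_lag_1"] = grouped.shift(1)        # last week's demand
    df["demand_lag_4"] = grouped.shift(4)        # demand four weeks ago
    # Rolling mean of the previous 4 weeks, shifted to avoid target leakage
    df["rolling_mean_4"] = grouped.transform(
        lambda s: s.shift(1).rolling(4).mean())
    df["week_of_year"] = df["week"] % 52         # simple seasonality feature
    return df
```

Shifting before the rolling mean keeps the current week's demand out of its own features, which matters once these columns feed the ML models in the next notebook.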
`09_Advanced_ML_Models.py`:
- Modern machine learning models for demand prediction including Random Forest, XGBoost, and Facebook Prophet
- Ensemble methods combining multiple models for improved accuracy
- Model performance comparison and validation with comprehensive metrics
- The output is a table `ml_demand_predictions` with model predictions and confidence intervals
`10_Intelligent_Stock_Allocation.py`:
- Multi-constraint optimization for intelligent stock allocation
- Priority-based allocation logic with risk-adjusted demand planning
- Performance analytics and KPIs for allocation effectiveness
- The output is a table `intelligent_stock_allocation` with optimized allocation recommendations
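The priority-based, risk-adjusted idea can be sketched as a greedy allocation (the tuple layout and the risk-buffer formula are illustrative assumptions, and a real multi-constraint optimizer would solve this jointly rather than greedily):

```python
def allocate_stock(available, requests):
    """Greedy priority allocation.

    requests: list of (product, demand, priority, risk) tuples, where risk
    inflates demand into a risk-adjusted buffer. Higher-priority products
    are filled first until stock runs out.
    """
    allocation, remaining = {}, available
    for product, demand, priority, risk in sorted(requests, key=lambda r: -r[2]):
        adjusted_demand = demand * (1 + risk)   # risk-adjusted demand buffer
        granted = min(adjusted_demand, remaining)
        allocation[product] = granted
        remaining -= granted
    return allocation
```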
`11_Complete_Solution_Guide.py`:
- End-to-end implementation guide integrating all enhanced components
- Solution summary with performance comparisons between traditional and enhanced approaches
- Deployment instructions and monitoring setup guidance
The enhanced solution includes a comprehensive Streamlit web dashboard that provides real-time visualization and monitoring capabilities:
- Executive Overview: Key performance metrics, demand trends, and regional performance heatmaps
- Demand Forecasting: Model comparison, error analysis, feature importance, and future predictions
- Stock Allocation: Optimization results, plant utilization, cost analysis, and detailed allocation plans
- Performance Analytics: KPI dashboard with MAE, inventory turnover, service levels, and trend analysis
- Risk Management: Risk alerts, supply chain resilience scoring, and volatility analysis
- Quick Start: Run `./deploy.sh` for automated deployment
- Manual Setup:
  - `cd dashboard`
  - `pip install -r requirements.txt`
  - `streamlit run streamlit_app.py`
- Access: Open a browser to `http://localhost:8501`
- Update the `.env` file with your Databricks credentials
- Configure dashboard settings in `dashboard/config.py`
- Customize data connections in `dashboard/data_utils.py`
The enhanced solution includes automated monitoring and maintenance capabilities:
- `run_analysis.py`: Complete pipeline orchestration with health checks and validation
- `monitor_system.py`: System health monitoring with data freshness and performance validation
- `retrain_models.py`: Automated model retraining based on performance degradation thresholds
- `setup_environment.sh`: Environment setup and dependency installation
- `deploy.sh`: Complete deployment script for the enhanced solution
- Data Quality Checks: Freshness, completeness, and consistency validation
- Model Performance Tracking: Accuracy monitoring and drift detection
- Allocation Performance: Demand fulfillment and cost efficiency metrics
- Automated Alerts: Email notifications for system issues and performance degradation
- To create a supply chain agent, you can leverage these functions in any AI agent orchestrator, as shown in this demo video: https://www.youtube.com/watch?v=cz-x2B31Ga8
- Test your Agent with the following questions:
- What products are dependent on L6HUK material?
- How much revenue is at risk if we can’t produce the forecasted amount of product autoclave_1?
- Which products have delays right now?
- Are there any delays with syringe_1?
- What raw materials are required for syringe_1?
- Are there any shortages with one of the following raw materials: O4GRQ, Q5U3A, OAIFB or 58RJD?
- What are the delays associated with wholesaler 9?
- What's the ML model accuracy for product forecasting?
- Show me the risk-adjusted allocation for high-priority products
- Which products have the highest demand volatility?
- You can also use SQL functions to query your supply chain:
  `SELECT query_unstructured_emails('What delivery delays are affecting Distribution Center 3?');`
- 🆕 Enhanced Query Capabilities: Access ML predictions and intelligent allocation data:

      -- Query ML model predictions
      SELECT * FROM ml_demand_predictions
      WHERE product_id = 'autoclave_1'
        AND prediction_date >= current_date();

      -- Check intelligent allocation results
      SELECT * FROM intelligent_stock_allocation
      WHERE risk_level = 'High'
        AND allocation_priority >= 0.8;
- This project is based on Databricks' supply chain optimization solution accelerator available at: https://github.com/databricks-industry-solutions/supply-chain-optimization.
- Supply Chain Optimization with Databricks - https://github.com/lararachidi/agent-supply-chain