Supply Chain Optimization with Databricks

This project demonstrates an enhanced supply chain optimization solution for distribution networks using Databricks, with 3 manufacturing plants delivering 30 product SKUs to 5 distribution centers, which in turn serve 30-60 wholesalers each. The solution leverages Databricks' distributed computing capabilities and modern ML techniques to answer questions such as: How much revenue is at risk if we can't produce the forecasted amount of product autoclave_1?

Enhanced with Advanced Analytics: This solution has been augmented with modern machine learning models, intelligent stock allocation, interactive web dashboards, and automated monitoring capabilities for enterprise-ready supply chain analytics.
The solution follows this workflow:
- Demand Forecasting: Generate one-week-ahead forecasts for each product/wholesaler combination using traditional Holt-Winters and advanced ML models (Random Forest, XGBoost, Prophet)
- Demand Aggregation: Aggregate forecasts at distribution center level with enhanced feature engineering
- Raw Material Planning: Convert finished product demand into raw material requirements using graph-based BOM analysis
- Transportation Optimization: Minimize shipping costs between plants and distribution centers using linear programming
- Intelligent Stock Allocation: Multi-constraint optimization with priority-based allocation logic and risk-adjusted demand planning
- Interactive Dashboard: Real-time visualization and monitoring with Streamlit web interface
- Automated Monitoring: Health checks, performance monitoring, and automated model retraining
- Set up a Databricks workspace
- Ensure that you have permissions to write to a catalog and create and/or use a cluster
- Create and/or start your cluster
- This solution has been tested on the following Databricks cluster configuration:
- Cluster Type: Personal Compute
- Access Mode: Dedicated (formerly: Single user)
- Databricks Runtime Version: 16.3 ML (includes Apache Spark 3.5.2, Scala 2.12)
- Node Type: i3.xlarge (30.5 GB Memory, 4 Cores)
- Import all notebooks into your workspace
- In the Workspace tab, under your user name, follow these steps (recommended):
- Create a Git folder
- Add the Git repository URL
- Alternatively, you can right-click and import the notebooks one by one; however, adding the Git repository URL is recommended instead.
- Run `01_Introduction_And_Setup.py` to initialize the environment and generate sample data
- The notebooks use widgets for configuration:
  - `catalog_name`: Databricks catalog name (default: "main")
  - `db_name`: Database name (default: "supply_chain_db")
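As a sketch of how these two configuration values can be resolved with safe defaults (the helper name is hypothetical; inside a Databricks notebook, the overrides would come from `dbutils.widgets.get(...)` calls):

```python
# Hypothetical helper: merge widget overrides onto the documented defaults.
# On Databricks, `widgets` would be a dict built from dbutils.widgets.get(...).
DEFAULTS = {"catalog_name": "main", "db_name": "supply_chain_db"}

def resolve_config(widgets=None):
    """Return the effective configuration, falling back to defaults."""
    config = dict(DEFAULTS)
    config.update(widgets or {})
    return config
```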
- Set up the enhanced environment (optional)
  - Copy `.env.example` to `.env` and update it with your credentials
  - Run `./setup_environment.sh` for automated setup
  - Or use `./deploy.sh` for complete deployment, including the dashboard
- You have multiple options to run the solution
- Option 1: Run all notebooks at once using magic commands
- Option 2: Run each notebook manually in numerical order (recommended for learning)
- Option 3: Run the complete enhanced pipeline using `python run_analysis.py`
Supporting resource notebooks:
- `_resources/00-setup.py`: Configuration setup
- `_resources/01-data-generator.py`: Generate synthetic supply chain data
- `_resources/02-generate-supply.py`: Generate supply data
These resource notebooks will create the following tables that we will leverage to build our solution:
- `product_demand_historical`: Historical product demand by wholesaler
- `distribution_center_to_wholesaler_mapping`: Mapping between distribution centers and wholesalers
- `bom`: Bill of materials with material relationships
- `plant_supply`: Maximum supply capacity by plant and product
- `transport_cost`: Transportation costs between plants and distribution centers
- `list_prices`: Price for each product
Enhanced tables created by the advanced solution:
- `demand_features_enhanced`: Advanced feature-engineered dataset with time-based, lag, and market features
- `ml_demand_predictions`: Predictions from Random Forest, XGBoost, and Prophet models
- `intelligent_stock_allocation`: Optimized stock allocation with multi-constraint optimization
- `model_performance_metrics`: Performance tracking and comparison across models
The solution consists of multiple Databricks notebooks:
`01_Introduction_And_Setup.py`:
- Project overview and data setup.
- Make sure to run it first, and then run each notebook sequentially.
`02_Fine_Grained_Demand_Forecasting.py`:
- Time series forecasting generating one-week-ahead SKU demand for every wholesaler and distribution center with a Holt-Winters seasonal model.
- The output is a table `product_demand_forecasted` with aggregate forecasts at the distribution center level.
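The Holt-Winters step can be illustrated with a minimal additive-seasonal implementation. This is a simplified sketch, not the notebook's actual code (which would typically use a library such as statsmodels), and the smoothing parameters are illustrative:

```python
def holt_winters_additive(series, season_len, alpha=0.3, beta=0.1,
                          gamma=0.2, horizon=1):
    """One-step-ahead (or multi-step) additive Holt-Winters forecast.

    Requires at least two full seasons of history to initialize the
    level, trend, and seasonal components.
    """
    first_season_mean = sum(series[:season_len]) / season_len
    seasonals = [series[i] - first_season_mean for i in range(season_len)]
    level = first_season_mean
    trend = (sum(series[season_len:2 * season_len])
             - sum(series[:season_len])) / season_len ** 2
    for i in range(season_len, len(series)):
        s = seasonals[i % season_len]
        last_level = level
        level = alpha * (series[i] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[i % season_len] = gamma * (series[i] - level) + (1 - gamma) * s
    return [level + (h + 1) * trend + seasonals[(len(series) + h) % season_len]
            for h in range(horizon)]
```

For weekly data with yearly seasonality, `season_len` would be 52; the forecast horizon of 1 matches the one-week-ahead setup described above.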
`03_Derive_Raw_Material_Demand.py`:
- In this notebook, we process product demand forecasts to determine raw material requirements using a graph-based approach (by transforming the BOM data into graph edges).
- The outputs are two tables: `raw_material_demand` and `raw_material_supply`.
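The graph traversal at the heart of this step can be sketched as a breadth-first expansion over BOM edges (the material names in the example are hypothetical):

```python
from collections import deque

def raw_materials_for(product, bom_edges):
    """Expand a finished product into all upstream raw materials.

    bom_edges maps each item to the list of components it is made from,
    i.e. a flattened view of the BOM table's material relationships.
    """
    needed, queue = set(), deque([product])
    while queue:
        item = queue.popleft()
        for component in bom_edges.get(item, []):
            if component not in needed:   # avoid revisiting shared components
                needed.add(component)
                queue.append(component)
    return needed
```

Running the same traversal in the opposite edge direction gives the product-from-material lookup used later for revenue-at-risk questions.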
`04_Optimize_Transportation.py`:
- Linear programming to minimize transportation costs
- The output is the table `shipment_recommendations`.
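A toy version of the transportation LP, assuming SciPy is available; the costs, capacities, and demands below are illustrative, not the project's actual data:

```python
from scipy.optimize import linprog

# Decision variables: x = [p1->dc1, p1->dc2, p2->dc1, p2->dc2]
cost = [4, 6, 5, 3]            # shipping cost per unit on each lane
A_ub = [
    [1, 1, 0, 0],              # plant 1 capacity
    [0, 0, 1, 1],              # plant 2 capacity
    [-1, 0, -1, 0],            # DC 1 demand (>= 60, negated for <= form)
    [0, -1, 0, -1],            # DC 2 demand (>= 70, negated for <= form)
]
b_ub = [80, 80, -60, -70]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
```

The optimal plan here routes each distribution center's demand through its cheapest lane subject to plant capacity; the real notebook solves the same structure over all plants and distribution centers.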
`05_Data_Analysis_&_Functions.py`:
- Additional analysis and utility functions: the notebook identifies critical materials with supply shortages by comparing demand vs. supply data and analyzes hierarchical relationships between materials and products
- As output, you get the following custom SQL functions:
  - `product_from_raw`: Maps a raw material to all downstream products
  - `raw_from_product`: Maps a product to all upstream raw materials
  - `revenue_risk`: Calculates potential revenue impact from raw material shortages
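The idea behind `revenue_risk` can be sketched in Python (function and argument names are hypothetical, and this simplification charges the full shortage to every downstream product rather than splitting it across them):

```python
def revenue_at_risk(material, shortage_qty, products_using,
                    units_per_item, list_price):
    """Estimate revenue lost if `shortage_qty` units of a material are missing.

    A shortage caps how many units of each downstream product can be built;
    the unbuildable units, priced at list, are the revenue at risk.
    """
    risk = 0.0
    for product in products_using.get(material, []):
        unbuildable_items = shortage_qty / units_per_item[(material, product)]
        risk += unbuildable_items * list_price[product]
    return risk
```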
`06_Vector_Search.py`:
- Generates supply-chain manager e-mails (unstructured data) and stores the data in a vector index, enabling semantic queries that surface delay and risk signals
`07_More_Functions.py`:
- Extended functionality and utilities
- As output, you get the following custom SQL functions:
  - `lookup_product_demand`: Retrieves historical demand data for specific products and wholesalers
  - `query_unstructured_emails`: Searches emails using vector search for relevant supply chain information
  - `execute_code_sandbox`: Enables dynamic code execution for custom analysis
  - `_genie_query`: Core function that interfaces with the Databricks Genie API
  - `ask_genie_pharma_gsc`: Natural language interface to query the supply chain dataset
- Make sure to customize the Genie integration:
- Update the Databricks host URL and token
- Specify your Genie Space ID
`08_Enhanced_Feature_Engineering.py`:
- Advanced feature engineering pipeline with 20+ sophisticated features
- Time-based features (seasonality, trends, holidays), lag features, rolling statistics
- Product lifecycle and market dynamics features
- The output is an enhanced table `demand_features_enhanced` ready for ML models
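A few of the lag and rolling-window features can be sketched with pandas; the column names (`product`, `week`, `demand`) and window sizes are illustrative:

```python
import pandas as pd

def add_demand_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add simple lag, rolling, and seasonality features per product."""
    df = df.sort_values(["product", "week"]).copy()
    grouped = df.groupby("product")["demand"]
    df["demand_lag_1"] = grouped.shift(1)        # last week's demand
    df["demand_lag_4"] = grouped.shift(4)        # demand four weeks ago
    # Rolling mean of the previous 4 weeks, shifted to avoid target leakage
    df["rolling_mean_4"] = grouped.transform(
        lambda s: s.shift(1).rolling(4).mean())
    df["week_of_year"] = df["week"] % 52         # simple seasonality feature
    return df
```

Shifting before the rolling mean keeps the current week's demand out of its own features, which matters once these columns feed the ML models in the next notebook.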
`09_Advanced_ML_Models.py`:
- Modern machine learning models for demand prediction including Random Forest, XGBoost, and Facebook Prophet
- Ensemble methods combining multiple models for improved accuracy
- Model performance comparison and validation with comprehensive metrics
- The output is a table `ml_demand_predictions` with model predictions and confidence intervals
`10_Intelligent_Stock_Allocation.py`:
- Multi-constraint optimization for intelligent stock allocation
- Priority-based allocation logic with risk-adjusted demand planning
- Performance analytics and KPIs for allocation effectiveness
- The output is a table `intelligent_stock_allocation` with optimized allocation recommendations
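The priority-based, risk-adjusted idea can be sketched as a greedy allocation (the tuple layout and the risk-buffer formula are illustrative assumptions, and a real multi-constraint optimizer would solve this jointly rather than greedily):

```python
def allocate_stock(available, requests):
    """Greedy priority allocation.

    requests: list of (product, demand, priority, risk) tuples, where risk
    inflates demand into a risk-adjusted buffer. Higher-priority products
    are filled first until stock runs out.
    """
    allocation, remaining = {}, available
    for product, demand, priority, risk in sorted(requests, key=lambda r: -r[2]):
        adjusted_demand = demand * (1 + risk)   # risk-adjusted demand buffer
        granted = min(adjusted_demand, remaining)
        allocation[product] = granted
        remaining -= granted
    return allocation
```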
`11_Complete_Solution_Guide.py`:
- End-to-end implementation guide integrating all enhanced components
- Solution summary with performance comparisons between traditional and enhanced approaches
- Deployment instructions and monitoring setup guidance
The enhanced solution includes a comprehensive Streamlit web dashboard that provides real-time visualization and monitoring capabilities:
- Executive Overview: Key performance metrics, demand trends, and regional performance heatmaps
- Demand Forecasting: Model comparison, error analysis, feature importance, and future predictions
- Stock Allocation: Optimization results, plant utilization, cost analysis, and detailed allocation plans
- Performance Analytics: KPI dashboard with MAE, inventory turnover, service levels, and trend analysis
- Risk Management: Risk alerts, supply chain resilience scoring, and volatility analysis
- Quick Start: Run `./deploy.sh` for automated deployment
- Manual Setup:
  - `cd dashboard`
  - `pip install -r requirements.txt`
  - `streamlit run streamlit_app.py`
- Access: Open a browser to `http://localhost:8501`
- Update the `.env` file with your Databricks credentials
- Configure dashboard settings in `dashboard/config.py`
- Customize data connections in `dashboard/data_utils.py`
The enhanced solution includes automated monitoring and maintenance capabilities:
- `run_analysis.py`: Complete pipeline orchestration with health checks and validation
- `monitor_system.py`: System health monitoring with data freshness and performance validation
- `retrain_models.py`: Automated model retraining based on performance degradation thresholds
- `setup_environment.sh`: Environment setup and dependency installation
- `deploy.sh`: Complete deployment script for the enhanced solution
- Data Quality Checks: Freshness, completeness, and consistency validation
- Model Performance Tracking: Accuracy monitoring and drift detection
- Allocation Performance: Demand fulfillment and cost efficiency metrics
- Automated Alerts: Email notifications for system issues and performance degradation
- To create a supply chain agent, you can leverage these functions in any AI agent orchestrator, as shown in this demo video: https://www.youtube.com/watch?v=cz-x2B31Ga8
- Test your Agent with the following questions:
- What products are dependent on L6HUK material?
- How much revenue is at risk if we can’t produce the forecasted amount of product autoclave_1?
- Which products have delays right now?
- Are there any delays with syringe_1?
- What raw materials are required for syringe_1?
- Are there any shortages with one of the following raw materials: O4GRQ, Q5U3A, OAIFB or 58RJD?
- What are the delays associated with wholesaler 9?
- What's the ML model accuracy for product forecasting?
- Show me the risk-adjusted allocation for high-priority products
- Which products have the highest demand volatility?
- You can also use SQL functions to query your supply chain:
  `SELECT query_unstructured_emails('What delivery delays are affecting Distribution Center 3?');`
- 🆕 Enhanced Query Capabilities: Access ML predictions and intelligent allocation data:

      -- Query ML model predictions
      SELECT * FROM ml_demand_predictions
      WHERE product_id = 'autoclave_1'
        AND prediction_date >= current_date();

      -- Check intelligent allocation results
      SELECT * FROM intelligent_stock_allocation
      WHERE risk_level = 'High'
        AND allocation_priority >= 0.8;
- This project is based on Databricks' supply chain optimization solution accelerator available at: https://github.com/databricks-industry-solutions/supply-chain-optimization.
- Supply Chain Optimization with Databricks - https://github.com/lararachidi/agent-supply-chain