This project focuses on predicting sales for retail stores using machine learning models. It includes data preprocessing, exploratory data analysis (EDA), feature engineering, model training, and evaluation. The goal is to provide actionable insights for businesses to optimize inventory and promotional strategies.
- Predicts sales based on key factors such as promotions, customer count, and day of the week.
- Includes data preprocessing and visualization for exploratory analysis.
- Employs multiple machine learning models, including Linear Regression and XGBoost.
- Deployment-ready model for real-world usage with options for web or API-based interfaces.
- Includes an additional analysis comparing weekday and weekend sales.
The dataset contains the following features:
- Date: The date of the sales.
- Sales: Total sales for the day.
- Customers: Number of customers visiting the store.
- Promo: Whether a promotion was active (1 = Yes, 0 = No).
- DayOfWeek: The day of the week.
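As a rough illustration of how these columns could feed the weekday vs weekend comparison, the sketch below parses a few rows, derives a weekend flag from `Date`, and averages `Sales` per group. The sample rows are made up for illustration (they are not taken from `train.csv`), the helper names `is_weekend` and `mean_sales_by_day_type` are hypothetical, and treating Saturday and Sunday as the weekend is an assumption. It uses only the standard library; the notebooks themselves may well use Pandas instead.

```python
import csv
import io
from datetime import date
from statistics import mean

# Made-up sample rows in the documented column layout (not real data).
SAMPLE_CSV = """Date,Sales,Customers,Promo,DayOfWeek
2015-07-27,6000,600,1,1
2015-07-28,5500,560,1,2
2015-08-01,4000,450,0,6
2015-08-02,3000,300,0,7
"""


def is_weekend(day: date) -> bool:
    """Assumption: Saturday (6) and Sunday (7) count as the weekend."""
    return day.isoweekday() >= 6


def mean_sales_by_day_type(csv_text: str) -> dict:
    """Return the average Sales for 'weekday' and 'weekend' rows."""
    groups = {"weekday": [], "weekend": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["Date"])
        key = "weekend" if is_weekend(day) else "weekday"
        groups[key].append(float(row["Sales"]))
    # Skip empty groups so mean() never receives an empty list.
    return {name: mean(values) for name, values in groups.items() if values}


averages = mean_sales_by_day_type(SAMPLE_CSV)
print(averages)
```

Deriving the flag from `Date` rather than from `DayOfWeek` avoids depending on how that column is numbered, which this README does not specify.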
- Clone the repository and enter the project directory:
    git clone https://github.com/Coder-093/sales-prediction.git
    cd sales-prediction
- Install the required dependencies:
    pip install -r requirements.txt
- Run the Jupyter notebook for training and evaluation:
    jupyter notebook notebooks/SalesPrediction.ipynb
- Run the additional analysis notebook:
    jupyter notebook notebooks/WeekdayVsWeekendSales.ipynb
- Train the Model: Use the SalesPrediction notebook to preprocess the data and train the models.
- Predict Sales: Load the saved models (linear_regression_model.pkl, xgbboost_model.pkl) and use them to predict sales for new data.
- Analyze Sales Patterns: Use the WeekdayVsWeekendSales notebook to understand sales trends.
- Deploy the Model: Use app.py to deploy the model with Streamlit.
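The "Predict Sales" step could look roughly like the sketch below. It assumes the `.pkl` files were written with Python's `pickle` module and that the model was trained on `Customers`, `Promo`, and `DayOfWeek` in that order. This README confirms neither detail, so check the training notebook before relying on it. `load_model` and `predict_sales` are hypothetical helper names.

```python
import pickle
from pathlib import Path

# Path taken from the project structure; adjust if your layout differs.
MODEL_PATH = Path("models/linear_regression_model.pkl")


def load_model(path):
    """Load a pickled model. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_sales(model, customers, promo, day_of_week):
    """Predict sales for a single store-day.

    Assumption: the model expects [Customers, Promo, DayOfWeek] in this order.
    """
    features = [[customers, promo, day_of_week]]
    return float(model.predict(features)[0])


# Only runs when the saved model is actually present.
if MODEL_PATH.exists():
    model = load_model(MODEL_PATH)
    print(predict_sales(model, customers=550, promo=1, day_of_week=3))
```

Unpickling needs the library that produced the model (scikit-learn or XGBoost) to be installed. If the models were fitted on a Pandas DataFrame, they may expect a DataFrame with the same column names at prediction time rather than a plain list.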
The models achieved the following results:
- Mean Absolute Error (MAE): 985.43 for Linear Regression and 673.87 for XGBoost
- Best Performing Model: XGBoost
XGBoost was selected as the best-performing model because it achieved a lower MAE than Linear Regression. MAE was the evaluation metric, and a lower value indicates better accuracy. The results for comparison:
- Linear Regression MAE: 985.43
- XGBoost MAE: 673.87
XGBoost's ability to capture complex relationships, together with its better accuracy on the test dataset, made it the preferred choice for this project.
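For reference, MAE is the average absolute difference between actual and predicted values. The sketch below computes it in plain Python on made-up numbers, then works out the relative reduction in MAE implied by the two reported scores. The helper and the sample values are illustrative only; the notebook presumably uses a library routine such as scikit-learn's `mean_absolute_error`.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up example values, not project data.
example_mae = mean_absolute_error([100, 200, 300], [110, 190, 330])

# Relative MAE reduction computed from the two reported scores.
LINEAR_REGRESSION_MAE = 985.43
XGBOOST_MAE = 673.87
reduction = (LINEAR_REGRESSION_MAE - XGBOOST_MAE) / LINEAR_REGRESSION_MAE
print(f"XGBoost lowers MAE by about {reduction:.1%}")
```

MAE is expressed in the same units as `Sales`, so the two scores can be read directly as the typical daily error of each model.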
sales_prediction_project/
├── data/ # Folder for datasets
│   └── train.csv # Dataset
├── notebooks/ # Jupyter notebooks
│   ├── SalesPrediction.ipynb # Notebook for exploration and modeling
│   └── WeekdayVsWeekendSales.ipynb # Notebook for weekday vs weekend analysis
├── models/ # Folder for saved models
│   ├── linear_regression_model.pkl # Saved ML model
│   └── xgbboost_model.pkl # Saved ML model
├── app.py # Deployment script
├── requirements.txt # List of dependencies
└── README.md # Project documentation

- Incorporate external data such as weather and holiday schedules to improve predictions.
- Experiment with advanced models like ARIMA or LSTMs for time-series forecasting.
- Deploy the model as a cloud-based API or interactive web app.
This project is licensed under the MIT License. See the LICENSE file for details.
Dataset: Rossmann Sales Dataset (source: Kaggle)
Libraries: Scikit-learn, XGBoost, Pandas, Seaborn, Matplotlib