The Game Churn Prediction AI is a complete machine learning system built to predict online player churn. It evaluates gameplay engagement to determine the probability of a player abandoning the game and provides AI-powered engagement recommendations through an agentic assistant.
This project demonstrates an end-to-end ML pipeline: data cleaning and model training, then evaluation and model comparison, and finally a Streamlit dashboard with an integrated agentic AI assistant.
The Dashboard tab shows real-time KPI cards, model evaluation metrics, cross-validation results, and a confusion matrix heatmap.
Upload a CSV and instantly see per-player churn probabilities with color-coded risk levels.
The feature importance view gives a visual breakdown of which gameplay factors most heavily influence churn.
Select any player to generate a structured retention report with actionable recommendations and a downloadable PDF.
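
The color-coded risk levels mentioned above can be thought of as a simple bucketing of the predicted churn probability. The sketch below is illustrative only: the thresholds (0.4 and 0.7), the labels, the colors, and the function name `risk_level` are assumptions, not values taken from app.py.

```python
def risk_level(churn_probability: float) -> tuple[str, str]:
    """Map a churn probability in [0, 1] to a (label, color) pair.

    The cut-offs are assumptions chosen for illustration only.
    """
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability >= 0.7:
        return "High", "red"
    if churn_probability >= 0.4:
        return "Medium", "orange"
    return "Low", "green"


# Hypothetical per-player probabilities, standing in for model output.
players = {"P001": 0.82, "P002": 0.45, "P003": 0.10}
for player_id, prob in players.items():
    label, color = risk_level(prob)
    print(f"{player_id}: {prob:.0%} -> {label} ({color})")
```

Keeping the thresholds in one function makes it easy to tune them later without touching the table rendering.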
| Model | Accuracy | Precision | Recall | F1 Score | CV Mean |
|---|---|---|---|---|---|
| Random Forest | 94.5% | 91.5% | 86.6% | 89.0% | 94.4% |
| Logistic Regression | 87.4% | 80.4% | 67.7% | 73.5% | 87.7% |
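
For reference, the columns in the table follow the standard definitions based on confusion-matrix counts. The sketch below uses made-up counts (not the project's actual confusion matrix) and then checks that the reported F1 scores agree with the reported precision and recall. The function names are hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Made-up counts, for illustration only.
print(classification_metrics(tp=80, fp=10, fn=20, tn=90))

# Consistency check against the table: F1 recomputed from precision and recall.
print(round(f1_from(0.915, 0.866), 3))  # Random Forest, table reports 89.0%
print(round(f1_from(0.804, 0.677), 3))  # Logistic Regression, table reports 73.5%
```

The gap between the two models is largest on recall, which is why the F1 difference is wider than the accuracy difference.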
game-churn-prediction-ai/
├── agents/
│ └── engagement_agent.py # Agentic AI engagement optimization assistant
├── knowledge_base/
│ └── strategies.json # RAG-style retention strategy knowledge base
├── utils/
│ └── report_generator.py # Structured report & PDF export generator
├── data/
│ ├── raw_data.csv # Raw Kaggle gaming behavior dataset
│ └── clean_data.csv # Preprocessed data ready for ML training
├── notebooks/
│ ├── data_cleaning.ipynb # Data processing notebook
│ └── model_training.ipynb # Model evaluation notebook
├── models/
│ ├── random_forest_model.pkl # Trained Random Forest classifier
│ ├── logistic_regression_model.pkl # Trained Logistic Regression classifier
│ ├── churn_model.pkl # Backward-compatible model alias
│ └── model_features.pkl # Feature name list for inference alignment
├── app.py # Streamlit dashboard application
├── train.py # CLI training script
├── preprocess.py # Shared preprocessing module
├── metrics.json # Saved evaluation & cross-validation metrics
├── requirements.txt # Python dependencies
├── runtime.txt # Python runtime specification
├── architecture.md # System architecture diagram (Mermaid)
├── .gitignore # Git ignore rules
└── README.md # This file
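
The saved feature list (model_features.pkl) exists so that inference inputs have the same columns, in the same order, as the training data. A rough sketch of that alignment step using plain dictionaries follows. The function name `align_features`, the zero fill for missing columns, and the example feature names are assumptions for illustration; the project's own preprocessing may differ.

```python
def align_features(row: dict[str, float], feature_names: list[str]) -> list[float]:
    """Return the row's values ordered like feature_names.

    Columns missing from the row are filled with 0.0 (an assumption, e.g. a
    one-hot category absent from the upload); extra columns are dropped.
    """
    return [float(row.get(name, 0.0)) for name in feature_names]


# Hypothetical feature list, standing in for the contents of model_features.pkl.
feature_names = ["PlayTimeHours", "SessionsPerWeek", "PlayerLevel"]

# Hypothetical uploaded row: different column order, one extra and one missing column.
uploaded_row = {"PlayerLevel": 12, "PlayTimeHours": 3.5, "PlayerID": 9001}

print(align_features(uploaded_row, feature_names))  # [3.5, 0.0, 12.0]
```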

Clone the repository, create a virtual environment, and install the dependencies:

    git clone https://github.com/username/game-churn-prediction-ai.git
    cd game-churn-prediction-ai
    python3 -m venv venv
    source venv/bin/activate   # Mac / Linux
    # venv\Scripts\activate    # Windows
    pip install -r requirements.txt

Train both models and generate metrics.json:

    python train.py

Output:

- models/random_forest_model.pkl
- models/logistic_regression_model.pkl
- models/model_features.pkl
- metrics.json

The script logs accuracy, precision, recall, F1 score, and 5-fold cross-validation results for each model.
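
To make the 5-fold cross-validation step concrete, here is a minimal, dependency-free sketch of how samples are partitioned into folds and how a mean score is formed. It is not the code in train.py: the function name `k_fold_indices` is hypothetical, the folds are contiguous and unshuffled, and the fold scores are made-up numbers. A real run would more likely rely on scikit-learn's cross-validation utilities.

```python
from statistics import mean


def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_indices, validation_indices) for each of k folds.

    Folds are contiguous and unshuffled; the first n_samples % k folds
    receive one extra sample so every sample is validated exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for fold, (train, validation) in enumerate(k_fold_indices(10, k=5), start=1):
    print(f"fold {fold}: train={train} validation={validation}")

# Made-up per-fold accuracies; the CV mean is their average.
fold_scores = [0.94, 0.95, 0.94, 0.945, 0.945]
print(f"CV mean accuracy: {mean(fold_scores):.3f}")
```

Each model is trained five times, once per fold, so the CV mean is a steadier estimate than a single train/test split.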

Launch the dashboard:

    streamlit run app.py

- Model Selection: Switch between Random Forest and Logistic Regression from the sidebar.
- KPI Dashboard: View total players, high-risk count, average churn probability, and model accuracy at a glance.
- Evaluation Metrics: Accuracy, Precision, Recall, F1 Score displayed dynamically from metrics.json.
- Confusion Matrix: Visual heatmap comparing predictions against ground truth (when Churn column is present).
- Cross-Validation: 5-fold CV scores and mean accuracy displayed per model.
- Prediction Table: Color-coded risk levels for up to 500 players.
- Feature Importance: Top 10 churn drivers visualized with bar charts (Random Forest) or coefficient magnitudes (Logistic Regression).
- Model Comparison Table: Side-by-side performance metrics for both models.
- AI Engagement Assistant: Agentic AI that analyzes player behavior and generates structured retention recommendations.
- PDF Export: Download engagement reports as professionally formatted PDFs.
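
The Feature Importance item above ranks drivers differently per model: importances for Random Forest, coefficient magnitudes for Logistic Regression. A small sketch of that ranking step is shown below. The helper name `top_drivers`, the feature names, and the weights are all assumptions for illustration; with scikit-learn the weights would typically come from a fitted model's `feature_importances_` or `coef_` attribute.

```python
def top_drivers(names: list[str], weights: list[float], n: int = 10) -> list[tuple[str, float]]:
    """Rank features by absolute weight, largest first, and keep the top n.

    Using the absolute value lets the same helper handle non-negative
    tree importances and signed logistic-regression coefficients.
    """
    if len(names) != len(weights):
        raise ValueError("names and weights must have the same length")
    ranked = sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:n]


# Hypothetical feature names and made-up signed coefficients.
names = ["SessionsPerWeek", "PlayTimeHours", "PlayerLevel", "Age"]
coefficients = [-1.8, -0.9, 0.2, 0.05]

for name, weight in top_drivers(names, coefficients, n=3):
    print(f"{name:>16}: {weight:+.2f}")
```

For coefficients, the sign is kept in the output so a reader can still tell whether a feature pushes churn probability up or down.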
See architecture.md for the full system architecture diagram including:
- User Upload flow
- Churn Prediction Model
- Agentic AI Assistant
- Knowledge Base (RAG)
- Report Generator
- Streamlit UI
Predict Online Gaming Behavior Dataset (Kaggle):
https://www.kaggle.com/datasets/rabieelkharoua/predict-online-gaming-behavior-dataset
| Layer | Technologies |
|---|---|
| Data Processing | pandas, numpy |
| Machine Learning | scikit-learn |
| Visualization | matplotlib, seaborn |
| Application Server | streamlit |
| Report Generation | reportlab |
| AI Agent | Rule-based reasoning + JSON knowledge base |
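
The AI Agent row describes rule-based reasoning over a JSON knowledge base. One way such an agent could work is sketched below. Everything here is hypothetical: the JSON keys, the strategy texts, the player fields, the thresholds, and the function name `recommend` are illustrative and do not reflect the actual contents of strategies.json or engagement_agent.py.

```python
import json

# Hypothetical stand-in for knowledge_base/strategies.json.
KNOWLEDGE_BASE = json.loads("""
{
  "low_sessions": "Send a re-engagement notification with a limited-time reward.",
  "short_sessions": "Suggest shorter quests or daily challenges.",
  "default": "Keep monitoring; no intervention needed."
}
""")


def recommend(player: dict) -> list[str]:
    """Apply simple rules to a player record and look up matching strategies."""
    triggered = []
    # Thresholds are assumptions chosen for illustration.
    if player.get("sessions_per_week", 0) < 3:
        triggered.append("low_sessions")
    if player.get("avg_session_minutes", 0) < 20:
        triggered.append("short_sessions")
    if not triggered:
        triggered.append("default")
    return [KNOWLEDGE_BASE[key] for key in triggered]


at_risk_player = {"sessions_per_week": 1, "avg_session_minutes": 12}
for strategy in recommend(at_risk_player):
    print("-", strategy)
```

Keeping the strategies in JSON means the recommendations can be edited without changing the rule code.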
This project is compatible with Streamlit Community Cloud for free public hosting.
- Models are pre-trained and included in the models/ directory. You can retrain at any time with python train.py.
- The project uses only free-tier, open-source tools; no paid APIs are required.
- All code includes comprehensive docstrings and inline comments for readability.