Predicting whether an auto insurance customer will renew their contract using supervised machine learning models and targeted feature engineering.
Data Set: Lledó, Josep (2023), “Dataset of an actual motor vehicle insurance portfolio”, Mendeley Data, V1
- Removed outliers (e.g., unrealistic ages or driving experience)
- Engineered new features:
Customer_age,Driving_experience,Contract_duration,Experience_ratio
- Created target variable
Renewal(1 = renewed, 0 = not renewed) - Normalized continuous features and dropped redundant columns
Correlation patterns and model-derived feature importance were examined to understand relationships between variables and their impact on renewal predictions.
|
|
Gradient Boosting achieved the strongest overall performance across all evaluation criteria.
| Model | Accuracy | F1-Score | ROC-AUC | Summary |
|---|---|---|---|---|
| Linear Regression | 79.4% | 0.54 | 0.62 | Underfitted, poor for nonlinear data |
| Random Forest | 83.2% | 0.90 | 0.839 | Strong recall, interpretable results |
| Gradient Boosting | 90.3% | 0.94 | 0.893 | Best balance of performance and generalization |

