A machine learning model to predict whether the customer will churn or not in the next six months. Reducing Customer Churn is a major objective for any firm. An additional possible source of income for every firm is the ability to predict customer churn, commonly referred to as customer attrition. The cost to the firm is impacted by customer churn. Increased customer churn results in revenue loss and higher marketing expenses because it takes longer to acquire new consumers. As a data scientist of a bank, you are asked to analyze the past data and predict whether the customer will churn or not in the next 6 months. This would help the bank to have the right engagement with customers at the right time.
Customer churning, also known as customer attrition or customer defection, refers to the phenomenon where customers or clients discontinue their relationship with a company or business. In other words, it's the rate at which customers stop using a company's products or services and choose to go to a competitor or simply stop making purchases altogether. The business is a telecom company that provides internet and mobile services to its customers. The goal of The business is a telecommunications company that provides mobile and landline services to its customers.
Understanding the reasons for churn through data analysis and customer feedback is essential for implementing effective strategies to retain customers and build long-term relationships. In this challenge, as a data scientist of a bank, you are asked to analyze the past data and predict whether the customer will churn or not in the next 6 months. This would help the bank to have the right engagement with customers at the right time.
To predict if a customer will churn or not.
- Loading and Understanding the data
- Data Cleaning
- Exploratory Data Analysis
- Feature Engineering
- Model Building
- Model's Performance Comparison
- Feature Importance
- Conclusion
- Recommendations
- Pandas
- NumPY
- Matplotlib
- Seaborn
- Plotly
- Scikit-learn
From the above representation Age and Vintage feature are the most effective features. The bank should focus on these features to reduce the churn rate. Also, the number of product holdings and whether a customer has a credit card contributes the least in target prediction.
- ROC-AUC curve shows the model performance by plotting the false positive rate to true positive rate.
- More the skewness of the curve towards the upper left corner higher is the area under the roc curve and better is the model performance.
- The Random Forest showed a tremendous performance but it is likely to overfit.
- Most of the models have achived around 65% accuracy.
- From the ROC-AUC curve it is clear that Logistic Regression performed poorly.
- Trying other ensemble techniques like XGBoost and CatBoost might also help.
- Trying Deep Learning approach, training Artificial Neural Network.
- Experimenting with under-sampling to see whether there is any change in model performance.