
Bank Customer Churn Prediction

Despite steady transformation over the decades, many banks today with a sizeable customer base are still hoping to gain a competitive edge.
While retaining existing customers and thereby increasing their lifetime value is something everyone acknowledges as being important, there is little the banks can do about customer churn when they don’t see it coming in the first place.

This is where predicting churn at the right time becomes important, especially when clear customer feedback is absent. Early and accurate churn prediction empowers CRM and customer experience teams to be creative and proactive in their engagement with the customer.


Objectives:

In this project, our goal is to predict the probability that a customer will churn, using machine learning classification techniques.

Dataset:

Predicting Churn for Bank Customers

Implementation:

Libraries: scikit-learn, Matplotlib, pandas, seaborn, NumPy, SciPy
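
For context, a minimal setup sketch using the libraries listed above; the filename Churn_Modelling.csv is an assumption about how the Kaggle CSV is stored locally.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Kaggle dataset (filename is an assumption).
df = pd.read_csv("Churn_Modelling.csv")
df.head()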

A few glimpses of the EDA:

1. Churn Distribution

From the chart above, we can see that our target variable is imbalanced.
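
A quick way to quantify that imbalance (a sketch; the target column name "Exited" follows the Kaggle dataset's schema and is an assumption here):

# Share of retained vs. churned customers; "Exited" is assumed to be the target.
df["Exited"].value_counts(normalize=True)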

2. Distribution of the Categorical Variables:

a. Geography distribution in customer attrition


b. Gender distribution in customer attrition


c. Customer attrition w.r.t. products


d. Customer attrition w.r.t. credit card


e. Customer attrition w.r.t. active status of a member

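The plots in this subsection could be reproduced along these lines (a sketch; the column names follow the Kaggle dataset's schema, which is an assumption):

# Churn counts split by each categorical feature (schema assumed).
categorical_cols = ["Geography", "Gender", "NumOfProducts", "HasCrCard", "IsActiveMember"]
fig, axes = plt.subplots(1, 5, figsize=(22, 4))
for ax, col in zip(axes, categorical_cols):
    sns.countplot(data=df, x=col, hue="Exited", ax=ax)
    ax.set_title(f"Churn by {col}")
plt.tight_layout()
plt.show()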

3. Distribution of the Continuous Variables:

a. Credit Score


b. Age distribution


c. Tenure distribution


d. Balance distribution

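Similarly, the continuous distributions above could be drawn as follows (a sketch under the same schema assumption):

# Distribution of each continuous feature, split by churn status.
continuous_cols = ["CreditScore", "Age", "Tenure", "Balance"]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, col in zip(axes.ravel(), continuous_cols):
    sns.histplot(data=df, x=col, hue="Exited", kde=True, ax=ax)
plt.tight_layout()
plt.show()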

Model Training and Evaluation:

Feature Importances

We need to know which features are the most important. To find that out, we trained a Random Forest classifier and extracted its feature importances.
The resulting plot ranks the features from the highest importance value to the lowest.
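
A minimal sketch of that step, assuming X_train and y_train come from a standard train/test split and X_train is a DataFrame with the categorical columns already encoded (the preprocessing is not shown in this README):

from sklearn.ensemble import RandomForestClassifier

# Fit a Random Forest and rank features by impurity-based importance.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

importances = pd.Series(rf.feature_importances_, index=X_train.columns)
importances.sort_values(ascending=False).plot(kind="bar", figsize=(10, 4))
plt.ylabel("Importance")
plt.show()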

Model Selection

Since we are modeling a critical business problem, we need a model with the highest performance possible. Here, we try several machine learning algorithms to get an idea of which one performs best, and we compare their accuracy. As this is a classification problem, the algorithms we chose are listed below, followed by a quick comparison sketch:

  • K-Nearest Neighbor (KNN)
  • Logistic Regression (LR)
  • AdaBoost
  • Gradient Boosting (GB)
  • RandomForest (RF)
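
One way to run that comparison (a sketch; 5-fold cross-validated accuracy is an assumption, as the README does not show the exact evaluation protocol):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Compare mean cross-validated accuracy across the candidate models.
models = {
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(),
    "GB": GradientBoostingClassifier(),
    "RF": RandomForestClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} (+/- {scores.std():.4f})")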

Base Model Results:

Optimizations

1. Results after Hyperparameter Tuning:

AdaBoost:

# Randomized search over AdaBoost hyperparameters, scored by ROC-AUC.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV

parameters_list = {"algorithm": ["SAMME", "SAMME.R"],
                   "n_estimators": [10, 50, 100, 200, 400]}
GSA = RandomizedSearchCV(AdaBoostClassifier(), param_distributions=parameters_list,
                         n_iter=10, scoring="roc_auc")
GSA.fit(X_train, y_train)

GSA.best_params_, GSA.best_score_
({'n_estimators': 200, 'algorithm': 'SAMME'}, 0.8432902741161931)

Gradient Boosting:

# Randomized search over Gradient Boosting hyperparameters; randint draws
# integers uniformly from the given range.
from scipy.stats import randint
from sklearn.ensemble import GradientBoostingClassifier

gb_parameters_list = {'loss': ['deviance', 'exponential'],
                      'n_estimators': randint(10, 500),
                      'max_depth': randint(1, 10)}

GBM = RandomizedSearchCV(GradientBoostingClassifier(), param_distributions=gb_parameters_list,
                         n_iter=10, scoring="roc_auc")
GBM.fit(X_train, y_train)
GBM.best_params_, GBM.best_score_
({'loss': 'exponential', 'max_depth': 3, 'n_estimators': 241},
 0.8576619853133595)
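
Since RandomizedSearchCV refits the best estimator on the full training set by default (refit=True), the tuned models can be scored on the held-out test set directly; X_test and y_test are assumed to come from the same split:

from sklearn.metrics import roc_auc_score

# Evaluate each tuned model on the held-out test set.
for name, search in [("AdaBoost", GSA), ("GradientBoosting", GBM)]:
    test_proba = search.best_estimator_.predict_proba(X_test)[:, 1]
    print(name, roc_auc_score(y_test, test_proba))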

2. Results after Feature Transformation:

'AdaBoostClassifier': 0.8442783055508478
'GradientBoostingClassifier': 0.873749653401012
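
The README does not show which feature transformation produced these scores, so the following is purely a hypothetical sketch, assuming a PowerTransformer; it also yields the fitted estimators (GSA_fit_transformed, GBM_fit_transformed) referenced by the voting step below:

from sklearn.preprocessing import PowerTransformer

# Hypothetical transform; the actual one used in the project is not shown.
pt = PowerTransformer()
X_train_transform = pt.fit_transform(X_train)
X_test_transform = pt.transform(X_test)

# Refit the tuned models on the transformed features.
GSA_fit_transformed = GSA.best_estimator_.fit(X_train_transform, y_train)
GBM_fit_transformed = GBM.best_estimator_.fit(X_train_transform, y_train)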

3. Voting Classifier:

# Soft-voting ensemble of the two tuned boosting models, weighting
# Gradient Boosting 2:1 over AdaBoost.
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import roc_auc_score

voting_model = VotingClassifier(estimators=[("gb", GBM_fit_transformed),
                                            ("ADA", GSA_fit_transformed)],
                                voting='soft', weights=[2, 1])

votingModel = voting_model.fit(X_train_transform, y_train)
test_labels_voting = votingModel.predict_proba(np.array(X_test_transform))[:, 1]
votingModel.score(X_test_transform, y_test)
0.8732
roc_auc_score(y_test, test_labels_voting, average='macro', sample_weight=None)
0.8744660402064695

Lessons Learned

  • Data imputation
  • Handling outliers
  • Feature engineering
  • Classification models
  • Voting ensembles

Feedback

If you have any feedback, please reach out at [email protected]

πŸš€ About Me

Hi, I'm Pradnya! πŸ‘‹

I am an AI enthusiast and a Data Science & ML practitioner.
