Telecom_Churn_Problem

This project uses 21 predictor variables and logistic regression to predict whether a particular customer will switch to another telecom provider. In telecom terminology, switching is referred to as churning and staying as not churning.

To summarise, the steps performed during model building and model evaluation are:

1. Data cleaning and preparation
   - Combining three dataframes
   - Handling categorical variables
     - Mapping categorical variables to integers
     - Dummy variable creation
   - Handling missing values
2. Test-train split and scaling
3. Model Building
   - Feature elimination based on correlations
   - Feature selection using RFE (Coarse Tuning)
   - Manual feature elimination (using p-values and VIFs)
4. Model Evaluation
   - Accuracy
   - Sensitivity and Specificity
   - Optimal cut-off using ROC curve
   - Precision and Recall
5. Predictions on the test set
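The preparation and coarse feature-selection steps listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes pandas and scikit-learn are installed, and the three tables, their column names (`customerID`, `tenure`, `Contract` and so on), the synthetic values, the 70/30 split and the choice of four RFE features are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the three source tables; every column name and
# value here is an assumption made for illustration only.
rng = np.random.default_rng(0)
n = 200
ids = np.arange(n)
tenure = rng.integers(1, 72, n)
# Synthetic target: customers with shorter tenure churn more often.
churn_prob = 1 / (1 + np.exp(0.05 * (tenure - 30)))
churn_df = pd.DataFrame({
    "customerID": ids,
    "tenure": tenure,
    "MonthlyCharges": rng.uniform(20, 120, n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
    "Churn": np.where(rng.uniform(size=n) < churn_prob, "Yes", "No"),
})
customer_df = pd.DataFrame({
    "customerID": ids,
    "Partner": rng.choice(["Yes", "No"], n),
})
internet_df = pd.DataFrame({
    "customerID": ids,
    "InternetService": rng.choice(["DSL", "Fiber optic", "No"], n),
})

# Combine the three dataframes on the shared customer key.
df = churn_df.merge(customer_df, on="customerID").merge(internet_df, on="customerID")

# Map binary categorical variables to integers.
for col in ["Partner", "Churn"]:
    df[col] = df[col].map({"Yes": 1, "No": 0})

# Create dummy variables for multi-level categoricals, dropping one level each.
df = pd.get_dummies(df, columns=["Contract", "InternetService"],
                    drop_first=True, dtype=int)

# Handle missing values (the synthetic data has none, so this drops nothing).
df = df.dropna()

# Test-train split, then scale numeric columns using training statistics only.
num_cols = ["tenure", "MonthlyCharges"]
df[num_cols] = df[num_cols].astype(float)
X = df.drop(columns=["customerID", "Churn"])
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=100, stratify=y
)
X_train, X_test = X_train.copy(), X_test.copy()
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

# Coarse feature selection with RFE around a logistic regression estimator.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X_train, y_train)
selected = list(X_train.columns[rfe.support_])
print(selected)
```

The manual elimination step based on p-values and VIFs is not shown here. Fitting the scaler on the training split only, and reusing it on the test split, keeps test-set information out of the model.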

First, classes were assigned to all customers in the test set using a probability cutoff of 0.5. The resulting model was fairly accurate (accuracy of about 80%) but had low sensitivity (about 53%). A cutoff of 0.3 was then tried, which gave slightly lower accuracy (about 77%) but much better sensitivity (about 78%). The lesson is that 0.5 should not be used blindly as the probability cutoff. Business understanding must be applied, which here means adjusting the cutoff until the model is as useful as possible for the problem at hand.
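The effect of moving the cutoff can be illustrated with a small sketch. The labels and probabilities below are made up for the example and are not the project's data, so the numbers differ from the figures quoted above; the helper names `classify`, `accuracy` and `sensitivity` are also hypothetical.

```python
def classify(probabilities, cutoff):
    """Assign class 1 (churn) when the predicted probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def accuracy(actual, predicted):
    """Share of all customers whose class was predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def sensitivity(actual, predicted):
    """Share of actual churners (1s) that were predicted as churners."""
    true_pos = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    false_neg = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    return true_pos / (true_pos + false_neg)


# Toy data (an assumption for illustration): 4 churners, 6 non-churners.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.2, 0.38, 0.32, 0.1, 0.05, 0.15, 0.25]

for cutoff in (0.5, 0.3):
    predicted = classify(probs, cutoff)
    print(cutoff, accuracy(actual, predicted), sensitivity(actual, predicted))
```

On this toy data the lower cutoff trades some accuracy for a clear gain in sensitivity, which mirrors the pattern described above.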

The sensitivity of a model is the proportion of actual yeses (positives) that it correctly predicts as yeses. The specificity is the proportion of actual nos (negatives) that it correctly predicts as nos. For any given model, changing the cutoff to increase sensitivity generally lowers its specificity.
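A minimal sketch of both measures, computed from confusion-matrix counts at several cutoffs, is given here. The data is invented for illustration and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (true_pos, false_pos, true_neg, false_neg) for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(actual, probabilities, cutoff):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (an assumption for illustration): 4 positives, 6 negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.2, 0.38, 0.32, 0.1, 0.05, 0.15, 0.25]

# Walk the cutoff downwards and record both measures at each step.
results = [sensitivity_specificity(actual, probs, c) for c in (0.7, 0.5, 0.3, 0.1)]
for cutoff, (sens, spec) in zip((0.7, 0.5, 0.3, 0.1), results):
    print(cutoff, round(sens, 2), round(spec, 2))
```

As the cutoff falls, more customers are labelled positive, so sensitivity can only stay the same or rise while specificity can only stay the same or fall.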

High values of both sensitivity and specificity usually cannot be achieved at the same cutoff, so one has to choose which of the two needs to be higher. A safe default is the cutoff at which accuracy, sensitivity and specificity are roughly equal. The right choice, however, depends on the business context: sometimes higher sensitivity matters more, and sometimes higher specificity does.
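One possible way to locate such a balanced cutoff is to scan a grid of candidates and keep the one where the three measures are closest together. The sketch below does this by minimising the gap between the largest and smallest of the three; this selection rule, the grid of cutoffs, the toy data and the names `metrics_at` and `best_cutoff` are all assumptions for illustration, not the notebook's method.

```python
def metrics_at(actual, probabilities, cutoff):
    """Return (accuracy, sensitivity, specificity) at the given cutoff."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    return (tp + tn) / len(actual), tp / (tp + fn), tn / (tn + fp)


def spread(values):
    """Gap between the largest and smallest value; 0 means all are equal."""
    return max(values) - min(values)


# Toy data (an assumption for illustration): 4 positives, 6 negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.2, 0.38, 0.32, 0.1, 0.05, 0.15, 0.25]

# Candidate cutoffs 0.1, 0.2, ..., 0.9.
cutoffs = [i / 10 for i in range(1, 10)]

# Keep the cutoff whose three metrics lie closest together.
best_cutoff = min(cutoffs, key=lambda c: spread(metrics_at(actual, probs, c)))
print(best_cutoff, metrics_at(actual, probs, best_cutoff))
```

If the business case favours catching more churners, a cutoff below this balanced point would be the natural choice, at the cost of more false alarms.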

During model building, another view of performance was also considered: precision and recall. This view is closely related to the sensitivity and specificity view. Precision is the proportion of predicted yeses that were actually yeses. Recall, on the other hand, is the same as sensitivity, i.e. the proportion of actual yeses that were correctly predicted.
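The two definitions can be written out as a short sketch. The data is invented for the example and the function name `precision_recall` is hypothetical.

```python
def precision_recall(actual, predicted):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    return tp / (tp + fp), tp / (tp + fn)


# Toy data (an assumption for illustration): 4 positives, 6 negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.2, 0.38, 0.32, 0.1, 0.05, 0.15, 0.25]

for cutoff in (0.5, 0.3):
    predicted = [1 if p >= cutoff else 0 for p in probs]
    precision, recall = precision_recall(actual, predicted)
    print(cutoff, precision, recall)
```

On this toy data, lowering the cutoff raises recall but lowers precision, because more of the predicted yeses turn out to be nos.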
