The objective of this project is to identify potential telecom churners at TNB Telco Inc. (a hypothetical company), enabling the company to take proactive measures to retain these customers. Using the Telco Customer Churn dataset (available on Kaggle), various machine learning models have been evaluated to achieve this goal. You can view the detailed report on
The primary goal is to identify possible telecom churners so that the company can implement strategies to retain these customers. While the implementation of retention strategies is outside the scope of this project, the insights provided can greatly inform decision-making.
This project utilizes the Kaggle Telco Customer Churn dataset, which contains comprehensive information about telecom customers, including their usage patterns, payment methods, and service preferences.
This project leverages a range of powerful frameworks and tools to ensure cutting-edge performance and efficiency. Here are the key technologies used:
- **Plotly**: Interactive data visualization library that brings your data to life.
- **Featuretools**: Automated feature engineering for creating meaningful features from raw data.
- **LightGBM**: Gradient boosting framework that uses tree-based learning algorithms.
- **Optuna**: Hyperparameter optimization framework to enhance model performance.
- **MLflow**: Platform for managing the end-to-end machine learning lifecycle.
- **DagsHub**: Collaborative data science platform for versioning and managing datasets and models.
- **Sphinx**: Documentation generator for creating beautiful project docs.
- **DVC**: Data version control system for managing data and model versions.
- **Scikit-learn**: Machine learning library for Python providing simple and efficient tools.
- **TensorFlow**: Open-source platform for machine learning and artificial intelligence.
- **CatBoost**: Gradient boosting library that handles categorical features efficiently.
- **Seaborn**: Statistical data visualization library built on top of Matplotlib.
- **Keras**: High-level neural networks API, written in Python and capable of running on top of TensorFlow.
Click here for more details on the Methodology
To ensure a thorough analysis and implementation, I explored multiple models and techniques rather than settling on the first workable approach. Below is a brief look at the methodologies that drove the project's results:
Several machine learning models were tested to predict customer churn, including LightGBM (LGB), XGBoost (XGB), CatBoost (Cat), and Artificial Neural Networks (ANN). After thorough comparison, LightGBM and ANN outperformed the rest, offering the best balance of accuracy and interpretability.
Featuretools was used for automatic feature construction, which proved to be highly effective. The top 15 features were mostly generated by Featuretools, highlighting the benefits of automated feature engineering.
I believe that sophisticated feature engineering is a key to improving model accuracy.
Missing values were handled using median imputation for numerical data, while categorical features received a special "missing" category. This was achieved using scikit-learn's ColumnTransformer and SimpleImputer classes.
To balance recall and precision, I used a custom weighted recall metric:
- Weighted Recall = 0.65 * Recall + 0.35 * F1 Score.
The model achieved a recall of 0.80 and a precision of 0.54. Emphasizing the recall ensures that the model captures as many churners as possible, which is crucial for customer retention strategies.
I've optimized the metrics based on the project’s goals, ensuring I’m providing real-world, actionable insights.
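The custom metric above is simple to implement as a plain function; the helper below is an illustrative implementation of the stated formula, not the project's exact code.

```python
# Weighted recall = 0.65 * recall + 0.35 * F1, as described above.
from sklearn.metrics import recall_score, f1_score

def weighted_recall(y_true, y_pred, recall_weight=0.65):
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    return recall_weight * recall + (1 - recall_weight) * f1

# Toy example: 4 actual churners, 3 of them caught, 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(round(weighted_recall(y_true, y_pred), 3))
```

Because F1 already folds in precision, this blend rewards catching churners first while still penalizing models that flag everyone as a churner.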
- **Charges**: Higher churn among month-to-month customers is linked to charges; customers paying more are more likely to churn.
- **Senior Citizens**: Senior citizens churn at a notably higher rate, roughly double that of younger customers. They tend to be more cautious with their finances, leading them to reconsider non-essential services more frequently.
- **Automatic Payment Method**: Customers using automatic payment methods have a lower churn rate. The convenience of automatic payments reduces the likelihood of reconsidering their commitment to the service.
- **Fiber Optic Service**: Fiber optic customers churn at a higher rate, suggesting potential issues with reliability, speed, or customer support. Addressing these issues could help reduce churn and increase customer satisfaction.

Follow the steps below to get the project up and running and uncover the secrets behind its success.
Before you dive into the data, let’s get your environment set up:
- **Clone the Project:**

  ```shell
  git clone https://github.com/d-sutariya/customer_churn_prediction.git
  ```

- **Create and Activate a Virtual Environment:**

  ```shell
  python -m venv env
  # On Unix:
  source env/bin/activate
  # On Windows:
  env\Scripts\activate
  ```

- **Install Dependencies:**

  ```shell
  pip install -r requirements.txt
  ```

- **Run the Setup Script:**

  ```shell
  cd customer_churn_prediction
  python src/config/setup_project.py
  ```
Now you're ready to transform raw data into valuable insights that could change business operations.
Transform raw customer data into a training-ready dataset with the following command:
- **Run the Transformation Script:**

  ```shell
  python src/data/make_dataset.py --input_file_path data/raw/WA_Fn-UseC_-Telco-Customer-Churn.csv
  ```
Curious to see the process in action? Explore my Jupyter notebooks for an in-depth look!
Here’s where the real magic happens. My Jupyter notebooks offer deep insights into customer churn predictions. Dive into them to see innovative approaches and results:
- **Explore the Notebook:**
These notebooks are not just scripts—they are a window into the detailed thought process behind every step.
Head over to the src/ directory to find core production scripts designed for efficiency and scalability:
- **ETL Pipeline Script:**
- **Data Pipeline Configuration:**
- **Hyperparameter Optimization:**
Imagine these scripts as part of your production pipeline. They are designed to be efficient and scalable.
The journey doesn’t end with deployment. The post_deployment/ directory includes scripts for:
- Transforming new data.
- Periodically retraining the model.
Check the scripts here:
These scripts ensure your operations team stays ahead of potential issues and maintains model accuracy over time.
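As a rough illustration of the retraining idea, the skeleton below refits a model on historical data combined with newly collected labeled data. The model choice and synthetic data are placeholders; the project's real logic lives in `post_deployment/`.

```python
# Hedged sketch of periodic retraining: fold new labeled data into the
# training set and refit. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

def retrain(model, X_old, y_old, X_new, y_new):
    """Refit the model on the combined historical and new data."""
    X = np.vstack([X_old, X_new])
    y = np.concatenate([y_old, y_new])
    model.fit(X, y)
    return model

# Historical batch and a smaller freshly labeled batch.
X_old, y_old = make_classification(n_samples=300, random_state=0)
X_new, y_new = make_classification(n_samples=50, random_state=1)

model = retrain(LogisticRegression(max_iter=1000), X_old, y_old, X_new, y_new)
print(model.score(X_new, y_new))
```

A production version would typically gate the refreshed model behind a metric check (e.g. the weighted recall above) before replacing the deployed one.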
I encourage you to explore this project thoroughly. From cutting-edge data transformations to production-ready pipelines, every piece has been crafted to address real-world problems.
As you delve into the materials, I hope you see the value and potential of this project and how it could fit into your business.
```
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third-party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- Final, canonical datasets ready for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- Project documentation.
│
├── models             <- Trained and serialized models.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── post_deployment    <- Scripts related to post-deployment activities.
│
├── reports            <- Feature transformation definitions, predictions, and mlflow runs.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment.
│
├── setup.py           <- Makes project pip installable (pip install -e .) so src can be imported.
│
├── src                <- Source code for use in this project.
│   ├── config         <- Script for setting up the project locally.
│   ├── data           <- Scripts to download or generate data.
│   │   ├── make_dataset.py
│   │   └── data_utils.py <- Data processing utilities.
│   ├── features       <- Scripts to turn raw data into features for modeling.
│   │   └── generate_and_transform_features.py <- Generate and transform features using Featuretools.
│   ├── models         <- Scripts to train models and use them for predictions.
│   │   ├── predict_model.py
│   │   └── train_model.py
│   ├── optimization   <- Scripts related to model optimization.
│   │   ├── ensemble_utils.py <- Utilities for ensembling models.
│   │   ├── model_optimization.py <- Manual model optimization.
│   │   └── tuning_and_tracking.py <- Hyperparameter tuning and tracking using MLflow and DagsHub.
│   ├── pipeline       <- DVC pipeline for data cleaning to model predictions.
│   │   └── dvc.yaml   <- Full pipeline configuration.
│
└── tox.ini            <- Tox file with settings for running tests and managing environments.
```
Feel free to reach out with any questions or feedback. I look forward to your thoughts!