- Overview
- Motivation
- Setup
- Technical Aspect
- Demo
The objective of this project is to predict whether a customer/lead will purchase the product. The focus is mostly on the process of creating pipelines for data pre-processing, model training and inference using Airflow, with model experiment tracking handled by MLflow. PyCaret, an open-source AutoML framework, is used to find the best-fit model; during the PyCaret experimentation, LightGBM, a gradient-boosted tree classification algorithm, gave the best performance. The model was evaluated on accuracy. The project also covers unit test cases using pytest.
The project is a case study of an ed-tech startup that is looking for ways to utilize its marketing spend efficiently. The company has spent extensively on acquiring customers/leads and is now looking to reduce its CAC (customer acquisition cost). A high CAC could be due to the following reasons:
- Incorrect targeting.
- High competition.
- Inefficient conversion.
The business metric addressed here is Leads to Application Completion, which targets the third issue. A lead is generated when a person visits the website and enters their contact details on the platform; a junk lead is one where the person sharing their details has no interest in the product/service. Junk leads in the pipeline create significant inefficiency in the sales process. The goal of the project is therefore to build a system that categorises leads by how likely they are to purchase the course, removing the inefficiency that junk leads introduce into the sales process.
- Data pipeline: processes the raw data
- Training pipeline: pre-processing & model training
- Inference pipeline: pre-processing & model prediction
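The three pipelines are orchestrated as Airflow DAGs. As a rough illustration only (the DAG id, task names and callables here are placeholders, not the project's actual code), the training pipeline could be wired up like this:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def preprocess_data():
    # placeholder: clean the raw leads data and write the result to the database
    ...


def train_model():
    # placeholder: train the classifier on the pre-processed data
    ...


with DAG(
    dag_id="lead_scoring_training_pipeline",  # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # triggered manually from the Airflow UI
    catchup=False,
) as dag:
    preprocess = PythonOperator(task_id="preprocess_data", python_callable=preprocess_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)

    preprocess >> train  # training runs only after pre-processing succeeds
```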
The dataset primarily focuses on variables/features describing the origin of the lead (e.g. referred_leads, city_mapped) and the lead's interaction with the website (e.g. 1_on_1_mentorship, whatsapp_chat_click). The exploratory data analysis was done using pandas profiling.
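A profiling report can be generated in a couple of lines; a minimal sketch, assuming the legacy pandas_profiling package (now published as ydata-profiling) and an illustrative file name:

```python
import pandas as pd
from pandas_profiling import ProfileReport  # package is now published as ydata-profiling

df = pd.read_csv("leads.csv")  # "leads.csv" is an assumed name for the raw leads data
profile = ProfileReport(df, title="Leads EDA")
profile.to_file("leads_profile.html")  # interactive HTML report
```

Key observations from the EDA: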
- The dataset contains a large number of missing values.
- Only a few categories are significant in first_platform_c, first_utm_source_c and first_utm_medium_c.
- A few interaction columns have 99% missing values.
- Reducing the high cardinality of the city_mapped column: each city is mapped to tier1, tier2 or tier3.
- The first_platform_c, first_utm_medium_c and first_utm_source_c columns contain only a few significant categories. To address this, we keep the categories that cover 90% of the data and label the smaller contributors as 'others'; this means computing the cumulative frequency of each category and filtering on the 90% criterion (see the sketch after this list).
- Replaced the null values with 0 for the total_leads_dropped and referred_leads columns.
- The 37 interaction columns are classified into four categories: assistance interaction, career interaction, payment interaction and syllabus interaction.
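The category grouping and city-tier mapping could look roughly as follows; this is a sketch based on the description above, so the column names, tier labels and the 'others' placeholder are assumptions rather than the project's exact code:

```python
import pandas as pd


def group_rare_categories(df: pd.DataFrame, column: str, coverage: float = 0.90) -> pd.DataFrame:
    """Relabel categories outside the top `coverage` share of `column` as 'others'."""
    freq = df[column].value_counts(normalize=True)  # sorted descending by share
    cum = freq.cumsum()
    # keep each category whose cumulative share *before* it is still below the cutoff,
    # so the category that crosses 90% is retained as well
    significant = cum[cum.shift(fill_value=0) < coverage].index
    df[column] = df[column].where(df[column].isin(significant), "others")
    return df


def map_city_tier(df: pd.DataFrame, tier_map: dict) -> pd.DataFrame:
    """Map raw city names to tiers; cities missing from the map fall back to tier3."""
    df["city_tier"] = df["city_mapped"].map(tier_map).fillna("tier3")
    return df
```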
The notebook for data preprocessing and EDA can be found here: Data Preprocessing & EDA
PyCaret is an open-source, low-code machine learning library in Python that is designed to simplify the machine learning process. It allows users to perform several common machine learning tasks, such as data pre-processing, feature engineering, model selection, hyperparameter tuning and model deployment, with minimal coding. Based on the initial experiment results, a few irrelevant features were identified, and a second run of the experiment was done after removing them. PyCaret also logs the model to the MLflow registry based on the parameters passed to its setup function. The model experimentation notebook can be found here: Model experimentation.
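A minimal sketch of such an experiment, with MLflow logging switched on via setup; the file name, target column and experiment name are assumptions for illustration:

```python
import pandas as pd
from pycaret.classification import compare_models, setup

df = pd.read_csv("cleaned_leads.csv")  # assumed name for the pre-processed dataset

setup(
    data=df,
    target="app_complete_flag",   # assumed target: did the lead complete the application?
    log_experiment=True,          # have PyCaret log runs and models to MLflow
    experiment_name="lead_scoring",
)

best_model = compare_models()  # trains and ranks candidate models; LightGBM ranked best here
```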
The project covers basic unit test cases to check the pre-processing functionality.
- Check the load_data_to_db function.
- Check the city-to-tier mapping functionality.
- Check the correct mapping of categorical variables.
- Check the interaction mapping schema.
Test cases can be found here: Test cases
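As an illustration, a test for the city-tier mapping might look like the sketch below; map_city_tier is the hypothetical helper from the pre-processing sketch above, imported from an assumed preprocessing module:

```python
import pandas as pd

from preprocessing import map_city_tier  # assumed module name


def test_map_city_tier_assigns_known_and_fallback_tiers():
    df = pd.DataFrame({"city_mapped": ["Mumbai", "Mysore", "UnknownVille"]})
    tier_map = {"Mumbai": "tier1", "Mysore": "tier2"}

    result = map_city_tier(df, tier_map)

    # known cities get their mapped tier; unseen cities fall back to tier3
    assert list(result["city_tier"]) == ["tier1", "tier2", "tier3"]
```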
- Install the necessary dependencies using the command below.
pip install -r requirements.txt
- Install Airflow locally
- Airflow Setup
- Create an Airflow user for the UI login
airflow users create \
  --username rakshitha \
  --firstname rakshitha \
  --lastname bs \
  --role Admin \
  --email [email protected] \
  --password 123
- Run airflow webserver
airflow webserver -p 8080
- Start airflow scheduler
airflow scheduler
- MLflow setup
- Start the MLflow tracking server
mlflow server --backend-store-uri sqlite:///<path_to_sqlite_db> --port <port_number> --host <host_address>
The screenshots of the pipelines can be found here: https://github.com/RakshithaBS/Lead_Scoring_Case_Study/blob/master/MLOPS.pdf