A company can grow rapidly by understanding the behavioral patterns of its customers, which in turn lets it offer better services and benefits to potentially loyal customers. Using historical marketing campaign data to improve campaign performance and target the right customers for transactions on the company's platform, my primary objective is to create a predictive clustering model that streamlines decision-making for the company.
- The company runs marketing campaigns for all of its customers. As a result, its average conversion rate per customer is ~0%.
- The company spends a suboptimal amount of marketing resources by campaigning to every single customer. Its marketing ROI is -73.08%.
- Identify the factors that most influence customers' spending, purchases and conversion rate.
- Create an optimal K-Means Clustering model that can decisively segment customers for marketing retargeting purposes.
- Provide recommendations for potential targeted marketing strategies based on findings from the analyses and modeling.
- Calculate the potential impact of model implementation on marketing ROI and conversion rate.
The dataset was obtained from Rakamin Academy.
Description:
- AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
- AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
- AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
- AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
- AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
- Response: 1 if customer accepted the offer in the last campaign, 0 otherwise
- Complain: 1 if customer complained in the last 2 years
- Dt_Customer: date of customer’s enrolment with the company
- Education: customer’s level of education
- Marital: customer’s marital status
- Kidhome: number of small children in customer’s household
- Teenhome: number of teenagers in customer’s household
- Income: customer’s yearly household income
- MntFishProducts: amount spent on fish products in the last 2 years
- MntMeatProducts: amount spent on meat products in the last 2 years
- MntFruits: amount spent on fruits products in the last 2 years
- MntSweetProducts: amount spent on sweet products in the last 2 years
- MntWines: amount spent on wine products in the last 2 years
- MntGoldProds: amount spent on gold products in the last 2 years
- NumDealsPurchases: number of purchases made with discount
- NumCatalogPurchases: number of purchases made using catalogue
- NumStorePurchases: number of purchases made directly in stores
- NumWebPurchases: number of purchases made through the company’s website
- NumWebVisitsMonth: number of visits to the company’s website in the last month
- Recency: number of days since the last purchase
- Z_CostContact: cost to contact customer
- Z_Revenue: revenue after client accepts campaign
Overview:
- Dataset contains 2240 rows, 28 features, 1 ID column and 1 redundant Unnamed: 0 index column, which is removed.
- Dataset consists of 3 data types: float64, int64 and object.
- The Dt_Customer column could be changed into the datetime data type.
- Dataset contains 24 null values, all from the Income feature.
The following features are extracted from existing default features in order to aid in analysis and modeling.
- Total_Spending: The total of each customer’s spending: sum of MntCoke, MntFruits, MntMeatProducts, MntFishProducts, MntSweetProducts and MntGoldProducts.
- Total_Acc: The total number of accepted campaigns by each customer, including the response for the last campaign.
- Total_Purchases: The total number of purchases made by each customer.
- Total_Children: The total number of children each customer has.
- Conversion_Rate: Total_Acc divided by number of web visits (NumWebVisitsMonth).
- Age: Age of each customer: 2014 - Year_Birth.
- Age_Group: Segmentation of customer ages into 6 groups.
- Has_Partner: Segmentation of Marital_Status: “Yes” if married or engaged, “No” otherwise.
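The derived features above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column lists are assumptions pieced together from the feature descriptions (the real file may name the spending columns differently), Total_Purchases is assumed to be the sum of the four purchase-count columns, Total_Children is assumed to be Kidhome + Teenhome, and customers with zero web visits are assumed to get a conversion rate of 0. Age_Group and Has_Partner are left out because the age bins and the marital status categories are not listed here.

```python
import numpy as np
import pandas as pd

# Assumed column names; adjust to match the actual dataset.
SPENDING_COLS = ["MntCoke", "MntFruits", "MntMeatProducts",
                 "MntFishProducts", "MntSweetProducts", "MntGoldProds"]
CAMPAIGN_COLS = ["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3",
                 "AcceptedCmp4", "AcceptedCmp5", "Response"]
PURCHASE_COLS = ["NumDealsPurchases", "NumWebPurchases",
                 "NumCatalogPurchases", "NumStorePurchases"]


def add_features(df: pd.DataFrame, reference_year: int = 2014) -> pd.DataFrame:
    """Return a copy of df with the derived features added."""
    out = df.copy()
    out["Total_Spending"] = out[SPENDING_COLS].sum(axis=1)
    out["Total_Acc"] = out[CAMPAIGN_COLS].sum(axis=1)
    out["Total_Purchases"] = out[PURCHASE_COLS].sum(axis=1)  # assumption
    out["Total_Children"] = out["Kidhome"] + out["Teenhome"]  # assumption
    # Avoid division by zero: zero visits are treated as a rate of 0 (assumption).
    visits = out["NumWebVisitsMonth"].replace(0, np.nan)
    out["Conversion_Rate"] = (out["Total_Acc"] / visits).fillna(0.0)
    out["Age"] = reference_year - out["Year_Birth"]
    return out
```

How zero-visit customers are handled matters for the averages reported later, so that choice should be checked against the original notebook.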
Correlation Heatmap of Numerical Features
Correlation Analysis:
- Strong Positive Correlations: Features like Total_Spending and Total_Purchases have strong positive correlations with various spending categories (MntCoke, MntMeatProducts, etc.), indicating that customers who spend more on these categories tend to spend more overall.
- Negative Correlations: NumWebVisitsMonth has negative correlations with several features, suggesting that customers who visit the website more frequently tend to spend less on certain categories.
- Income Correlations: Income is positively correlated with most spending categories and Total_Spending, indicating that customers with higher incomes tend to spend more.
- Recency Correlations: Recency has weak correlations with most features, suggesting that it doesn't strongly correlate with other features in the dataset.
Conversion Rate Correlations:
- Income (0.33): There's a moderate positive correlation between Income and Conversion Rate, suggesting that customers with higher incomes tend to have a higher conversion rate.
- Total_Spending (0.47): Conversion Rate is positively correlated with Total_Spending, indicating that customers who spend more tend to have a higher conversion rate.
- NumCatalogPurchases (0.36): Conversion Rate has a moderate positive correlation with NumCatalogPurchases, implying that customers who make catalog purchases are more likely to have a higher conversion rate.
- Age (-0.02): Interestingly, Age has little to no linear correlation with Conversion Rate, suggesting that any relationship between the two is either absent or non-linear.
- Recency (-0.05): Like Age, Recency also has a very weak correlation with Conversion Rate.
- Total_Purchases (0.21): Conversion Rate has a weak positive correlation with Total_Purchases, indicating that customers with a higher total number of purchases tend to have a slightly higher conversion rate.
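Coefficients like the ones listed above can be read off a correlation matrix. The sketch below assumes the data lives in a pandas DataFrame and uses the default Pearson correlation; the helper name `correlations_with` is hypothetical, and the write-up does not say which correlation method was actually used.

```python
import pandas as pd


def correlations_with(df: pd.DataFrame, target: str = "Conversion_Rate") -> pd.Series:
    """Pearson correlation of every numeric column with the target column."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Strongest positive relationships first, strongest negative last.
    return corr.sort_values(ascending=False)
```

Pearson correlation only captures linear relationships, which is why a near-zero value for Age does not rule out a non-linear pattern.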
Purchases, Spending and Income Correlations:
Observation:
The analysis reveals several noteworthy relationships among the variables. Firstly, Total Purchases exhibit a positive correlation with Income. Moreover, Total Spending is also positively correlated with both Total Purchases and Income, suggesting a connection between these economic factors in the dataset.
Conversion Rate Correlations:
Observation:
The data analysis reveals interesting insights into the correlations between different variables. Total Purchases and Total Spending both display a positive correlation with Conversion Rate, indicating a potential relationship between customer spending and the likelihood of conversion. Additionally, Income exhibits a decent positive correlation with Conversion Rate, while Age shows little to no correlation with Conversion Rate, as evident from the scatterplot.
- Total Spending is decently positively correlated with Conversion Rate. This indicates that customers who spend more in total are more likely to convert.
- Income is also positively correlated with Conversion Rate. Higher-income customers may be more likely to convert.
- The Age & Recency of the customers have little to no relationship with the Conversion Rate.
- Web Purchases, Catalog Purchases, and Store Purchases show positive correlations with the Conversion Rate. Customers who purchase through these channels are somewhat more likely to convert.
- Various product categories, such as Coke, Meat Products, and Sweet Products, show decent positive correlations with the Conversion Rate. These products are more popular with the customers than others.
- The number of children is negatively correlated with the Conversion Rate. Customers with more children are less likely to convert.
- The number of children, on the other hand, is decently positively correlated with Deals (discounts) Purchases. Customers with more children are more likely to purchase discounted products.
There were 24 null values, all of which were in the Income feature. These null values were imputed using the median of the feature.
The Age feature contained values above 100. The jump from 74 years old to 114 years old does not make sense, so these data points were dropped.
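The two cleaning steps just described (median imputation of Income and removal of implausible ages) might look like this in pandas. It is a sketch under the assumption that the Age column already exists; the function name `clean` and the `max_age` parameter are illustrative only.

```python
import pandas as pd


def clean(df: pd.DataFrame, max_age: int = 100) -> pd.DataFrame:
    """Impute missing Income with the median and drop rows with Age above max_age."""
    out = df.copy()
    # Median is robust to the extreme incomes that are trimmed later.
    out["Income"] = out["Income"].fillna(out["Income"].median())
    # Ages above 100 are treated as data-entry errors and removed.
    out = out[out["Age"] <= max_age].reset_index(drop=True)
    return out
```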
Since K-Means Clustering will be used to model the clusters, outliers should be removed so that they do not distort the cluster centroids. Outliers from the following features were manually trimmed by looking at the boxplot of each feature: MntMeatProducts, Income, NumWebPurchases, MntSweetProducts, NumCatalogPurchases, Total_Spending.
The Unnamed: 0 feature is redundant as there was already an ID feature, and thus it was dropped. The following features were also dropped as they were unnecessary for the modelling and analysis: Marital_Status, Dt_Customer, Year_Birth, Kidhome, Teenhome, AcceptedCmp1, AcceptedCmp2, AcceptedCmp3, AcceptedCmp4, AcceptedCmp5, Response.
The remaining categorical features were encoded using label encoding. Those features are: Education, Has_Partner, Complain, Age_Group.
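Label encoding of those columns could be done with Scikit-Learn's LabelEncoder, one encoder per column. This is a sketch rather than the project's code; the helper `label_encode` is a hypothetical name, and the category values in the usage note are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def label_encode(df: pd.DataFrame, columns: list) -> tuple:
    """Replace each listed column with integer codes; return the fitted encoders too."""
    out = df.copy()
    encoders = {}
    for col in columns:
        enc = LabelEncoder()
        # Cast to str so mixed or numeric categories are handled uniformly.
        out[col] = enc.fit_transform(out[col].astype(str))
        encoders[col] = enc
    return out, encoders
```

For example, `label_encode(df, ["Education", "Has_Partner", "Complain", "Age_Group"])` returns the encoded frame plus the encoders, which can map codes back via `inverse_transform`. LabelEncoder assigns codes in sorted order of the class labels, so the integers carry no meaningful ranking unless the labels happen to sort that way.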
The numerical features were scaled using Scikit-Learn’s StandardScaler() for the purposes of the clustering.
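A minimal sketch of the scaling step, assuming the features sit in a pandas DataFrame. The wrapper `scale_features` is a hypothetical helper; StandardScaler itself rescales each column to zero mean and unit variance.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features(df: pd.DataFrame, columns: list) -> tuple:
    """Standardize the listed columns; return the scaled frame and the fitted scaler."""
    scaler = StandardScaler()
    scaled = scaler.fit_transform(df[columns])
    return pd.DataFrame(scaled, columns=columns, index=df.index), scaler
```

Scaling matters for K-Means because the algorithm relies on Euclidean distance, so an unscaled feature with a large range such as Income would otherwise dominate the clustering.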
Customers will be segmented by:
- Purchasing Power: Income
- Monetary Value: Total_Spending
- Frequency: Total_Purchases
- Activity: NumWebVisitsMonth
- Loyalty: Total_Acc
Silhouette Score by Number of Clusters
From the analysis of both the elbow method and the silhouette score, it was decided to divide the customers into 3 clusters.
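The elbow method and the silhouette score both come from fitting K-Means for a range of cluster counts and comparing the results. The sketch below shows one way to collect both metrics; the range of k values, `n_init=10` and the random seed are assumptions, since the settings actually used are not stated here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def evaluate_k(X, k_values=range(2, 11), random_state=42) -> dict:
    """Fit K-Means for each k and record inertia (elbow) and silhouette score."""
    X = np.asarray(X, dtype=float)
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        results[k] = {
            "inertia": model.inertia_,  # within-cluster sum of squares
            "silhouette": silhouette_score(X, labels),
        }
    return results
```

Inertia always tends to fall as k grows, so the elbow is read from where the decrease flattens, while the silhouette score is simply maximized. When the two disagree, the final choice is a judgment call.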
PCA is used to reduce the dimensionality of the data, so that the clusters can be visualized on a lower dimension (2-D). The 3 clusters can be clearly seen below.
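As a rough sketch of that projection, assuming the scaled feature matrix is available as an array: the helper `project_2d` is hypothetical, and the resulting coordinates would then be scatter-plotted and colored by cluster label.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_2d(X, random_state=42) -> tuple:
    """Project X onto its first two principal components for plotting."""
    X = np.asarray(X, dtype=float)
    pca = PCA(n_components=2, random_state=random_state)
    coords = pca.fit_transform(X)
    # The variance ratio shows how much of the original spread the 2-D view keeps.
    return coords, pca.explained_variance_ratio_
```

If the two components explain only a small share of the variance, clusters that look overlapping in the plot may still be well separated in the full feature space.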
The quantity of customers is not evenly distributed among the clusters.
- Cluster 0: High value customers, relatively high income, frequent purchases, but less active in terms of visits.
- Cluster 1: High potential customers, high income, high spending, loyal in terms of campaign acceptance, but less frequent purchases and less active as well.
- Cluster 2: Low value customers, lower income, infrequent purchases, but active in terms of visits.
- High value cluster consists of mostly older customers with adult and middle aged customers dominating.
- The numbers of adults and young adults are almost equal in the High potential cluster, and together they dominate it.
- Low value cluster consists of mostly adults with young adults in 2nd place.
As can be seen above, low value customers earn less and buy less, while high value and high potential customers earn more and buy more, with high potential customers earning slightly more on average.
High potential customers purchase about the same number of times as high value customers, but the former spend considerably more on their purchases on average.
High Value Cluster:
- Has 851 customers and makes up 38% of the entire customer base.
- Oldest cluster by age, dominated by middle aged and adult customers.
- Highest spender on gold products on average, beating out even the highest spending cluster. A close 2nd on fish products.
- Most purchases overall on average.
- Least recent cluster (50.3 days)
- "Meat of the sandwich" cluster
High Potential Cluster:
- Has 212 customers and makes up 10% of the entire customer base.
- Dominated by young adult and middle aged customers.
- Highest spending cluster overall on average.
- Highest earning cluster overall (Rp. 79,7 million) on average.
- Most web purchases of all the clusters on average.
- Most loyal cluster, accepted more campaigns on average.
- Most recent cluster with an average of 46 days.
- Least active cluster with an average of 3 web visits in the last month.
- Highest conversion rate of all clusters (1 on average).
- Gold mine cluster
Low Value Cluster:
- Majority cluster, has 1153 customers and makes up 52% of the entire customer base.
- Youngest cluster by age, dominated by young adults and adults.
- Most children of all the clusters (1.2 children on average).
- Most active cluster with an average of 7 web visits last month.
- Close 2nd for most deals purchases with 2.25 on average.
- Lowest earning cluster overall (Rp. 35,5 million) on average.
- Lowest spending cluster overall on average.
- Fewest purchases overall on average.
- Least loyal cluster, accepted the fewest campaigns on average.
- Lowest conversion rate of all clusters (0.03 on average).
- "Low value high quantity" cluster.
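Per-cluster summaries like the ones above (cluster size, share of the customer base, and average feature values) can be produced with a simple group-by. The sketch assumes the cluster labels were stored in a column named `Cluster`, which is a hypothetical name, and the helper `profile_clusters` is illustrative only.

```python
import pandas as pd


def profile_clusters(df: pd.DataFrame, cluster_col: str = "Cluster") -> pd.DataFrame:
    """Summarize each cluster: size, share of customers, and numeric feature means."""
    sizes = df[cluster_col].value_counts().sort_index()
    profile = df.groupby(cluster_col).mean(numeric_only=True)
    profile.insert(0, "n_customers", sizes)
    profile.insert(1, "share_pct", (sizes / len(df) * 100).round(1))
    return profile
```

Means are sensitive to skewed features such as spending, so comparing medians alongside them can be a useful sanity check.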
High Value Customers:
- Personalized Marketing: Leverage customer data to create personalized marketing campaigns and product recommendations, especially focusing on products that are favourites (e.g., Gold products and Fish Products).
- Upselling: Identify high-margin products and promote them to this cluster. Upsell premium and gold products to take advantage of their higher spending tendencies.
- Product Bundles: Encourage the purchase of complementary products by offering bundled deals. For instance, if a customer buys meat products, suggest adding fish or sweet products to their cart with a discount.
- Product Expansion: Explore expanding product lines to cater to older demographics and their preferences, as they have a higher mean age.
High Potential Customers:
- Loyalty Programs: Since this cluster shows high spending and conversion rates, consider implementing loyalty programs to reward and retain these valuable customers (e.g., exclusive memberships and VIP programs).
- Market Diversification: Explore opportunities to expand into related markets, as these customers have high spending capacities and show a willingness to spend on various categories.
- Exclusive Offers: Offer exclusive, high-end, and limited-edition products to tap into their spending capacity and increase their number of purchases.
- Customer Engagement: Engage with these customers through various channels and maintain a strong online presence, as they tend to make web purchases.
Low Value Customers:
- Youth-Centric Products: Given the relatively young age of customers in this cluster, develop products and services that resonate with younger demographics.
- Personalized Web Experience: Utilize data on their frequent web visits to personalize their online shopping experience. Recommend products based on their browsing history and past purchases to increase conversion rates.
- Family-Oriented Marketing: Given the relatively high number of children, consider bundling products or offering family-oriented deals, as they might be family-oriented shoppers.
- Price-Sensitive Offers: Focus on offering value-for-money deals and discounts, as these customers have lower incomes and tend to buy deals.
- Educational Campaigns: To increase engagement and conversion rates, provide educational content about the benefits of different products. Highlight the nutritional value and diverse uses of sweet products in their daily life.
- Customer Retention: Focus on retaining this customer base by providing excellent customer service and building long-term loyalty.
If we focus on the high potential customers and target the campaigns to them exclusively, we will see a massive improvement in Conversion Rate (~ 0.75 on average) and, by extension, in marketing Return on Investment (ROI), where:
ROI before retargeting ----------> -73.08%
ROI after retargeting -----------> 57.1%
By retargeting the campaigns to "High Potential Customers" we have improved the marketing ROI massively.
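The exact calculation behind these ROI figures is not shown here. A common definition is (revenue - cost) / cost, and the sketch below applies it under the assumption that every contacted customer incurs a fixed contact cost (as in Z_CostContact) and every accepted campaign yields a fixed revenue (as in Z_Revenue). The function name and all numbers in the usage note are hypothetical.

```python
def marketing_roi(n_contacted: int, n_accepted: int,
                  cost_per_contact: float, revenue_per_acceptance: float) -> float:
    """Return marketing ROI in percent: (revenue - cost) / cost * 100."""
    cost = n_contacted * cost_per_contact
    if cost <= 0:
        raise ValueError("total campaign cost must be positive")
    revenue = n_accepted * revenue_per_acceptance
    return (revenue - cost) / cost * 100.0
```

With made-up inputs, contacting 100 customers at a cost of 2 each and getting 25 acceptances worth 4 each gives `marketing_roi(100, 25, 2, 4)`, which is -50.0. Under this definition ROI rises when fewer low-response customers are contacted, because cost falls faster than revenue.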