Predicting customer churn is critical for telecom companies that want to retain valuable customers and reduce revenue loss. This project builds a machine learning pipeline that analyzes telecom customer data and predicts the likelihood of churn, using a Logistic Regression model with feature scaling and SMOTE-based class balancing.
- Goal: Predict which customers are likely to churn based on their usage and account features.
- Techniques:
  - Data preprocessing and feature selection
  - Handling class imbalance with SMOTE
  - Logistic Regression modeling
  - Feature importance analysis
  - Model evaluation (accuracy, confusion matrix, classification report)
Source:
- Rows: 3,333
- Columns: 11 (the target Churn plus 10 predictors: AccountWeeks, ContractRenewal, DataPlan, DataUsage, CustServCalls, DayMins, DayCalls, MonthlyCharge, OverageFee, RoamMins)
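
A quick schema check can catch a mismatched file before any modeling starts. The sketch below is only illustrative: the column names come from the list above, while the `check_schema` helper and the CSV file name in the comment are hypothetical and not part of this project.

```python
import pandas as pd

# Column names as listed in this README (Churn is the target).
EXPECTED_COLUMNS = [
    "Churn", "AccountWeeks", "ContractRenewal", "DataPlan", "DataUsage",
    "CustServCalls", "DayMins", "DayCalls", "MonthlyCharge", "OverageFee",
    "RoamMins",
]


def check_schema(df):
    """Return the expected columns that are missing from df (empty list if none)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Hypothetical usage with a real file (the file name is an assumption):
#   df = pd.read_csv("telecom_churn.csv")
# Here an empty frame stands in for the data so the example is self-contained.
demo = pd.DataFrame(columns=EXPECTED_COLUMNS)
missing = check_schema(demo.drop(columns=["RoamMins"]))
print(missing)
```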
Data Preprocessing
- Handle missing values
- Encode categorical variables
- Scale features
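
The preprocessing steps above could be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code: the split into numeric and binary columns, the imputation strategies, and the `build_preprocessor` helper are assumptions. The listed columns look numeric already, so no one-hot encoding is shown; a string-valued column would need an encoder such as `OneHotEncoder`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed grouping of the predictors; the README does not specify it.
BINARY_COLS = ["ContractRenewal", "DataPlan"]
NUMERIC_COLS = [
    "AccountWeeks", "DataUsage", "CustServCalls", "DayMins",
    "DayCalls", "MonthlyCharge", "OverageFee", "RoamMins",
]


def build_preprocessor():
    """Impute and scale numeric columns; impute binary flags and pass them through."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    binary = SimpleImputer(strategy="most_frequent")
    return ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("bin", binary, BINARY_COLS),
    ])


# Small synthetic frame standing in for the real data, with one missing value.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(50, 10, size=(20, len(NUMERIC_COLS))), columns=NUMERIC_COLS
)
demo["ContractRenewal"] = rng.integers(0, 2, size=20)
demo["DataPlan"] = rng.integers(0, 2, size=20)
demo.loc[0, "DayMins"] = np.nan

X_demo = build_preprocessor().fit_transform(demo)
print(X_demo.shape)
```

In a real run the preprocessor should be fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into the scaler.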
Class Imbalance Handling
- Stratified train-test split
- Synthetic Minority Oversampling Technique (SMOTE)
Modeling
- Logistic Regression (with class_weight and/or SMOTE)
- Feature importance analysis
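
As a rough sketch of the modeling step, the example below fits a class-weighted Logistic Regression on standardized features and ranks features by the absolute value of their coefficients. Reading coefficients as importance is one common approach and is assumed here; the README does not say which method the project uses. The synthetic data and the three feature names chosen are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data in which the first feature drives the label most strongly.
rng = np.random.default_rng(0)
feature_names = ["CustServCalls", "DayMins", "RoamMins"]
X = rng.normal(size=(500, 3))
logits = 2.0 * X[:, 0] + 0.5 * X[:, 1] - 1.5
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

# class_weight="balanced" reweights classes inversely to their frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
importance = pd.Series(model.coef_[0], index=feature_names)
importance = importance.sort_values(key=np.abs, ascending=False)
print(importance)
```

Coefficient magnitudes are only comparable when the features share a scale, which is why the scaling step comes first.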
Evaluation
- Accuracy, Confusion Matrix, Classification Report
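
The three metrics above are all available in `sklearn.metrics`. The snippet below is a small illustration on hand-written labels; the label vectors and the class names "stay" and "churn" are made up for the example.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hand-written labels: 1 = churn, 0 = no churn (illustrative only).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

report = classification_report(y_true, y_pred, target_names=["stay", "churn"])

print(f"accuracy: {acc:.2f}")
print(cm)
print(report)
```

On imbalanced data, accuracy alone can look good even when most churners are missed, so the per-class precision and recall in the classification report are worth reading alongside it.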
Requirements
- Python 3.8+
- pandas
- numpy
- scikit-learn
- imbalanced-learn
- matplotlib
- seaborn
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License.
For questions or feedback, please open an issue or contact the maintainer by email.
Empowering telecoms with data-driven retention strategies!