- Project Overview
- Business Understanding
- Data Description
- Objectives
- Methodology
- Results and Findings
- Conclusion
- Future Work
- How to Use This Repository
- Requirements
- Contributors
This project applies predictive analytics to streamline the promotion process in a large multinational corporation (MNC) with multiple organizational verticals. Currently, identifying candidates for promotion relies heavily on manual HR evaluations and KPI assessments, which are time-consuming and delay decision-making. By building a predictive model, this project aims to help the HR department proactively identify employees eligible for promotion, accelerating their career progression and enhancing HR efficiency.
The promotion process for managerial roles and below is manual and labor-intensive, resulting in delays that affect both employee morale and organizational agility. This project develops a predictive model using demographic and performance data to assist HR in making more efficient, data-driven promotion decisions.
- HR Department: Primary users of the promotion prediction model.
- Department Heads and Team Managers: To understand promotion potential within their teams.
- Executive Leadership: For strategic planning and talent management.
- Data Science/IT Team: Responsible for maintaining and integrating the model within HR systems.
The project uses two datasets:
- Train Dataset: Contains information on current employees and a target variable indicating if they were promoted.
- Test Dataset: Similar data without the target label, used to generate promotion predictions.
- employee_id: Unique identifier.
- department: Employee's department.
- region: Location of the employee’s role.
- education: Highest level of education.
- gender: Gender of the employee.
- recruitment_channel: Method of recruitment.
- no_of_trainings: Number of trainings completed in the last year.
- age: Employee's age.
- previous_year_rating: Performance rating for the prior year.
- length_of_service: Tenure in the organization.
- KPIs_met >80%: Binary indicator of whether the employee met more than 80% of their KPIs.
- awards_won?: Indicator of any awards received in the past year.
- avg_training_score: Average score in recent training evaluations.
- is_promoted: Target variable indicating promotion recommendation (1 for promoted, 0 otherwise).
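As a quick orientation, below is a minimal sketch of loading the two datasets and checking the promotion class balance. The file names are assumptions and may differ from the repository's actual data files.

```python
import pandas as pd

# File names are assumptions; adjust to the repository's actual data files.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# Basic structural checks: dataset shapes and the promotion class balance.
print(train_df.shape, test_df.shape)
print(train_df["is_promoted"].value_counts(normalize=True))
```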
- To improve the accuracy and efficiency of the promotion process by building a machine learning model to identify employees most likely to be promoted.
- Which employees are most likely to be promoted in the next cycle?
- What are the primary factors influencing promotion decisions?
- Are there biases in promotion practices that need addressing?
- How can training programs be adjusted to boost promotion readiness?
- How does the likelihood of promotion affect employee retention?
In the notebook, EDA uses visualizations and correlation analyses to explore distributions and relationships among features, identifying trends that influence promotion, such as previous_year_rating and avg_training_score. This analysis informs the feature engineering and data cleaning steps.
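A minimal sketch of this kind of EDA, assuming the training DataFrame loaded in the snippet above and the column names from the data dictionary; the specific plots are illustrative and not necessarily those in the notebook:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Correlations between numeric features and the target.
numeric_cols = ["no_of_trainings", "age", "previous_year_rating",
                "length_of_service", "avg_training_score", "is_promoted"]
sns.heatmap(train_df[numeric_cols].corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation matrix of numeric features")
plt.tight_layout()
plt.show()

# How average training score differs between promoted and non-promoted employees.
sns.boxplot(data=train_df, x="is_promoted", y="avg_training_score")
plt.show()
```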
To enhance model accuracy and interpretability, we engineered new features such as age and service groups, Performance Index, and Career Velocity, capturing patterns in promotion eligibility. We developed composite features like Relative Experience and Training Frequency, providing insights into employee growth and learning. Redundant and sensitive columns were removed to reduce bias, and statistical testing and multicollinearity checks ensured only significant, unique predictors were retained. These data preparation steps created a refined dataset that strengthened the model's ability to identify promotion-eligible employees accurately.
```python
# Relative experience: ratio of age to tenure; the +1 avoids division by zero for brand-new hires.
df['relative_experience'] = df['age'] / (df['length_of_service'] + 1)
```
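The other engineered features can follow the same pattern on the working DataFrame `df`. The formulas below are illustrative assumptions, not the notebook's exact definitions:

```python
import pandas as pd

# Training frequency: trainings completed per year of service (illustrative definition).
df["training_frequency"] = df["no_of_trainings"] / (df["length_of_service"] + 1)

# Performance index: prior-year rating scaled up when KPIs were met (illustrative definition).
df["performance_index"] = df["previous_year_rating"].fillna(0) * (1 + df["KPIs_met >80%"])

# Coarse age groups for comparing promotion rates across career stages.
df["age_group"] = pd.cut(df["age"], bins=[17, 29, 39, 49, 60],
                         labels=["18-29", "30-39", "40-49", "50-60"])
```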
The notebook preprocesses data by imputing missing values, encoding categorical features, and scaling numerical features. These steps create a structured dataset ready for model training, ensuring consistency and compatibility for machine learning.
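A minimal preprocessing sketch using scikit-learn, assuming the column names from the data dictionary; the notebook's exact imputation, encoding, and scaling choices may differ:

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["no_of_trainings", "age", "previous_year_rating",
                    "length_of_service", "avg_training_score",
                    "KPIs_met >80%", "awards_won?"]
categorical_features = ["department", "region", "education", "gender", "recruitment_channel"]

numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # fill missing numeric values
    ("scale", StandardScaler()),                    # standardize to zero mean, unit variance
])
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_pipeline, numeric_features),
    ("cat", categorical_pipeline, categorical_features),
])
```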
A variety of models were tested, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, XGBoost, LightGBM, and Support Vector Classifier (SVC). Ensemble methods like Stacking Classifier and Balanced Bagging were also explored.
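One way such a comparison can be run is with cross-validated F1 scores over a shared preprocessing pipeline. The candidate set and settings below are illustrative, reusing `train_df` and `preprocessor` from the sketches above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Features and target, reusing the training DataFrame loaded earlier.
X = train_df.drop(columns=["employee_id", "is_promoted"])
y = train_df["is_promoted"]

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
}

for name, model in candidates.items():
    pipe = make_pipeline(preprocessor, model)          # shared preprocessing for every candidate
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```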
Hyperparameter tuning was performed across several algorithms, with a specific focus on enhancing the F1 score. By fine-tuning parameters for LightGBM, we optimized its performance, resulting in the highest F1 score observed.
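A sketch of what F1-focused tuning for LightGBM might look like, reusing the pipeline pieces above; the parameter ranges and search strategy are illustrative assumptions, not the notebook's actual search space:

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from lightgbm import LGBMClassifier

pipe = make_pipeline(preprocessor, LGBMClassifier(random_state=42))

# Illustrative search space; with make_pipeline the step name is the lowercased class name.
param_distributions = {
    "lgbmclassifier__n_estimators": [200, 400, 800],
    "lgbmclassifier__learning_rate": [0.01, 0.05, 0.1],
    "lgbmclassifier__num_leaves": [31, 63, 127],
    "lgbmclassifier__min_child_samples": [10, 20, 50],
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=20,
                            scoring="f1", cv=5, random_state=42, n_jobs=-1)
search.fit(X, y)
print("Best CV F1:", round(search.best_score_, 3))
print("Best parameters:", search.best_params_)
```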
Other models, such as XGBClassifier and Gradient Boosting, achieved F1 scores of 48.77% and 47.39% respectively, but did not match the optimized performance of the tuned LightGBM model.
The Stacking Classifier achieved a balanced F1 score of 46.78% with high accuracy, providing reliable predictions across both precision and recall metrics.
The F1 score, which balances precision and recall, was chosen to ensure the model accurately identifies promotion-eligible employees while minimizing false positives and negatives. The tuned LightGBM model emerged as the top performer with an F1 score of 90.19%, making it the most effective model for HR’s goal of accurate promotion predictions. This focus on F1 optimization has led to a model that balances accuracy with equitable, data-driven promotion recommendations.
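For reference, F1 is the harmonic mean of precision and recall. A minimal sketch of evaluating a fitted model on a held-out split, reusing the search object above (the split settings are illustrative):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report

# Hold out a validation split, stratified to preserve the low promotion rate.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

best_model = search.best_estimator_   # tuned pipeline from the search sketch above
best_model.fit(X_tr, y_tr)
preds = best_model.predict(X_val)

print("Validation F1:", round(f1_score(y_val, preds), 4))
print(classification_report(y_val, preds))
```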
- The data is highly imbalanced, with only a small percentage of employees eligible for promotion (a class-weighting sketch for handling this follows this list).
- Extensive feature engineering and careful feature selection were done to boost model performance.
- Initial Model Performance and Tuning Needs: Early model tests indicated low F1 scores, highlighting the need to balance precision (identifying true promotion candidates) and recall (minimizing false negatives).
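As referenced in the first finding, one common way to handle the imbalance is to weight the minority (promoted) class more heavily during training. The sketch below uses LightGBM's scale_pos_weight, though the notebook may rely on a different technique (for example, resampling via Balanced Bagging):

```python
from lightgbm import LGBMClassifier

# Weight the positive (promoted) class by the negative-to-positive ratio,
# reusing y from the model-comparison sketch above.
neg, pos = (y == 0).sum(), (y == 1).sum()
balanced_lgbm = LGBMClassifier(scale_pos_weight=neg / pos, random_state=42)
```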
- Explore Advanced Feature Engineering for Richer Insights: Adding features such as metrics related to leadership initiatives, cross-departmental collaborations, or innovative contributions could provide a deeper view into promotion readiness. This would allow us to recognize contributions that align with our company's values and priorities, giving HR teams broader context when making promotion decisions.
- Establish Regular Model Audits: To ensure our model remains accurate and aligned with evolving organizational needs, we suggest scheduling quarterly evaluations and updates. These ongoing reviews will help the model adapt to workforce changes and emerging business objectives, giving stakeholders assurance that promotion processes remain fair and aligned with company goals.
- Implement Techniques to Address Data Imbalance
- Explore Advanced Hyperparameter Tuning.
Future improvements may include fine-tuning the model, implementing bias mitigation strategies, and exploring more complex algorithms.
To set up and run this project, follow these steps:
- Clone the repository:
  Clone the project repository from GitHub to your local machine.
  `git clone https://github.com/bedankibunja/HR-ANALYTICS-Predicting-Employee-Promotions.git`

- Navigate to the project directory:
  Change to the project directory to access the files.
  `cd HR-ANALYTICS-Predicting-Employee-Promotions`

- Install the required dependencies:
  Install all necessary packages listed in the requirements.txt file.
  `pip install -r requirements.txt`
To explore the analysis and reproduce the results, follow these steps:
- Open the Jupyter Notebook:
  Launch the Jupyter Notebook environment and open the file Notebook.ipynb located in the repository.

- Follow the Instructions:
  The notebook contains step-by-step instructions for data loading, preprocessing, feature engineering, model training, and evaluation.

- Execute Code Cells:
  Run each code cell in sequence to perform the full analysis and view results. You can also modify parameters or experiment with additional features as desired.

To open the notebook, run:
`jupyter notebook Notebook.ipynb`