HR-Analysis

Objective:

To build a predictive model that identifies the employees most likely to be promoted, based on factors such as training performance and KPI completion.

Key Skills:

- Data collection and preprocessing using Pandas
- Exploratory data analysis (EDA) using Matplotlib and Seaborn
- Feature engineering
- Model building and evaluation using Scikit-learn

Step-by-step guide:

1. Data collection & cleaning: Load the training and testing CSV files with Pandas 'read_csv'.
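A minimal sketch of this step, assuming a few columns named after the attributes this README mentions. The inline CSV strings are hypothetical stand-ins for the real files, and the fill strategy is only one possible cleaning choice, not necessarily the one used in this project.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the training and testing CSV files. With real
# files you would pass a path instead, e.g. pd.read_csv("train.csv").
TRAIN_CSV = """employee_id,education,age,previous_year_rating,is_promoted
1,Bachelor's,35,5.0,0
2,Master's & above,30,,1
3,,41,3.0,0
"""
TEST_CSV = """employee_id,education,age,previous_year_rating
4,Bachelor's,28,4.0
"""

train = pd.read_csv(io.StringIO(TRAIN_CSV))
test = pd.read_csv(io.StringIO(TEST_CSV))

# Count missing values per column before deciding how to clean them.
missing = train.isnull().sum()
print(missing)

# Assumed cleaning choice: fill a missing category with the most frequent
# value and a missing rating with the median.
train["education"] = train["education"].fillna(train["education"].mode()[0])
train["previous_year_rating"] = train["previous_year_rating"].fillna(
    train["previous_year_rating"].median()
)
```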

2. Descriptive statistics: Use .describe() to get summary statistics such as the maximum, minimum, and mean of each numeric column.
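As a small illustration, the following sketch runs .describe() on a hypothetical four-row frame. The values are made up and the column names are assumed from the attributes named in this README.

```python
import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
df = pd.DataFrame({
    "age": [25, 30, 35, 40],
    "avg_training_score": [50, 60, 70, 80],
})

# describe() returns count, mean, std, min, quartiles, and max per column.
stats = df.describe()
print(stats.loc[["min", "mean", "max"]])
```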

3. Data exploration: Use countplot, displot, and histogram plots from the Seaborn and Matplotlib libraries to explore the datasets graphically. The plots cover:

- Count of employees who got promoted
- Count of promoted employees by education
- Count of promoted employees by age
- Count of promoted employees by previous_year_rating
- Count of promoted employees by age and length of service
- Scatter plots for dataset exploration

4. Label encoding: Convert categorical attributes to numeric labels using LabelEncoder from Scikit-learn's preprocessing module.
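A minimal sketch of the encoding step on hypothetical data. The 'gender' column is an assumption added for illustration, and which columns the project actually encodes is not stated here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; 'gender' is an assumed column name.
df = pd.DataFrame({
    "education": ["Bachelor's", "Master's & above", "Below Secondary", "Bachelor's"],
    "gender": ["m", "f", "f", "m"],
    "age": [25, 32, 41, 29],
})

# Fit one encoder per categorical column and keep it, so the same mapping
# can be reused on the test set or inverted later.
encoders = {}
for col in ["education", "gender"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

print(df)
print(list(encoders["education"].classes_))
```

LabelEncoder assigns integers in sorted order of the class names, so the numbers carry no ranking. Scikit-learn documents it as a target encoder; OrdinalEncoder or OneHotEncoder are the usual alternatives for feature columns.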

5. Correlation: Analyze the inter-dependency between attributes. Here the KPI, awards-won, and avg_training_score attributes are positively correlated with the target variable ('is_promoted'), which suggests they have a strong influence on it.
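The correlation check can be sketched as below. The data is hypothetical and constructed so that the three attributes correlate positively with the target. The column names 'KPIs_met' and 'awards_won' are assumed spellings.

```python
import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
df = pd.DataFrame({
    "KPIs_met":           [1, 0, 1, 0, 1, 0, 1, 0],
    "awards_won":         [1, 0, 0, 0, 1, 0, 0, 0],
    "avg_training_score": [85, 60, 78, 55, 90, 62, 70, 58],
    "age":                [30, 45, 28, 39, 35, 31, 42, 27],
    "is_promoted":        [1, 0, 1, 0, 1, 0, 0, 0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
target_corr = corr["is_promoted"].drop("is_promoted").sort_values(ascending=False)
print(target_corr)
```

Correlation only measures linear association, so a high value indicates a useful candidate feature rather than proof of influence.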

6. Splitting the data
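One common way to split the data is Scikit-learn's train_test_split, sketched here on hypothetical rows. The 80/20 ratio, the stratification, and the random seed are assumptions, not settings taken from this project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical sample rows; the values are illustrative only.
df = pd.DataFrame({
    "avg_training_score": [85, 60, 78, 55, 90, 62, 70, 58, 66, 81],
    "KPIs_met":           [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],
    "is_promoted":        [1, 0, 1, 0, 1, 0, 0, 0, 0, 1],
})

X = df.drop(columns="is_promoted")  # features
y = df["is_promoted"]               # target

# stratify=y keeps the promoted/not-promoted ratio similar in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_val.shape)
```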

7. XGBoost classifier

8. RandomForest: Accuracy alone can be misleading for classification models, particularly when the classes are imbalanced, so the focus here is on recall and F1-score, which should be as close to 1.0 as possible.
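To illustrate why accuracy is not enough, the sketch below uses a synthetic dataset with roughly 10% positives (an assumption, not this project's data). A baseline that never predicts a promotion still reaches high accuracy with zero recall, which is why recall and F1-score are reported for the RandomForest model. The hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the HR data (about 10% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline that never predicts a promotion: high accuracy, zero recall.
always_negative = np.zeros_like(y_val)
baseline_accuracy = accuracy_score(y_val, always_negative)
baseline_recall = recall_score(y_val, always_negative)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_val)

accuracy = accuracy_score(y_val, pred)
recall = recall_score(y_val, pred)  # share of actual promotions found
f1 = f1_score(y_val, pred)          # harmonic mean of precision and recall

print(f"baseline: accuracy={baseline_accuracy:.3f} recall={baseline_recall:.3f}")
print(f"forest:   accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")
```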

9. Gradient boosting model (GBM)

10. Predictions for the target variable by the different models
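Predictions from several models can be collected side by side, as in the sketch below. It uses synthetic data and only Scikit-learn's RandomForestClassifier and GradientBoostingClassifier to stay dependency-free. XGBoost's XGBClassifier exposes the same fit/predict interface and could be added to the dictionary in the same way. The model settings are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR data; X_test plays the role of the unseen
# test file.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=1
)
X_train, X_test, y_train, _ = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "gradient_boosting": GradientBoostingClassifier(random_state=1),
}

# One column of predicted 'is_promoted' labels per model.
predictions = pd.DataFrame({
    name: model.fit(X_train, y_train).predict(X_test)
    for name, model in models.items()
})

# Flag the rows where the two models give the same answer.
predictions["models_agree"] = (
    predictions["random_forest"] == predictions["gradient_boosting"]
)
print(predictions.head())
```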