This project uses WHO data to predict whether a patient is likely to have a stroke, based on input parameters such as gender, age, various diseases, smoking status, and others. The dataset is imbalanced.
- Trained decision tree, random forest, and gradient boosting models to set a baseline performance. Evaluated the models with ROC AUC and average precision, and also reported precision and recall for both classes.
- Balanced the dataset using four techniques: random undersampling, random oversampling, edited nearest neighbors, and SMOTE.
- Retrained the decision tree, random forest, and gradient boosting models on the balanced data and compared their performance with the baseline.
- Identified the important features using the random forest and gradient boosting models, and examined how the different data balancing techniques improved model performance.
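
The two random resampling techniques and the per-class precision and recall mentioned above can be illustrated with a minimal plain-Python sketch. This is not the project's actual code: the toy labels (95 negatives, 5 positives), the row IDs, and the helper names `random_undersample`, `random_oversample`, and `precision_recall` are all assumptions made for illustration. In practice a library such as imbalanced-learn provides these samplers along with edited nearest neighbors and SMOTE.

```python
import random
from collections import Counter


def indices_by_class(labels):
    """Group row indices by their class label."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class shrinks to the smallest class size."""
    rng = random.Random(seed)
    groups = indices_by_class(labels)
    target = min(len(ix) for ix in groups.values())
    keep = []
    for ix in groups.values():
        keep.extend(rng.sample(ix, target))  # sample without replacement
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows so every class grows to the largest class size."""
    rng = random.Random(seed)
    groups = indices_by_class(labels)
    target = max(len(ix) for ix in groups.values())
    keep = []
    for ix in groups.values():
        keep.extend(ix)  # keep every original row
        keep.extend(rng.choices(ix, k=target - len(ix)))  # add duplicates
    return [rows[i] for i in keep], [labels[i] for i in keep]


def precision_recall(y_true, y_pred, positive):
    """Precision and recall, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 95 "no stroke" (0) and 5 "stroke" (1) patients.
rows = list(range(100))  # stand-in row IDs instead of real features
labels = [0] * 95 + [1] * 5

rows_under, labels_under = random_undersample(rows, labels)
rows_over, labels_over = random_oversample(rows, labels)
print("original:    ", Counter(labels))
print("undersampled:", Counter(labels_under))
print("oversampled: ", Counter(labels_over))

# A model that always predicts "no stroke" scores well on the majority
# class but has zero recall on the minority class.
always_negative = [0] * len(labels)
print("class 0:", precision_recall(labels, always_negative, positive=0))
print("class 1:", precision_recall(labels, always_negative, positive=1))
```

The always-negative predictor at the end shows why precision and recall are reported for both classes: on imbalanced data a model can look strong on the majority class while missing every minority-class case.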