Stroke prediction project
In collaboration with my team member, Leen Abo Alhija'a, I developed a Python machine learning project that predicts strokes from features such as age, gender, hypertension, and heart disease.
In our project, we implemented various preprocessing techniques. We handled the null values and removed columns that were deemed irrelevant to our stroke prediction task.
To handle categorical features, we used the pandas 'get_dummies' function, which converts each category into its own numerical indicator (one-hot) column. We also renamed some features to make them easier to understand.
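The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the column names ('id', 'bmi', 'gender', 'smoking_status', 'heart_disease'), the median imputation, and the tiny sample table are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop, impute, encode, and rename columns of a stroke-style table.

    All column names here are hypothetical stand-ins for the real dataset.
    """
    # Remove a column assumed to be irrelevant to prediction (a row identifier).
    out = df.drop(columns=["id"])
    # Handle null values; median imputation is one possible choice.
    out["bmi"] = out["bmi"].fillna(out["bmi"].median())
    # One-hot encode the categorical features into 0/1 indicator columns.
    out = pd.get_dummies(out, columns=["gender", "smoking_status"], dtype=int)
    # Rename a feature to something more readable.
    out = out.rename(columns={"heart_disease": "has_heart_disease"})
    return out


# Made-up rows, only to show the transformation end to end.
raw = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gender": ["Male", "Female", "Female", "Male"],
    "age": [67.0, 45.0, 80.0, 33.0],
    "hypertension": [0, 1, 0, 0],
    "heart_disease": [1, 0, 1, 0],
    "bmi": [36.6, np.nan, 32.5, 24.1],
    "smoking_status": ["smokes", "never smoked", "never smoked", "smokes"],
    "stroke": [1, 0, 1, 0],
})

clean = preprocess(raw)
print(clean.columns.tolist())
```

After this step every column is numeric and free of nulls, so the table can be passed directly to a scikit-learn estimator.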
After preprocessing the data, we tested several machine learning models, including:
- Logistic Regression
- Decision Tree
- Random Forest
- K-Nearest Neighbors
- Support Vector Machine (SVM)
- Gaussian Naive Bayes
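A comparison of the models listed above might look like the sketch below. It assumes scikit-learn, uses a synthetic dataset from 'make_classification' in place of the real stroke data, and scores each model by accuracy on a held-out split; the actual project metric and hyperparameters may have differed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 400 samples, 8 numeric features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The six model families from the list, with mostly default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Fit each model on the training split and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print the models from best to worst score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because strokes are typically rare in such datasets, accuracy alone can be misleading; a metric such as F1 or ROC AUC may be a better basis for ranking the models.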
Through multiple submissions, we found that the Gaussian Naive Bayes model achieved the highest score among the evaluated models.
Our final evaluation score for the project was 68%, which is a reasonable result given the size of the dataset and the fact that the highest score achieved was 77%.