Diabetes_Prediction_ML

I compare several models to find which one best predicts diabetes on the given dataset (binary classification: the patient is diabetic or not).

I took the two datasets from another GitHub repo (I forgot its name); they are in the "Datasets" folder. In my code, I combined them, since they hold different features for the same patients. The notebook itself explains every step I took in detail.
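As a minimal sketch of the combining step: the README does not say how the two files are joined, so this assumes they share a patient identifier column. The name `patient_id`, the other column names, and the sample values are all hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the two dataset files; the real column
# names and values are not given in this README.
labs = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "glucose": [148, 85, 183],
    "bmi": [33.6, 26.6, 23.3],
})
history = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "age": [50, 31, 32],
    "diabetic": [1, 0, 1],
})

# Join on the assumed shared key. validate="one_to_one" raises an error
# if a patient appears more than once in either table.
combined = pd.merge(labs, history, on="patient_id", how="inner",
                    validate="one_to_one")
print(combined.shape)
```

If the two files were simply row-aligned with no shared key, `pd.concat([a, b], axis=1)` would be the equivalent step, but a key-based merge guards against silently misaligned rows.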

I used two approaches:

1. The dataset is imbalanced. For Approach 1, I left it unbalanced and used a Logit model to identify the statistically significant features.
2. For Approach 2, I balanced the dataset and then selected features with RFE (recursive feature elimination, using Logistic Regression) instead of the Logit model, as it performed relatively better.
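The second approach can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: the data is synthetic, the balancing method is assumed to be random oversampling of the minority class (the README does not say which method was used), and keeping 5 features is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Synthetic imbalanced data (roughly 80% negative) standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)

# Assumed balancing method: oversample the minority class with replacement
# until both classes have the same number of rows.
X_maj, X_min = X[y == 0], X[y == 1]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int),
                        np.ones(len(X_min_up), dtype=int)])

# RFE fits the estimator, drops the lowest-ranked feature, and repeats
# until the requested number of features remains.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X_bal, y_bal)
selected = np.flatnonzero(selector.support_)
print(selected)
```

In a real pipeline, oversampling would normally be applied to the training split only, so that duplicated minority rows do not leak into the test set.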

For each approach, I trained Random Forest, AdaBoost, Decision Tree, Naive Bayes, MLP, and SVM classifiers and compared their results.
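A comparison loop over these six model families might look like the sketch below. It uses synthetic data and mostly default hyperparameters; the Gaussian variant of Naive Bayes and every setting shown are assumptions, since the notebook's actual configuration is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the diabetes dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# One entry per model family named above; settings are illustrative.
models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "NaiveBayes": GaussianNB(),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "SVM": SVC(),
}

# Fit each model on the same split and record the four reported metrics.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "accuracy": accuracy_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Scoring every model on one shared split keeps the comparison fair. On an imbalanced dataset, precision and recall are usually more informative than accuracy alone.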

Here are the comparisons:

Approach 1

Precision Scores:

Recall Scores:

F1 Scores:

Accuracy Scores:

Approach 2

Precision Scores:

Recall Scores:

F1 Scores:

Accuracy Scores:

Cross-Validation Scores (Approach 2):
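For reference, cross-validation scores of this kind can be computed along the following lines. This is a sketch on synthetic data; the fold count, the use of stratified folds, and the Random Forest settings are assumptions rather than the notebook's actual choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data standing in for the balanced dataset of Approach 2.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Stratified folds keep the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# One accuracy value per fold; the mean and spread summarize stability.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```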

In the end, Random Forest in Approach 2 gave the best result, with about 90% accuracy.


GaganMalik025/Diabetes_Prediction_ML
