Diabetes prediction using a KNN model and Pima Indian Diabetes Dataset

pronzzz/diabetes-prediction


🤖 Diabetes Prediction using Machine Learning 📊

  • Pandas, NumPy, Matplotlib, and Seaborn are used for data manipulation and visualization 📈

  • Data Cleaning 🧹

    • Missing Values: No missing values 😊
    • Duplicated Values: None found 👍
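
The two cleaning checks above can be sketched with pandas as follows. This is a minimal illustration, not the repository's actual notebook code: the tiny `df` is a hypothetical stand-in for the dataset, and its column names are assumed to follow the public Pima CSV.

```python
import pandas as pd

# Hypothetical stand-in for the dataset (assumption: column names
# follow the public Pima Indians Diabetes CSV).
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89],
    "BMI": [33.6, 26.6, 23.3, 28.1],
    "Outcome": [1, 0, 1, 0],
})

# Number of NaN entries in each column.
missing_per_column = df.isnull().sum()

# Number of rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

print(missing_per_column)
print("duplicated rows:", n_duplicates)
```

Note that `isnull()` only counts NaN entries. In the widely distributed version of the Pima CSV, zeros in columns such as `Glucose` or `BMI` are often treated as missing values, and this check would not flag them.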
  • Exploratory Data Analysis 🔎

    • Outcome Distribution: Balanced ⚖️
    • Outlier Detection: A few in some features 👀
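
One way to run the two EDA checks above is sketched below. The README does not say how outliers were detected, so the 1.5 × IQR rule here is an assumption, and the small `df` with its values is hypothetical.

```python
import pandas as pd

# Hypothetical sample (assumption: not the real data).
df = pd.DataFrame({
    "Insulin": [0, 94, 168, 88, 543, 846, 175, 230],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Share of each class in the target column.
class_share = df["Outcome"].value_counts(normalize=True)

# IQR rule (assumed method): flag values more than 1.5 * IQR
# below the first quartile or above the third quartile.
q1 = df["Insulin"].quantile(0.25)
q3 = df["Insulin"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["Insulin"] < q1 - 1.5 * iqr) | (df["Insulin"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(class_share)
print(outliers)
```

A Seaborn box plot per feature would show the same outliers visually.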
  • Data Preprocessing 🛠️

    • Standard Scaling: Applied so that features with large ranges do not dominate KNN's distance calculation 🎚️
    • Label Encoding: Outcome variable (0/1) 🎯
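
A minimal sketch of the scaling step, assuming scikit-learn's `StandardScaler` is the scaler in use (the README does not name the class). The arrays are made-up values. The scaler is fitted on the training split only, so that no statistics leak in from the test split.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature rows (assumption): two columns on different scales.
X_train = np.array([[148.0, 33.6], [85.0, 26.6], [183.0, 23.3], [89.0, 28.1]])
X_test = np.array([[137.0, 43.1]])

scaler = StandardScaler()
# Learn per-column mean and standard deviation from the training data.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics for the test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```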
  • Model Training and Evaluation 🚂

    • K-Nearest Neighbors Classifier 🤝
    • Grid Search for Optimal Hyperparameter (k) 🔧
    • Max Train Score: 94.23% at k = 14 🥇
    • Max Test Score: 88.89% at k = 13 🏆
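
The search over k can be sketched as a simple loop that records the train and test score for each candidate. This is an illustration under stated assumptions, not the repository's code: the data is synthetic (generated with `make_classification`), and the split ratio, the range of k, and the random seed are all hypothetical, so the scores will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 8 features, binary target.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale with statistics from the training split only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

k_values = range(1, 16)  # hypothetical search range
train_scores, test_scores = [], []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_scores.append(knn.score(X_train, y_train))  # accuracy on training data
    test_scores.append(knn.score(X_test, y_test))     # accuracy on held-out data

# k with the highest test score (the smallest such k if several tie).
best_k = k_values[test_scores.index(max(test_scores))]
print("best k:", best_k, "test score:", max(test_scores))
```

Choosing k by test score, as this loop does, tends to give an optimistic estimate. scikit-learn's `GridSearchCV` could instead select k by cross-validation on the training split, leaving the test split for a final check.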
  • Performance Evaluation 📊

    • Confusion Matrix: Counts correct and incorrect predictions for each class ⬜⬛
    • Classification Report: Detailed metrics (Precision, Recall, F1-score) 👌
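
Both evaluation outputs can be produced with `sklearn.metrics`, as in the sketch below. The label lists and the class names passed to `target_names` are made up for illustration and are not the model's real predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels (assumption): 0 = no diabetes, 1 = diabetes.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
# [[5 1]
#  [1 3]]

# Per-class precision, recall, and F1-score, plus overall accuracy.
report = classification_report(
    y_true, y_pred, target_names=["No diabetes", "Diabetes"]
)
print(report)
```

In this toy example, precision and recall for the positive class are both 3/4 = 0.75, and overall accuracy is 8/10 = 0.80.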
  • Conclusion 🏁

    • KNN Classifier with k = 13 provides the best balance between train and test scores 🏆
    • With a test score of 88.89%, the model predicts diabetes from a patient's features reasonably well 😄