-
Pandas, NumPy, Matplotlib, and Seaborn are used for data manipulation and visualization 📈
-
Data Cleaning 🧹
- Missing Values: No missing values 😊
- Duplicated Values: None found 👍
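
The two checks above can be sketched with pandas as follows. This is a minimal illustration, not the repository's actual notebook code: the toy frame and its column names (borrowed from the public Pima Indians Diabetes CSV) are assumptions, and one duplicate row is planted on purpose so the check has something to report.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset (assumed columns).
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 85],
    "BMI": [33.6, 26.6, 23.3, 26.6],
    "Outcome": [1, 0, 1, 0],
})

# Count NaN/None entries in each column.
missing_per_column = df.isnull().sum()

# Count rows that are exact copies of an earlier row.
duplicate_rows = int(df.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

Note that `isnull()` only flags NaN/None, so placeholder values such as zeros would not be counted as missing by this check.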
-
Exploratory Data Analysis 🔎
- Outcome Distribution: Balanced ⚖️
- Outlier Detection: A few in some features 👀
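
A rough sketch of both EDA checks is shown below. The README does not say how outliers were identified, so the 1.5 × IQR rule used here is an assumption, as are the toy values and the `Insulin` column name.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has more rows and features.
df = pd.DataFrame({
    "Insulin": [0, 94, 168, 88, 543, 100, 110, 846],
    "Outcome": [1, 0, 1, 0, 1, 0, 0, 1],
})

# Share of each class in the outcome column (0.5 / 0.5 means balanced).
class_share = df["Outcome"].value_counts(normalize=True)

# Flag values outside 1.5 * IQR of the quartiles (assumed outlier rule).
q1 = df["Insulin"].quantile(0.25)
q3 = df["Insulin"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["Insulin"] < q1 - 1.5 * iqr) | (df["Insulin"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(class_share)
print(outliers)
```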
-
Data Preprocessing 🛠️
- Standard Scaling: Applied so every feature contributes on the same scale to KNN's distance calculation 🎚️
- Label Encoding: Outcome variable (0/1) 🎯
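
The scaling step could look roughly like this with scikit-learn's `StandardScaler`. The arrays are made-up values, and fitting the scaler on the training split only is an assumption about the workflow, not something the README states.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature rows (for example glucose and BMI); values are made up.
X_train = np.array([[148.0, 33.6], [85.0, 26.6], [183.0, 23.3], [89.0, 28.1]])
X_test = np.array([[137.0, 43.1]])

scaler = StandardScaler()
# Learn per-feature mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics so no test information leaks into the scaler.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 for every feature
print(X_train_scaled.std(axis=0))   # approximately 1 for every feature
```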
-
Model Training and Evaluation 🚂
- K-Nearest Neighbors Classifier 🤝
- Grid Search for Optimal Hyperparameter (k) 🔧
- Max Train Score: 94.23% at k = 14 🥇
- Max Test Score: 88.89% at k = 13 🏆
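
One way to obtain per-k train and test scores like those above is a simple loop over candidate values of k. The sketch below uses synthetic data from `make_classification` in place of the real dataset, and the range 1 to 15, the 70/30 split, and the random seeds are all assumptions, so its scores will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (8 features, binary outcome).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scale with statistics learned from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

train_scores, test_scores = {}, {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_scores[k] = knn.score(X_train, y_train)  # accuracy on training data
    test_scores[k] = knn.score(X_test, y_test)     # accuracy on held-out data

best_k = max(test_scores, key=test_scores.get)
print(f"best k by test score: {best_k} ({test_scores[best_k]:.2%})")
```

Choosing k by the test score tends to give an optimistic estimate; a cross-validated search such as scikit-learn's `GridSearchCV` over `n_neighbors` is a common alternative.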
-
Performance Evaluation 📊
- Confusion Matrix: Shows counts of correct and incorrect predictions for each class ⬜⬛
- Classification Report: Detailed metrics (Precision, Recall, F1-score) 👌
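
Both evaluation outputs can be produced with scikit-learn's metrics module. The labels below are invented for illustration and are not the model's real predictions; the class names passed to `target_names` are likewise assumptions.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted outcomes (0 = no diabetes, 1 = diabetes).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Precision, recall, and F1-score per class, plus overall accuracy.
report = classification_report(
    y_true, y_pred, target_names=["no diabetes", "diabetes"]
)
print(report)
```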
-
Conclusion 🏁
- KNN Classifier with k = 13 provides the best balance between train and test scores 🏆
- The model predicts diabetes from a patient's features with a test score of 88.89% 😄
-
pronzzz/diabetes-prediction
Diabetes prediction using a KNN model and Pima Indian Diabetes Dataset