Detailed analysis, using both predictive and descriptive approaches, of a diabetes dataset from Kaggle

dahjan/Diabetes-Dataset--Analysis

Diabetes Dataset - Detailed Analysis

This repository contains a detailed analysis of the Pima Indians Diabetes Database from Kaggle. Both predictive and descriptive analyses were performed, using several algorithms and information about diabetes from papers found online. The document will be updated frequently to implement new algorithms or ideas; it can therefore be viewed as a proof of principle.

Content

  • diabetes.csv contains
    • 8 medical predictor factors: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age
    • One target variable: outcome
    • Data from 768 female patients
  • *.ipynb files are Jupyter notebooks that document the research
  • utils.py contains all functions used for analysis
  • environment.yml is used to create the conda environment
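
As a quick sanity check of the data layout described above, the following is a minimal sketch that verifies the header and counts rows per outcome class. It uses only the Python standard library and is not part of utils.py. The function name summarize is hypothetical, and the column names are assumed from the Kaggle release of the dataset, so adjust them if your copy of diabetes.csv uses a different header.

```python
import csv
import io

# Column names assumed from the Kaggle release of the dataset; adjust them
# if the header of your copy of diabetes.csv differs.
PREDICTORS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def summarize(csv_file):
    """Check the header and count rows per outcome class.

    csv_file is any open text file (or file-like object) with a header row.
    Returns (number_of_rows, {outcome_value: count}).
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in PREDICTORS + [TARGET] if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = {}
    n_rows = 0
    for row in reader:
        n_rows += 1
        counts[row[TARGET]] = counts.get(row[TARGET], 0) + 1
    return n_rows, counts


if __name__ == "__main__":
    # Two made-up rows, for illustration only; they are not taken from the dataset.
    sample = io.StringIO(
        ",".join(PREDICTORS + [TARGET]) + "\n"
        "1,100,70,20,80,25.0,0.5,30,0\n"
        "2,150,80,30,0,33.1,0.7,45,1\n"
    )
    print(summarize(sample))
```

To run the check on the real file, pass open("diabetes.csv", newline="") instead of the in-memory sample. If the file matches the description above, the reported row count should be 768.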

Jupyter notebooks

  • Report: main analysis and discussion

To see the notebooks, run jupyter notebook from the root directory of the project.

Acknowledgements

Special thanks to the Takeda Data Challenge, which took place in June 2018; it inspired me to work on this dataset extensively and helped me greatly in finding my strengths and weaknesses.
