Diabetes Dataset - Detailed Analysis

This repository contains a detailed analysis of the Pima Indians Diabetes Database from Kaggle. Both predictive and descriptive analyses were performed, using various algorithms and information about diabetes found in papers online. The document will be updated frequently to implement new algorithms or ideas, so it can be viewed as a proof of principle.

Content

diabetes.csv contains:
- 8 medical predictor variables: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age
- One target variable: outcome
- Data from 768 female patients
*.ipynb files are Jupyter notebooks that document the research.
utils.py contains all functions used for the analysis.
environment.yml is used to create a conda environment.
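
As a quick sanity check of the layout described above, the following minimal sketch counts rows and outcome classes using only the Python standard library. The column names are an assumption based on the Kaggle release of the dataset, and summarize_dataset is a hypothetical helper for illustration; it is not one of the functions in utils.py.

```python
import csv
from collections import Counter

# Assumed header names (taken from the Kaggle release of the dataset);
# adjust them if the header of your copy of diabetes.csv differs.
PREDICTORS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def summarize_dataset(csv_file):
    """Hypothetical helper: count rows and outcome classes in an open CSV file."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in PREDICTORS + [TARGET] if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    n_rows = 0
    outcome_counts = Counter()
    for row in reader:
        n_rows += 1
        outcome_counts[int(row[TARGET])] += 1

    return {
        "n_rows": n_rows,
        "n_predictors": len(PREDICTORS),
        "outcome_counts": dict(outcome_counts),
    }
```

Calling summarize_dataset(open("diabetes.csv", newline="")) from the root directory of the project should report 768 rows if the file matches the description above.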

Jupyter notebooks

Report: main analysis and discussion

To see the notebooks, create the conda environment with conda env create -f environment.yml, activate it, and run jupyter notebook from the root directory of the project.

Acknowledgements

Special thanks to the Takeda Data Challenge, which took place in June 2018; it inspired me to work on this dataset extensively, and helped me greatly in finding my strengths and weaknesses.

