
Data analysis on the cancer problem dataset using Haberman's data


RanaPrince/Exploratory-Data-Analysis


Exploratory-Data-Analysis


Haberman's data: Haberman's data set contains cases from a study conducted at the University of Chicago's Billings Hospital between 1958 and 1970 on patients who had undergone surgery for breast cancer.

Dataset reference: https://www.kaggle.com/gilsousa/habermans-survival-data-set

Problems explored and tackled:

  • High-level statistics of the dataset: number of points, number of features, number of classes, data points per class.
  • Perform univariate analysis (PDF, CDF, box plots, violin plots) to understand which features are useful for classification.
  • Perform bivariate analysis (scatter plots, pair plots) to see whether combinations of features are useful in classification.
  • Write your observations in English as crisply and unambiguously as possible. Always quantify your results.
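
As a rough illustration of the first two items above, the sketch below computes the high-level statistics and an empirical CDF for one feature. It is not the notebook's code. It assumes a header-less CSV with four integer columns, and the names `age`, `year`, `nodes` and `status` are assumed labels. The rows in `SAMPLE_CSV` are made-up toy values, not records from the real dataset, and the helpers `load_rows`, `summarize` and `empirical_cdf` are hypothetical.

```python
import csv
import io
from collections import Counter

# Toy rows in the ASSUMED layout: age, operation year, positive nodes, status.
# These values are invented for illustration; they are NOT from the real file.
SAMPLE_CSV = """\
30,64,1,1
34,59,0,2
38,69,21,2
42,60,1,1
50,63,13,2
57,62,0,1
"""

# Assumed column names; the last column is treated as the class label.
COLUMNS = ["age", "year", "nodes", "status"]


def load_rows(text):
    """Parse header-less CSV text into a list of dicts with integer values."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, map(int, row))) for row in reader if row]


def summarize(rows, label="status"):
    """Return number of points, features, classes, and points per class."""
    counts = Counter(r[label] for r in rows)
    return {
        "points": len(rows),
        "features": len(COLUMNS) - 1,  # every column except the label
        "classes": len(counts),
        "per_class": dict(counts),
    }


def empirical_cdf(values):
    """Return sorted (value, fraction of points <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = {}
    for i, v in enumerate(ordered, start=1):
        cdf[v] = i / n  # for ties, the last (largest) index wins
    return sorted(cdf.items())


rows = load_rows(SAMPLE_CSV)
summary = summarize(rows)
nodes_cdf = empirical_cdf([r["nodes"] for r in rows])

print(summary)
print(nodes_cdf)
```

To run this on the real file, pass the contents of the downloaded CSV to `load_rows` in place of `SAMPLE_CSV`. Comparing `empirical_cdf` computed separately for each class is one way to quantify how well a single feature separates the classes.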

The conclusions of the data analysis can be found in the notebook itself. A readable Jupyter Notebook is included in the repository. Prince Rana
