Data Analysis of Haberman's Breast Cancer Survival Dataset
Haberman's data: Haberman's dataset contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
Dataset reference: https://www.kaggle.com/gilsousa/habermans-survival-data-set
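The dataset described above can be loaded with a short Python sketch. It is not the notebook's code. It assumes the CSV has four integer columns in the order given by the UCI description of Haberman's data: age at operation, year of operation, number of positive axillary nodes, and survival status (1 = survived 5 years or longer, 2 = died within 5 years). The column names in `COLUMNS` and the helpers `parse_haberman`, `load_haberman`, and `high_level_stats` are hypothetical. A non-numeric first row is treated as a header, so the sketch works whether or not the downloaded file has one. It uses only the standard library.

```python
import csv
from collections import Counter

# Assumed column order (UCI description of Haberman's data):
# age at operation, year of operation (two digits, 19xx),
# number of positive axillary nodes, survival status
# (1 = survived 5+ years, 2 = died within 5 years).
COLUMNS = ("age", "year", "nodes", "status")


def parse_haberman(lines):
    """Parse CSV lines into a list of dicts with int values.

    A non-numeric first row is treated as a header and skipped, so this
    works whether or not the file has a header row.
    """
    rows = []
    for i, record in enumerate(csv.reader(lines)):
        if not record:
            continue  # ignore blank lines
        try:
            values = [int(v) for v in record[: len(COLUMNS)]]
        except ValueError:
            if i == 0:
                continue  # header row
            raise
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def load_haberman(path):
    """Read the CSV file at `path` (e.g. the downloaded Kaggle file)."""
    with open(path, newline="") as f:
        return parse_haberman(f)


def high_level_stats(rows):
    """Number of points, features, classes, and points per class."""
    per_class = Counter(r["status"] for r in rows)
    return {
        "points": len(rows),
        "features": len(COLUMNS) - 1,  # all columns except the class label
        "classes": len(per_class),
        "points_per_class": dict(sorted(per_class.items())),
    }
```

Usage: `print(high_level_stats(load_haberman("haberman.csv")))`. The file name here is an assumption about where the Kaggle download was saved.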
Problems explored and tackled:
- High-level statistics of the dataset: number of points, number of features, number of classes, and data points per class.
- Univariate analysis (PDF, CDF, box plots, violin plots) to understand which features are useful for classification.
- Bivariate analysis (scatter plots, pair plots) to see whether combinations of features are useful for classification.
- Observations written in English as crisply and unambiguously as possible, with results always quantified.
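The univariate checks listed above can be backed by numbers before any plotting. This is a standard-library sketch, not the notebook's code. It assumes each row is a dict with an integer `status` key (the class label) plus integer feature keys. The feature name `nodes` and all function names are hypothetical. `empirical_pdf_cdf` is a histogram-based estimate (fraction of points per bin and its running sum), `box_summary` returns the five-number summary a box plot is built around, and `ecdf_at` returns an exact fraction that can support a quantified observation. Quartiles use the `statistics` module's default "exclusive" method (Python 3.8+), so they may differ slightly from NumPy or pandas quartiles. Plotting itself (for example with matplotlib or seaborn) is left out.

```python
import bisect
import statistics  # statistics.quantiles needs Python 3.8+


def values_by_class(rows, feature):
    """Group one feature's values by class label (the `status` key)."""
    groups = {}
    for r in rows:
        groups.setdefault(r["status"], []).append(r[feature])
    return groups


def empirical_pdf_cdf(values, bins=10):
    """Histogram-based PDF (fraction of points per bin) and its running-sum CDF.

    Returns (bin_edges, pdf, cdf); pdf sums to 1 and cdf ends at 1
    (up to floating-point rounding).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    pdf = [c / n for c in counts]
    cdf, running = [], 0.0
    for p in pdf:
        running += p
        cdf.append(running)
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, pdf, cdf


def box_summary(values):
    """Five-number summary: min, Q1, median, Q3, max.

    A box plot's box spans Q1 to Q3 around the median; its whiskers may stop
    short of min/max depending on the plotting library's outlier rule.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def ecdf_at(values, x):
    """Exact empirical CDF at x: the fraction of values <= x."""
    ordered = sorted(values)
    return bisect.bisect_right(ordered, x) / len(ordered)
```

For example, `{c: ecdf_at(v, 0) for c, v in values_by_class(rows, "nodes").items()}` gives, for each class, the share of patients with zero positive axillary nodes. This is the kind of quantified statement the last item in the list asks for.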
The conclusions of the data analysis are in the notebook itself. A readable Jupyter Notebook is included.
Prince Rana