In this project I will look at a dataset of patient data relating to breast cancer, and develop a machine learning model that aims to predict malignant tumors as accurately as possible.

pranath/breast_cancer_prediction

Breast cancer prediction

Introduction

In this project I will look at a dataset of patient data relating to breast cancer, which is available on Kaggle as the Wisconsin Breast Cancer dataset.

The dataset features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is described in: K. P. Bennett and O. L. Mangasarian, "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34.
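According to the dataset's UCI documentation, ten real-valued measurements are taken for each nucleus (radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension), and for each image the mean, standard error, and "worst" value (the mean of the three largest values) of each measurement are recorded, giving 30 features. The following is a minimal Python sketch of that aggregation, not the original feature-extraction code. The input format (a list of dicts, one per nucleus), the column names, and computing standard error as the sample standard deviation divided by sqrt(n) are all assumptions.

```python
import math
import statistics

# The ten per-nucleus measurements; names are illustrative (assumed), not the CSV headers.
NUCLEUS_FEATURES = ["radius", "texture", "perimeter", "area", "smoothness",
                    "compactness", "concavity", "concave_points", "symmetry",
                    "fractal_dimension"]


def image_features(nuclei):
    """Summarise per-nucleus measurement dicts into 30 image-level features.

    Assumes at least two nuclei (needed for a sample standard deviation).
    For each measurement: mean, standard error, and "worst" (mean of the 3 largest).
    """
    n = len(nuclei)
    row = {}
    for name in NUCLEUS_FEATURES:
        values = [nucleus[name] for nucleus in nuclei]
        row[f"{name}_mean"] = statistics.mean(values)
        # Assumed definition: sample standard deviation / sqrt(number of nuclei).
        row[f"{name}_se"] = statistics.stdev(values) / math.sqrt(n)
        row[f"{name}_worst"] = statistics.mean(sorted(values, reverse=True)[:3])
    return row


# Hypothetical image with four nuclei whose measurements are all 1, 2, 3 and 4.
nuclei = [{name: float(v) for name in NUCLEUS_FEATURES} for v in (1, 2, 3, 4)]
row = image_features(nuclei)
print(len(row))  # 30
```

Averaging the three largest values, rather than taking the single maximum, makes the "worst" feature less sensitive to one badly segmented nucleus.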

The dataset was released in November 1995; the original source is the UCI Machine Learning Repository.

Example images of the kind of cells this data is derived from, for both malignant and benign tumors, are shown below.

[Image: example cells from malignant and benign tumors]

I will develop a machine learning model that aims to predict malignant tumors as accurately as possible.

Results

  1. In the first project, finished in July 2019, the best result was an overall F1 score of 0.99 across all classes.
  2. In the latest project, finished in December 2020, the best result was an overall F1 score of 0.96 across all classes. Although this score is lower than that of the first project, it is considered a more reliable estimate of model performance because more advanced validation techniques were used. New techniques in this latest project include additional statistical methods, UMAP dimensionality reduction, and the XGBoost model.
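The README does not say which averaging the "overall F1 score" uses or which validation scheme was applied. As a hedged illustration, the sketch below computes per-class F1 (the harmonic mean of precision and recall) with macro and weighted averages, and builds stratified k-fold splits, one common way to obtain a less optimistic estimate than a single train/test split. The function names are hypothetical, the folds are not shuffled, and a real project would more likely use scikit-learn's `f1_score` and `StratifiedKFold`.

```python
from collections import Counter, defaultdict


def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 score for every label seen in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def overall_f1(y_true, y_pred, average="macro"):
    """Macro (unweighted mean) or weighted (by true-class support) average of per-class F1."""
    scores = f1_per_class(y_true, y_pred)
    if average == "macro":
        return sum(scores.values()) / len(scores)
    if average == "weighted":
        support = Counter(y_true)
        return sum(scores[label] * support[label] for label in scores) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


def stratified_k_fold(y, k=5):
    """Yield (train_indices, test_indices); each class is dealt round-robin into k folds."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


# Toy labels: "M" = malignant, "B" = benign (illustrative data, not project results).
y_true = ["M", "M", "M", "B", "B", "B", "B"]
y_pred = ["M", "M", "B", "B", "B", "B", "M"]
print(f1_per_class(y_true, y_pred))
print(overall_f1(y_true, y_pred, "macro"), overall_f1(y_true, y_pred, "weighted"))
for train, test in stratified_k_fold(["M"] * 4 + ["B"] * 6, k=2):
    print(train, test)
```

Because every fold keeps roughly the same malignant/benign ratio and each sample is tested exactly once, the averaged fold scores depend less on one lucky split, which is one reason a cross-validated score can come out lower yet be more trustworthy.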
