A comparison of the accuracy of various classification models using Wisconsin breast cancer data

rrasheed/Breast-Cancer-Diagnosis

Data-Mining-Final-Project

A comparison of the accuracy of various classification models using breast cancer data

Overview

This project was completed as a final project for the DATS 6103 Data Mining course taught by Dr. Yuxiao Huang in The George Washington University Data Science Program. The project uses breast cancer classification data to compare the accuracy of four classifiers: Perceptron, Logistic Regression, Decision Tree, and Random Forest.
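As a rough illustration of this kind of comparison (not the notebook's actual code), the sketch below fits the four classifiers with scikit-learn and reports test-set accuracy. It assumes scikit-learn is installed and uses the copy of the Wisconsin diagnostic data bundled with scikit-learn (`load_breast_cancer`) instead of the download link. The 70/30 split, the feature scaling, the hyperparameters, and the function name `compare_classifiers` are all illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(test_size=0.3, random_state=0):
    """Return {classifier name: test accuracy} (hypothetical helper)."""
    # Bundled copy of the Wisconsin diagnostic data: 30 features per sample.
    X, y = load_breast_cancer(return_X_y=True)

    # Stratify so both classes keep their proportions in train and test.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )

    # The linear models are scaled first; the tree-based models do not need it.
    models = {
        "Perceptron": make_pipeline(
            StandardScaler(), Perceptron(random_state=random_state)
        ),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_classifiers().items():
        print(f"{name}: {accuracy:.3f}")
```

Accuracy from a single split depends on the random seed, so a fairer comparison would average over several splits or use cross-validation.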

Prerequisites

Python 3.6.5 Anaconda Distribution -- Jupyter Notebook

$ conda install seaborn

$ conda install graphviz

(Optional) $ conda install watermark

Running Code

Upload Breast_Cancer_Classification.ipynb to Jupyter Notebook and open it. The notebook already contains a link to the data, so all that is needed is to select "Run All".

Data

The data was provided by Dr. William H. Wolberg from the University of Wisconsin General Surgery Dept., together with W. Nick Street and Olvi L. Mangasarian from the University of Wisconsin Computer Science Dept. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. Each sample is then assigned a class label of either Malignant (M) or Benign (B).

Link for Data: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data

Attribute information:

  1. ID number
  2. Diagnosis (M = malignant, B = benign)
  3-32. Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry 
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" value (the mean of the three largest values) of each of these features were computed for each image, resulting in 30 features.
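The column layout described above can be sketched as follows. wdbc.data has no header row, and per wdbc.names the 30 feature columns are ordered as the ten means, then the ten standard errors, then the ten "worst" values. The column names (for example `radius_mean`) and the helpers `wdbc_column_names` and `parse_wdbc` are hypothetical choices for illustration, not names used by the data set or the notebook.

```python
import csv
import io

# The ten base features, in the order listed above.
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Columns 3-12 are means, 13-22 standard errors, 23-32 "worst" values.
STATS = ["mean", "se", "worst"]


def wdbc_column_names():
    """Build the 32 column names: id, diagnosis, then 30 features."""
    names = ["id", "diagnosis"]
    for stat in STATS:
        for feature in FEATURES:
            names.append(f"{feature}_{stat}")
    return names


def parse_wdbc(text):
    """Parse the comma-separated contents of wdbc.data into a list of dicts."""
    names = wdbc_column_names()
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(row)}")
        record = {"id": row[0], "diagnosis": row[1]}
        for name, value in zip(names[2:], row[2:]):
            record[name] = float(value)
        records.append(record)
    return records


if __name__ == "__main__":
    # Placeholder row with made-up values (1..30), not real data.
    sample = "1,M," + ",".join(str(i) for i in range(1, 31))
    first = parse_wdbc(sample)[0]
    print(first["diagnosis"], first["radius_mean"], first["radius_worst"])
```

To use it on the real file, read the downloaded wdbc.data as text and pass its contents to `parse_wdbc`.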

Sources

[1] Raschka, Sebastian. Python machine learning. Birmingham, UK: Packt Publishing, 2015. Print.

[2] UCI Machine Learning Repository: Center for Machine Learning and Intelligent Systems. Breast Cancer Wisconsin (Diagnostic) Data Set: wdbc.data. http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data, 1996.

[3] UCI Machine Learning Repository: Center for Machine Learning and Intelligent Systems. Breast Cancer Wisconsin (Diagnostic) Data Set: wdbc.names. http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.names, 1996.
