supervised-learning-finding-donors

3efecd6 · Jan 2, 2019

Name	Last commit message	Last commit date
LICENSE	Adding license file	Nov 11, 2018
README.md	Update README.md	Jan 2, 2019
census.csv	Initial commit	Nov 6, 2018
finding_donors.ipynb	Initial commit	Nov 6, 2018
visuals.py	Initial commit	Nov 6, 2018

README.md

Supervised Learning

Project: Finding Donors for CharityML

Install

This project requires Python 3.x and the following Python libraries installed:

You will also need software installed to run and execute an iPython Notebook, such as Jupyter.

Code

Code is provided in the finding_donors.ipynb notebook file. The notebook uses the visuals.py Python file and the census.csv dataset file. Three models were initially selected for analysis: Linear SVC, Decision Tree Classifier and Gradient Boosting Classifier. Of the three, the Gradient Boosting Classifier was selected as the most promising.
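As a rough illustration of how three such models could be compared, here is a minimal sketch. It is not the notebook's code: the data is a synthetic stand-in generated with make_classification rather than census.csv, and the split size, random seeds and max_iter value are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the preprocessed census features
# (assumption: the real notebook loads census.csv instead).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.75, 0.25], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three candidate models named in this README, mostly with default settings.
models = {
    "LinearSVC": LinearSVC(max_iter=5000, random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record accuracy and F-score (beta = 0.5) on the test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "f_beta": fbeta_score(y_test, predictions, beta=0.5),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.4f}, F0.5={scores['f_beta']:.4f}")
```

The scores printed on synthetic data say nothing about the census results; the sketch only shows the shape of the comparison loop.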

Run

In a terminal or command window, navigate to the top-level project directory finding_donors/ (that contains this README) and run one of the following commands:

ipython notebook finding_donors.ipynb

or

jupyter notebook finding_donors.ipynb

This will open the iPython Notebook software and project file in your browser.

Overview

The notebook applies supervised learning techniques to data collected for the U.S. census to help CharityML (a fictitious charity organization) identify the people most likely to donate. The data was analyzed, and a series of transformations and pre-processing steps was applied to bring it into a workable format. The sklearn Linear SVC, Decision Tree Classifier and Gradient Boosting Classifier (GBC) models were evaluated to find the best solution. Based on the evaluation results, the sklearn GBC was selected as the most promising. The model was then optimized using sklearn grid search. Finally, feature importance was analyzed, that is, how much each feature contributes to the predictions of the chosen algorithm.
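The grid search and feature importance steps might look roughly like the sketch below. It runs on synthetic data, and the parameter grid, the number of folds and the choice of F-score (beta = 0.5) as the search metric are assumptions made for illustration, not the values used in the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed census data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical parameter grid, kept small so the example runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

# Score each candidate with the F-score at beta = 0.5.
scorer = make_scorer(fbeta_score, beta=0.5)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring=scorer,
    cv=3,
)
grid.fit(X_train, y_train)

# Evaluate the best model found by the search on the held-out test split.
best_model = grid.best_estimator_
test_f = fbeta_score(y_test, best_model.predict(X_test), beta=0.5)
print("best parameters:", grid.best_params_)
print(f"test F0.5: {test_f:.4f}")

# Rank features by the importance the fitted model assigns to them.
importances = best_model.feature_importances_
top_five = np.argsort(importances)[::-1][:5]
for index in top_five:
    print(f"feature {index}: importance {importances[index]:.3f}")
```

With the real data, the feature indices would be replaced by the encoded column names.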

Data

The modified census dataset consists of approximately 32,000 data points, each with 13 features. It is a modified version of the dataset published in the paper "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", by Ron Kohavi. The paper is available online, and the original dataset is hosted on UCI.

Features

age: Age
workclass: Working Class (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked)
education_level: Level of Education (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool)
education-num: Number of educational years completed
marital-status: Marital status (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse)
occupation: Work Occupation (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces)
relationship: Relationship Status (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried)
race: Race (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black)
sex: Sex (Female, Male)
capital-gain: Monetary Capital Gains
capital-loss: Monetary Capital Losses
hours-per-week: Average Hours Per Week Worked
native-country: Native Country (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands)

Target Variable

income: Income Class (<=50K, >50K)
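Because many of the features above are categorical and the target is a two-class label, they usually need to be encoded as numbers before sklearn models can use them. The sketch below shows one common approach, one-hot encoding with pandas plus a 0/1 mapping for income. Whether the notebook encodes the data exactly this way is an assumption, and the three rows are made up for illustration rather than taken from census.csv.

```python
import pandas as pd

# Illustrative rows only (not real census records); the column names and
# category values follow the feature list in this README.
sample = pd.DataFrame(
    {
        "age": [39, 50, 28],
        "workclass": ["State-gov", "Private", "Private"],
        "sex": ["Male", "Female", "Female"],
        "hours-per-week": [40, 13, 40],
        "income": ["<=50K", ">50K", "<=50K"],
    }
)

# Map the target to 0 (<=50K) and 1 (>50K).
income = sample["income"].map({"<=50K": 0, ">50K": 1})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
features = pd.get_dummies(sample.drop(columns="income"))

print(list(features.columns))
print(income.tolist())
```

Each categorical column expands into one indicator column per category present in the data, so the encoded frame has more columns than the raw feature list.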

Results

The optimized Gradient Boosting Classifier achieves an accuracy of 0.8689 and an F-score (beta = 0.5) of 0.7483 on the test set.
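For readers unfamiliar with the metric, the F-score with beta = 0.5 weights precision more heavily than recall. The short sketch below computes it from confusion-matrix counts using the standard formula; the function name f_beta and the counts are hypothetical and are not taken from the notebook's results.

```python
def f_beta(true_positives, false_positives, false_negatives, beta=0.5):
    """Return the F-beta score computed from confusion-matrix counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_squared = beta ** 2
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta_squared) * precision * recall / (beta_squared * precision + recall)


# Hypothetical counts: precision = 60 / 80 = 0.75, recall = 60 / 100 = 0.60.
score = f_beta(true_positives=60, false_positives=20, false_negatives=40)
print(f"F0.5 = {score:.4f}")  # prints "F0.5 = 0.7143"
```

Here the score (about 0.714) sits closer to the precision of 0.75 than to the recall of 0.60, which is the effect of choosing beta below 1.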
