PDPbox

python partial dependence plot toolbox

This repository is inspired by ICEbox. The goal is to visualize the impact of certain features on model predictions for any supervised learning algorithm. It currently supports all scikit-learn algorithms.

The common problem

When using black-box machine learning algorithms such as random forests and boosting, it is hard to understand the relationship between the predictors and the model outcome. With a random forest, for example, all we get is the feature importance. The importance calculation tells us which features significantly influence the outcome, but frustratingly not in which direction. And in most real cases, the effect is non-monotonic. We need powerful tools to help us understand the complex relationships between predictors and model predictions.
PDPbox aims to wrap up and enrich some of the useful functions from ICEbox in Python.
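
To make the idea concrete, here is a minimal sketch of how individual (ICE) lines and a partial dependence curve can be computed by hand. It is not PDPbox's implementation: `ice_lines`, `partial_dependence`, and `toy_predict` are hypothetical names, and `toy_predict` is only a stand-in for a fitted black-box model's prediction function.

```python
from statistics import mean


def ice_lines(predict, rows, feature, grid):
    """Return one ICE line per row: predictions as `feature` sweeps over `grid`."""
    lines = []
    for row in rows:
        line = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            line.append(predict(modified))
        lines.append(line)
    return lines


def partial_dependence(lines):
    """Average the ICE lines point-wise to get the partial dependence curve."""
    return [mean(column) for column in zip(*lines)]


# Hypothetical stand-in for a fitted model (assumption, not part of PDPbox).
def toy_predict(row):
    return (row["age"] - 40) ** 2 / 100 + 0.5 * row["income"]


rows = [{"age": 25, "income": 1.0}, {"age": 60, "income": 3.0}]
grid = [20, 40, 60]

lines = ice_lines(toy_predict, rows, "age", grid)
pdp = partial_dependence(lines)
print(lines)  # one list of predictions per row
print(pdp)    # averaged curve over the grid
```

In this toy case the curve falls and then rises again over the grid, which is exactly the kind of non-monotonic effect that a single feature importance score cannot show.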

Highlights

  1. Supports one-hot encoded features.
  2. For numeric features, creates grids from percentile points.
  3. Handles multiclass classifiers directly.
  4. Supports two-variable interaction plots.
  5. Supports actual prediction plots. (new)
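
As an illustration of item 2, the sketch below builds a grid for a numeric feature from percentile points rather than evenly spaced values, so the grid follows where the data actually lies. This is an assumption about the general technique, not PDPbox's own code: `percentile` and `percentile_grid` are hypothetical helpers, `num_points` is an invented parameter name, and `percentile_range` only mirrors the option named in the examples further down.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of pre-sorted data."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    low_value, high_value = sorted_values[lower], sorted_values[upper]
    return low_value + (high_value - low_value) * fraction


def percentile_grid(values, num_points=10, percentile_range=(0, 100)):
    """Grid points at evenly spaced percentiles; duplicates are dropped.

    Assumes num_points >= 2.
    """
    ordered = sorted(values)
    low, high = percentile_range
    step = (high - low) / (num_points - 1)
    points = [percentile(ordered, low + i * step) for i in range(num_points)]
    return sorted(set(points))


# Skewed toy data: most values are small, with one large outlier.
values = [1, 1, 1, 2, 3, 4, 5, 100]

grid = percentile_grid(values, num_points=5)
clipped = percentile_grid(values, num_points=5, percentile_range=(5, 95))
print(grid)     # grid points concentrate where the data is dense
print(clipped)  # narrowing the range keeps the outlier off the grid
```

Because repeated values collapse into one grid point, a skewed feature can yield fewer points than requested. Narrowing `percentile_range` is one way to stop a few extreme values from stretching the plot's x-axis.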

Documentation

For details about the ideas, please refer to Introducing PDPbox.
For a description of the functions and parameters, please refer to PDPbox functions and parameters.
For tests and demos, please refer to https://github.com/SauceCat/PDPbox/tree/master/test.

Install PDPbox

git clone https://github.com/SauceCat/PDPbox.git
cd PDPbox
python setup.py install

Examples

Binary feature: single variable plot with original points and individual lines

Binary feature: single variable plot with clustered individual lines

Binary feature: actual predictions plot for a single variable

Numeric feature: single variable plot with x_quantile=True, original points and individual lines

Numeric feature: single variable plot with percentile_range=(5, 95)

Numeric feature: single variable plot with customized grid points

Numeric feature: actual predictions plot for a single variable

One-hot encoding feature: single variable plot with individual lines and original points

One-hot encoding feature: single variable plot without centering the lines

One-hot encoding feature: actual predictions plot for a single variable

Multiclass: single variable plot with individual lines and original points

Interaction between two variables: the complete plot

Interaction between two variables: multiclass with only contour plots
