Python partial dependence plot toolbox
This repository is inspired by ICEbox. Its goal is to visualize the impact of certain features on model predictions for any supervised learning algorithm (it now supports all scikit-learn algorithms).
When using black-box machine learning algorithms like random forests and boosting, it is hard to understand the relationship between the predictors and the model outcome. For example, with a random forest, all we get is the feature importance. Although the importance calculation tells us which features significantly influence the outcome, it really sucks that we don't know in which direction they influence it. And in most real cases, the effect is non-monotonic. We need powerful tools to help us understand the complex relations between predictors and model predictions.
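The partial dependence idea behind this toolbox can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration, not PDPbox's own API: the helpers `ice_curves` and `partial_dependence` and the toy model `toy_predict` are assumptions made for this example. An ICE curve records how one sample's prediction changes as a single feature is swept over a grid while the other features stay fixed; the partial dependence curve is the average of those ICE curves over all samples.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Hypothetical helper: one row per sample, one column per grid value.

    For each grid value, the chosen feature column is overwritten with that
    value for every sample and the model's predictions are recorded.
    """
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = predict(X_mod)
    return curves

def partial_dependence(predict, X, feature, grid):
    """Hypothetical helper: partial dependence = mean of the ICE curves."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)

# Toy "black box" (an assumption): non-monotonic in feature 0, linear in feature 1.
def toy_predict(X):
    return (X[:, 0] - 1.0) ** 2 + 0.5 * X[:, 1]

X = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
grid = np.array([0.0, 1.0, 2.0])
pd_curve = partial_dependence(toy_predict, X, feature=0, grid=grid)
print(pd_curve)
```

Here `pd_curve` comes out as `[2.0, 1.0, 2.0]`: the effect of feature 0 first decreases and then increases, a non-monotonic relation that a single importance score cannot convey.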
PDPbox aims to wrap up and enrich, in Python, some of the useful functions introduced in ICEbox.
- Support one-hot encoded features.
- For numeric features, create grids from percentile points.
- Directly handle multiclass classifiers.
- Support two-variable interaction plots.
- Support actual prediction plots (new).
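The percentile grids, one-hot handling and two-variable interaction listed above could look roughly like the following sketch. It is an illustrative NumPy implementation under stated assumptions, not PDPbox's API: `percentile_grid`, `onehot_pd`, `interaction_pd`, the column layout of `X` and `toy_predict` are all hypothetical. Numeric grids are taken at evenly spaced percentiles, so grid points follow the data distribution; a one-hot feature is varied by switching on one of its columns at a time and switching the rest off, so every modified row remains a valid encoding.

```python
import numpy as np

def percentile_grid(values, num_grid_points=5):
    """Hypothetical helper: grid at evenly spaced percentiles (0th..100th).

    np.unique drops duplicate grid points (e.g. for heavily repeated values)
    and returns them sorted.
    """
    percentiles = np.linspace(0, 100, num_grid_points)
    return np.unique(np.percentile(values, percentiles))

def onehot_pd(predict, X, onehot_cols):
    """Hypothetical helper: partial dependence for a one-hot encoded feature.

    For each category column, set it to 1 and its sibling columns to 0,
    then average the predictions over all samples.
    """
    result = {}
    for col in onehot_cols:
        X_mod = X.copy()
        X_mod[:, onehot_cols] = 0.0
        X_mod[:, col] = 1.0
        result[col] = predict(X_mod).mean()
    return result

def interaction_pd(predict, X, f1, grid1, f2, grid2):
    """Hypothetical helper: two-variable partial dependence on grid1 x grid2."""
    out = np.empty((len(grid1), len(grid2)))
    for i, v1 in enumerate(grid1):
        for j, v2 in enumerate(grid2):
            X_mod = X.copy()
            X_mod[:, f1] = v1
            X_mod[:, f2] = v2
            out[i, j] = predict(X_mod).mean()
    return out

# Assumed columns: 0 = numeric x, 1 = numeric z, 2 = color_red, 3 = color_blue.
X = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [2.0, 1.0, 0.0, 1.0],
    [3.0, 0.0, 1.0, 0.0],
    [4.0, 1.0, 0.0, 1.0],
    [5.0, 0.0, 1.0, 0.0],
])

# Toy model (an assumption): x and z interact, red adds a constant 3.
def toy_predict(X):
    return X[:, 0] * X[:, 1] + 3.0 * X[:, 2]

grid = percentile_grid(X[:, 0])
pd_color = onehot_pd(toy_predict, X, onehot_cols=[2, 3])
pd_inter = interaction_pd(toy_predict, X, 0, [1.0, 5.0], 1, [0.0, 1.0])
print(grid, pd_color, pd_inter)
```

With this toy data, the numeric grid is `[1, 2, 3, 4, 5]`, the one-hot partial dependence is 4.2 for the red column and 1.2 for the blue column, and the interaction table is `[[1.8, 2.8], [1.8, 6.8]]`. The effect of column 1 grows with column 0, which is the kind of interaction a two-variable plot is meant to reveal. Percentile-based grids avoid spending grid points on the sparse tails of a skewed feature, at the cost of uneven spacing.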
For details about the ideas, please refer to Introducing PDPbox.
For a description of the functions and parameters, please refer to PDPbox functions and parameters.
For tests and demos, please refer to https://github.com/SauceCat/PDPbox/tree/master/test.
To install from source:
git clone https://github.com/SauceCat/PDPbox.git
cd PDPbox
python setup.py install