
Decision tree implementation using the minimum description length principle for inference and pruning.


FrankRuis/MDLP-Decision-Tree


MDLP-Decision-Tree

Based on the paper by J. R. Quinlan and R. L. Rivest, “Inferring decision trees using the minimum description length principle”. In: Information and Computation 80.3 (1989), pp. 227–248.

The main idea is that a theory for how the data is classified, transmitted together with the exceptions to that theory in as few bits as possible, will also classify new data accurately: if an attribute’s relationship with the object’s class is significant enough to save bits during transmission, it is most likely a worthwhile addition to a decision tree.
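To make the "bits saved during transmission" idea concrete, here is a minimal sketch of a binomial-style code for the exceptions: first transmit how many of the n objects are misclassified, then which ones. The exact coding scheme here is illustrative, not necessarily the paper's.

```python
from math import comb, log2

def exception_cost(n: int, k: int) -> float:
    """Bits to transmit which k of n objects are exceptions:
    log2(n + 1) bits for the count k, plus log2(C(n, k)) bits
    to identify the k exceptional positions (illustrative code)."""
    return log2(n + 1) + log2(comb(n, k))
```

A theory that leaves only 2 exceptions out of 100 objects costs far fewer bits than one that leaves 50, so the coding cost directly rewards attributes that improve classification.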

Including the theory in the cost of the transmission takes care of the overfitting problem: a more complex theory that classifies the training data perfectly, for example by assigning a leaf node to each training object, incurs a relatively high theory cost, while an overly simple theory incurs a high cost for encoding all of its exceptions.
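The trade-off can be sketched with hypothetical bit counts (the numbers below are made up for illustration): MDL picks the candidate minimizing theory cost plus exception cost.

```python
# Hypothetical bit costs for three candidate theories on the same data
candidates = {
    "single leaf":     {"theory": 4,   "exceptions": 120},  # too simple
    "moderate tree":   {"theory": 35,  "exceptions": 20},   # balanced
    "leaf per object": {"theory": 400, "exceptions": 0},    # overfit
}

# MDL selects the theory with the smallest total description length
best = min(candidates,
           key=lambda name: candidates[name]["theory"]
                          + candidates[name]["exceptions"])
```

Here the moderate tree wins with 55 bits total, even though it is neither the simplest theory nor the most accurate on the training data.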

Usage

```python
from mdlp import Tree

train = ...  # list of feature arrays with the label as the last element
test = ...   # list of feature arrays with the label as the last element

tree = Tree(train)
predictions = [tree.classify(t) for t in test]
```
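Since the label is stored as the last element of each feature array, test accuracy can be computed from the predictions with a small helper (not part of the library; a plain-Python sketch):

```python
def accuracy(test, predictions):
    """Fraction of test examples whose predicted label matches the
    true label, which is the last element of each feature array."""
    correct = sum(1 for t, p in zip(test, predictions) if t[-1] == p)
    return correct / len(test)
```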

