Behavior with small numbers of trees #638

Open
djhwolters opened this issue Sep 8, 2022 · 1 comment

@djhwolters

I am using ranger to predict threshold exceedance probabilities with quantile regression forests (QRF), and I am puzzled by its behaviour with small numbers of trees. With, for example, 100 trees I get nicely continuous probabilities. But with 5 trees it predicts only 0, 0.2, 0.4, 0.6, 0.8 or 1.0, and with 3 trees only 0, 0.5 or 1. It looks as if each tree makes a single deterministic prediction and the probabilities are obtained by combining these per-tree predictions. As I understand QRF, however, the probability distribution should come from the ECDF of all data points in the terminal nodes, so even a single tree should be able to produce many different probabilities. Any ideas on this?
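For reference, the QRF estimator described here (Meinshausen, 2006) is a weighted ECDF over all $n$ training responses. Writing $k$ for the number of trees and $R_t(x)$ for the leaf of tree $t$ that contains $x$:

$$
\hat F(y \mid x) = \sum_{i=1}^{n} w_i(x)\,\mathbf{1}\{Y_i \le y\},
\qquad
w_i(x) = \frac{1}{k}\sum_{t=1}^{k}
  \frac{\mathbf{1}\{X_i \in R_t(x)\}}{\#\{j : X_j \in R_t(x)\}}
$$

The exceedance probability for a threshold such as 23 is then $1 - \hat F(23 \mid x)$. Because the weights depend on leaf sizes, even $k = 1$ can give many distinct values rather than only 0 and 1.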

Code that shows this behaviour:

```r
library(ranger)

# Load data set
data(mtcars)

# Hold out row 3 as the test case and train on the remaining rows
xtest <- mtcars[3, ]
mtcars <- mtcars[-3, ]

# Make the random forests reproducible
set.seed(42)

# Non-exceedance probability P(mpg <= 23) for the test row;
# the exceedance probability is 1 minus this value
p_le_23 <- function(x) ecdf(x)(23)

# Fit model using 50 trees
model <- ranger(data = mtcars, quantreg = TRUE, dependent.variable.name = 'mpg',
                importance = 'permutation', classification = FALSE,
                num.trees = 50, min.node.size = 2)
outcomes <- predict(model, xtest, what = p_le_23, type = 'quantiles')
print(outcomes$predictions)

# Fit model using 3 trees
model <- ranger(data = mtcars, quantreg = TRUE, dependent.variable.name = 'mpg',
                importance = 'permutation', classification = FALSE,
                num.trees = 3, min.node.size = 2)
outcomes <- predict(model, xtest, what = p_le_23, type = 'quantiles')
print(outcomes$predictions)

# Fit model using 1 tree
model <- ranger(data = mtcars, quantreg = TRUE, dependent.variable.name = 'mpg',
                importance = 'permutation', classification = FALSE,
                num.trees = 1, min.node.size = 2)
outcomes <- predict(model, xtest, what = p_le_23, type = 'quantiles')
print(outcomes$predictions)
```
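For comparison, the full-ECDF estimate the question expects can be sketched in base R from terminal-node IDs. The helper `qrf_cdf` is hypothetical (not part of ranger): in every tree it gives each training response that shares the test row's leaf a weight of 1 / (leaf size), then averages over trees. As a simplifying assumption it uses all training rows, not only the in-bag ones.

```r
# Hypothetical helper (not a ranger function): QRF-style estimate of
# P(Y <= threshold | x) as a weighted ECDF of all training responses.
#   train_nodes: n_train x n_trees matrix of terminal node IDs of training rows
#   test_nodes:  length-n_trees vector of terminal node IDs of one test row
#   y:           training responses
qrf_cdf <- function(train_nodes, test_nodes, y, threshold) {
  w <- numeric(length(y))
  for (t in seq_len(ncol(train_nodes))) {
    in_leaf <- train_nodes[, t] == test_nodes[t]
    # Each training row in the test row's leaf gets weight 1 / leaf size
    if (any(in_leaf)) w <- w + in_leaf / sum(in_leaf)
  }
  w <- w / sum(w)  # average over trees so the weights sum to 1
  sum(w * (y <= threshold))
}

# Toy example: a single tree with two leaves; the test row falls in leaf 1
y <- c(18, 21, 22, 25, 30, 33)
train_nodes <- matrix(c(1, 1, 1, 1, 2, 2), ncol = 1)
qrf_cdf(train_nodes, test_nodes = 1, y = y, threshold = 23)  # 0.75
```

With a fitted ranger model, the node-ID inputs could presumably be taken from `predict(model, mtcars, type = "terminalNodes")$predictions` for the training rows and from the first row of the same call on `xtest` for the test row. Even the one-tree toy example returns 0.75, not 0 or 1.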

@mnwright
Member

mnwright commented Oct 4, 2022

The reason for this is the speedup discussed here: lorismichel/quantregForest#3.
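A minimal base-R sketch of that shortcut, assuming (as we understand ranger's implementation for `quantreg = TRUE`) that only one randomly chosen training response is kept per terminal node and tree. The helper `one_obs_cdf` and the toy node matrix are hypothetical. Because each tree then contributes exactly one value, `ecdf()` over `k` trees can only return multiples of 1/k, which matches the 0, 0.2, ..., 1.0 grid reported for 5 trees.

```r
# Hypothetical sketch of the "one random observation per leaf" shortcut
#   train_nodes: n_train x n_trees matrix of terminal node IDs of training rows
#   test_nodes:  length-n_trees vector of terminal node IDs of one test row
one_obs_cdf <- function(train_nodes, test_nodes, y, threshold) {
  n_trees <- ncol(train_nodes)
  picked <- numeric(n_trees)
  for (t in seq_len(n_trees)) {
    # Responses in the test row's leaf (assumed non-empty)
    leaf_y <- y[train_nodes[, t] == test_nodes[t]]
    # Keep a single randomly chosen response from that leaf
    picked[t] <- leaf_y[sample.int(length(leaf_y), 1)]
  }
  # ECDF over n_trees values: always a multiple of 1 / n_trees
  ecdf(picked)(threshold)
}

set.seed(1)
y <- c(18, 21, 22, 25, 30, 33)
# Three trees; the test row falls in leaf 1 of each
train_nodes <- matrix(c(1, 1, 1, 1, 2, 2,
                        1, 1, 2, 2, 2, 2,
                        1, 2, 1, 2, 1, 2), ncol = 3)
p <- one_obs_cdf(train_nodes, test_nodes = c(1, 1, 1), y = y, threshold = 23)
p * 3  # a whole number of trees, so p is a multiple of 1/3
```

Averaging over many trees smooths this grid out, which is consistent with the continuous-looking probabilities reported for 100 trees.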

Maybe we should add a note to the documentation?
