Using xgboost with `crankcompositor`/`responsecompositor` #366

fa1999abdi · 2024-02-06T09:13:57Z

fa1999abdi
Feb 6, 2024

Hello
I'm doing an ML survival study using the {mlr3proba} package with three learners: "surv.rfsrc", "surv.xgboost" and "surv.penalized". I want to predict the survival time for each individual and compare the three learners (using RMSE and C-index). Could you please explain how I can use {mlr3pipelines} with distrcompositor and crankcompositor to do that?
The following is my code:

library(mlr3proba)
library(mlr3pipelines)
library(mlr3extralearners)

# create a task (tb is the data set, not shown)
tsk_s <- as_task_surv(tb, time = "time_to_death", event = "status", type = "right")

# impute missing values
po_impute = po("imputehist")

# new task
new_task = po_impute$train(list(tsk_s))[[1]]

# benchmark
srfs = lrn("surv.rfsrc", predict_type = "crank", importance = "permute")
sbboost = lrn("surv.xgboost", predict_type = "crank")
spe = lrn("surv.penalized", lambda1 = 485.86, predict_type = "crank")
learners = list(srfs, sbboost, spe)
resample = rsmp("cv", folds = 3)
design = benchmark_grid(new_task, learners, resample)
bm = benchmark(design)
msr_txt = c("surv.cindex", "surv.rmse")
measures = msrs(msr_txt)
bm$aggregate(measures)[, c("learner_id", "task_id", ..msr_txt)]

Created on 2024-02-06 with reprex v2.1.0

bblodfon · 2024-02-06T22:05:03Z

bblodfon
Feb 6, 2024
Maintainer

Hi, please consult the crankcompose docs. Practically you will do something like:

task = tsk("rats")
pipe = po("imputehist") %>>%
  # your_learner is a placeholder for lrn(...)
  ppl("crankcompositor", learner = your_learner, response = TRUE, method = "sum_haz")
pipe$train(task)
p = pipe$predict(task)[[1]] # p will have a response (survival time) now
p$score(your_measures) # your_measures is a placeholder for msr(...) or msrs(...)

But note that in survival analysis there are issues with composing a response from a distr prediction, whichever method is used, and surv.rmse is rarely used, if at all. It is more common to evaluate the whole distr with measures such as the integrated survival Brier score, i.e. surv.graf (docs), or surv.rcll.


bblodfon · 2024-02-08T13:16:32Z

bblodfon
Feb 8, 2024
Maintainer

@fa1999abdi question covered?


fa1999abdi · 2024-02-08T15:07:13Z

fa1999abdi
Feb 8, 2024
Author

@bblodfon, thank you so much for your response.
But should I use distrcompositor with lrn("surv.xgboost") to predict the survival time for each individual? Is that correct?


bblodfon · 2024-02-08T17:54:17Z

bblodfon
Feb 8, 2024
Maintainer

You should use distrcompositor with xgboost and estimator = "breslow" for the Cox objective, see #263. This will give you a distr prediction type. If you really want a response, crankcompositor works, but note that there are issues with improper distributions: taking the mean or median will not work as expected (there is good reasoning behind that).
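
For intuition, the Breslow step can be sketched in base R. This is a minimal illustration under stated assumptions, not mlr3proba's implementation: the function names `breslow_cumhaz` and `predict_surv` and the toy data are made up here. Given linear predictors (lp) from a Cox-type model, the baseline cumulative hazard is estimated on the training data and then converted into one survival curve per new observation.

```r
# Breslow estimate of the baseline cumulative hazard H0(t).
# time, status: training outcomes (status 1 = event); lp: training linear predictors.
breslow_cumhaz = function(time, status, lp) {
  event_times = sort(unique(time[status == 1]))
  risk = exp(lp)
  increments = vapply(event_times, function(t) {
    d = sum(time == t & status == 1)  # number of events at time t
    at_risk = sum(risk[time >= t])    # summed relative risk of the risk set at t
    d / at_risk
  }, numeric(1))
  data.frame(time = event_times, cumhaz = cumsum(increments))
}

# Survival curves S(t | x) = exp(-H0(t) * exp(lp_new)).
# Returns a matrix with one row per new observation and one column per event time.
predict_surv = function(base, lp_new) {
  surv = exp(-outer(exp(lp_new), base$cumhaz))
  colnames(surv) = base$time
  surv
}

# Toy data, made up for illustration
time   = c(2, 3, 3, 5, 8)
status = c(1, 1, 0, 1, 0)
lp     = c(0.5, 0.0, -0.2, 0.3, -0.5)

base = breslow_cumhaz(time, status, lp)
surv = predict_surv(base, lp_new = c(-1, 0, 1))
surv
```

Each row of `surv` is non-increasing in time, and a larger lp gives a lower survival probability at every time point. Because the curve does not have to reach 0 at the last event time, the resulting distribution can be improper, which is one reason its mean or median may not behave as expected.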


bblodfon · 2024-02-09T07:52:17Z

bblodfon
Feb 9, 2024
Maintainer

@fa1999abdi I am soon going to split the xgboost objectives into separate learners (Cox and AFT are very different). For Cox, the distr predictions will be generated by default using the Breslow estimator to streamline things, so no distr composition will be required for the xgboost Cox learner. Of course, a response prediction will not be included; you will still need to compose that with the crankcompositor.


fa1999abdi · 2024-02-10T08:23:22Z

fa1999abdi
Feb 10, 2024
Author

But it didn't work:

tsk_s <- as_task_surv(tb, time = "time_to_death", event = "status", type = "right")
pipe = po("imputehist") %>>%
  ppl("crankcompositor", learner = lrn("surv.xgboost"), response = TRUE, method = "sum_haz")
pipe$train(tsk_s)
#> $compose_crank.output
#> NULL
p = pipe$predict(tsk_s)[[1]] # p will have a response (survival time) now
#> Error: Assertion on 'distr' failed: FALSE.
#> This happened PipeOp compose_crank's $predict()


bblodfon · 2024-02-10T08:59:51Z

bblodfon
Feb 10, 2024
Maintainer

Yes, you need to estimate the distr either way (crankcompositor converts a distr to crank/response), so now it looks a bit complex but the following works:

library(mlr3proba)
#> Loading required package: mlr3
library(mlr3pipelines)
library(mlr3extralearners)

task = tsk("rats")

learner =
  po("encode", method = "treatment") %>>%
  ppl("crankcompositor",
    # crank needs a distr prediction type, xgboost doesn't have one, so we have to estimate it:
    learner = ppl("distrcompositor", learner = lrn("surv.xgboost", nrounds = 10),
                   estimator = "breslow", overwrite = FALSE),
    response = TRUE, method = "sum_haz", overwrite = FALSE) |>
  as_learner()

learner$train(task)
p = learner$predict(task)
p
#> <PredictionSurv> for 300 observations:
#>     row_ids time status      crank         lp response     distr
#>           1  101  FALSE -0.5318943 -0.5318943 3.987942 <list[1]>
#>           2   49   TRUE -0.9984229 -0.9984229 2.501140 <list[1]>
#>           3  104  FALSE -0.9984229 -0.9984229 2.501140 <list[1]>
#> ---                                                             
#>         298   92  FALSE -1.0661759 -1.0661759 2.337293 <list[1]>
#>         299  104  FALSE -0.8688244 -0.8688244 2.847226 <list[1]>
#>         300  102  FALSE -0.8688244 -0.8688244 2.847226 <list[1]>

p$score(msr("surv.cindex")) # uses crank prediction type (equal to lp here)
#> surv.cindex 
#>   0.8984875
p$score(msr("surv.rmse")) # uses response prediction type
#> surv.rmse 
#>  61.24336
p$score(msr("surv.brier")) # uses distr prediction type
#>  surv.graf 
#> 0.03333211

Created on 2024-02-10 with reprex v2.0.2


fa1999abdi · 2024-02-11T07:20:15Z

fa1999abdi
Feb 11, 2024
Author

@bblodfon thanks so much for your help.


bblodfon · 2024-02-11T07:38:37Z

bblodfon
Feb 11, 2024
Maintainer

FYI, even though you can do the above and get a response (survival time), Haider's paper (he introduced the D-calibration score) explains why converting a distr to a single-value response is not good practice for survival modeling.


bblodfon · 2024-04-12T12:13:02Z

bblodfon
Apr 12, 2024
Maintainer

@fa1999abdi we now have the xgboost Cox learner with distr predictions by default (using the Breslow estimator) so you can simplify the pipeline above as:

library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3proba)
#> Loading required package: mlr3

task = tsk("rats")

learner =
  po("encode", method = "treatment") %>>%
  ppl("crankcompositor",
    learner = lrn("surv.xgboost.cox", nrounds = 10),
    response = TRUE, method = "sum_haz", overwrite = FALSE) |>
  as_learner()

learner$train(task)
p = learner$predict(task)
p
#> <PredictionSurv> for 300 observations:
#>     row_ids time status      crank         lp response     distr
#>           1  101  FALSE -0.5318943 -0.5318943 3.987942 <list[1]>
#>           2   49   TRUE -0.9984229 -0.9984229 2.501140 <list[1]>
#>           3  104  FALSE -0.9984229 -0.9984229 2.501140 <list[1]>
#> ---                                                             
#>         298   92  FALSE -1.0661759 -1.0661759 2.337293 <list[1]>
#>         299  104  FALSE -0.8688244 -0.8688244 2.847226 <list[1]>
#>         300  102  FALSE -0.8688244 -0.8688244 2.847226 <list[1]>

Created on 2024-04-12 with reprex v2.0.2


bblodfon · 2024-08-17T17:22:54Z

bblodfon
Aug 17, 2024
Maintainer

Hi @fa1999abdi, we now have a new pipeop (v0.6.7) that composes a survival time from the survival distribution using the restricted mean survival time (RMST); see more details here.

An example using xgboost would be as follows:

library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3proba)
#> Loading required package: mlr3

task = tsk("lung")
xgb = lrn("surv.xgboost.cox", nrounds = 10)

grlrn = 
  po("encode", method = "treatment") %>>%
  ppl("responsecompositor", learner = xgb, method = "rmst") |>
  as_learner()

p = grlrn$train(task)$predict(task)
p
#> <PredictionSurv> for 168 observations:
#>     row_ids time status      crank         lp response     distr
#>           1  455   TRUE -0.6479679 -0.6479679 357.4828 <list[1]>
#>           2  210   TRUE  0.7804120  0.7804120 179.7284 <list[1]>
#>           3 1022  FALSE -2.2456281 -2.2456281 692.1024 <list[1]>
#> ---                                                             
#>         166  105  FALSE -0.1915968 -0.1915968 289.1797 <list[1]>
#>         167  174  FALSE -0.4396934 -0.4396934 324.7442 <list[1]>
#>         168  177  FALSE -0.6972836 -0.6972836 365.6514 <list[1]>

Created on 2024-08-17 with reprex v2.1.1
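
As a rough illustration of what an RMST-based response is, the restricted mean survival time up to a cutoff `tau` is the area under the survival curve from 0 to `tau`. The base R sketch below computes it for a step-function survival curve. It follows the general definition only; the function name `rmst`, the argument `tau`, and the toy curve are assumptions made for this example and are not taken from mlr3proba.

```r
# Restricted mean survival time: area under a right-continuous step
# survival curve between 0 and tau.
# times: increasing time points at which S(t) drops
# surv:  S(t) at those time points; S(t) = 1 before times[1]
rmst = function(times, surv, tau = max(times)) {
  keep = times <= tau
  grid = c(0, times[keep], tau)  # interval boundaries
  heights = c(1, surv[keep])     # value of S(t) on each interval
  sum(diff(grid) * heights)
}

# Toy survival curve, made up for illustration
times = c(1, 2, 4)
surv  = c(0.8, 0.5, 0.2)

rmst(times, surv)           # area up to the last time point
#> [1] 2.8
rmst(times, surv, tau = 3)  # area up to an earlier cutoff
#> [1] 2.3
```

Unlike the mean of an improper distribution, this quantity is always finite, because the integration stops at `tau`.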


fa1999abdi · 2024-08-18T16:52:20Z

fa1999abdi
Aug 18, 2024
Author

Hi John, thanks for the update on the new pipeop! Composing survival time with RMST looks interesting, and I appreciate the xgboost example; it will be helpful for my analyses. I tried to install version 0.6.7, but I'm having some issues and can't get it to install properly. Have you experienced any similar problems, or do you have any suggestions on how to resolve them?

``` r
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3proba)
#> Loading required package: mlr3
```

``` r
packageVersion("mlr3pipelines")
#> [1] '0.6.0'
```

``` r
remotes::install_version("mlr3pipelines", version = "0.6.7")
#> Error in download_version_url(package, version, repos, type): version '0.6.7' is invalid for package 'mlr3pipelines'
```

``` r
task = tsk("lung")
xgb = lrn("surv.xgboost.cox", nrounds = 10)

grlrn = 
  po("encode", method = "treatment") %>>%
  ppl("responsecompositor", learner = xgb, method = "rmst") |>
  as_learner()
#> Error: Element with key 'responsecompositor' not found in DictionaryGraph!
```

``` r
p = grlrn$train(task)$predict(task)
#> Error in eval(expr, envir, enclos): object 'grlrn' not found
```

``` r
p
#> Error in eval(expr, envir, enclos): object 'p' not found
```

<sup>Created on 2024-08-18 with [reprex v2.1.0](https://reprex.tidyverse.org)</sup>

4 replies

bblodfon Aug 18, 2024
Maintainer

I meant mlr3proba 0.6.7; the new pipeop is there. Please install the latest version from GitHub.

mlr3pipelines 0.6.0 is already the latest version.

fa1999abdi Aug 19, 2024
Author

John, the latest version of mlr3proba on GitHub is 0.6.6.

bblodfon Aug 19, 2024
Maintainer

I haven't officially released 0.6.7 yet, but the version on GitHub is 0.6.7!

fa1999abdi Aug 19, 2024
Author

Thank you for the clarification! I’ll keep an eye out for the official release of 0.6.7 when it’s available.
