Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Combine multiple survfits #212

Open
DanChaltiel opened this issue May 17, 2024 · 4 comments
Open

Feature request: Combine multiple survfits #212

DanChaltiel opened this issue May 17, 2024 · 4 comments

Comments

@DanChaltiel
Copy link
Contributor

Is your feature request related to a problem? Please describe.
When I'm only describing the survival of a population (i.e. with no grouping variable), I would like to be able to show both OS and PFS on the same graph.

Describe the solution you'd like
The solution is described in kassambara/survminer#195.

Describe alternatives you've considered
Alternative solutions would be (1) to use survminer, but the project is unfortunately not maintained anymore and I like the interface of ggsurvfit better, or (2) to plot both graphs separately.

Additional context
Here is a screenshot of the above-mentioned issue:
image

@ddsjoberg
Copy link
Collaborator

Hi @DanChaltiel , thanks for the post!

I think this is something that people would need, so we should think on the best way to implement.

Certainly, the easiest way for me to maintain is ask the users to do some pre-processing and create a data frame that is one line per subject per outcome. The data frame would include a column indicating which outcome the row is associated with. We can then pass that data frame to survfit(), where we stratify the figures by outcome and we get what we want. Then there are zero updates need to the core functionality in the package.

Maybe we simply add a helper function to do this pre-processing? Minimally, we can add an example to the gallery showing how to do the pre-processing.

What do you think?

library(ggsurvfit)
#> Loading required package: ggplot2
packageVersion("ggsurvfit")
#> [1] '1.1.0'
# create a stacked data frame with 2 outcomes
stacked_adtte <-
  adtte |> 
  dplyr::select(USUBJID, PARAM, AVAL, CNSR) %>%
  dplyr::bind_rows(
    .,
    dplyr::mutate(
      ., 
      AVAL = AVAL / (1 + runif(dplyr::n())),
      PARAM = "Overall survival (years)"
    )
  )


survfit2(Surv_CNSR(AVAL, CNSR) ~ PARAM, data = stacked_adtte) |> 
  ggsurvfit() +
  add_confidence_interval() +
  add_risktable() +
  scale_ggsurvfit()

Created on 2024-05-17 with reprex v2.1.0

@DanChaltiel
Copy link
Contributor Author

Oh you are right, you can simply stack the data!
Indeed, a helper function would be enough, thanks for considering the FR :-)

I'm not sure about how the interface should look like though.
In a real case, there are several couples time/event in a single dataframe, so Surv(AVAL, CNSR) in your example might not always make sense.
Maybe the helper could have some time_to="stacked_time" and event_to="stacked_event" arguments that specify the names the used should put in Surv.

In the end, you would input your couples somehow like this:

survs = list(
  "Overall survival (years)"=c("os.time", "os.status"),
  "Progression-free survival (years)"=c("pfs.time", "pfs.status")
)
stacked_adtte = stack_surv(adtte, survs=survs)

@ddsjoberg
Copy link
Collaborator

Couple of things to consider:

  1. The Surv() function has arguments c(time, time2, event). We probably want to handle all of these.
  2. This will be complicated with competing risks models, as the event is a factor. For example, a cancer recurrence outcome may have factor levels, c("censored", "recurrence", "death from other causes"), and a death from cancer outcome would have levels c("censored", "death from cancer", "death from other causes"). We'd some way to reconcile these varying levels.

Anyway, I'll leave this issue open for now. But to be honest, no plans to implement in the immediate future (just to set expectations!)

@DanChaltiel
Copy link
Contributor Author

No problem, this is perfectly understandable!
In the meantime, here is the function that I'll be using to handle the issue.
For point (2) I'm not very familiar with competing risk visualization, but if it is only a problem of binding multiple factors, maybe adding stacked_event = as.character(stacked_event) in the mutate calls would be enough.

library(tidyverse)
library(ggsurvfit)

stack_surv = function(data, surv_list){
  surv_list %>% 
    imap(~{
      if(length(.x)==2){
        df = data %>% 
          select(stacked_time = all_of(.x[1]), 
                 stacked_event = all_of(.x[2])) %>% 
          mutate(stacked_survtype=.y)
      } else if (length(.x) == 3){
        df = data %>% 
          select(stacked_time = all_of(.x[1]), 
                 stacked_time2 = all_of(.x[2]), 
                 stacked_event =all_of(.x[3])) %>% 
          mutate(stacked_survtype=.y)
      } else {
        stop("Length 2 or 3 only")
      }
    }) %>% 
    list_rbind()
}

set.seed(42)
df_surv = survival::lung %>% 
  as_tibble() %>% 
  mutate(
    time_os = time,
    event_os = status==2,
    time_pfs = time / (1 + runif(dplyr::n())),
    event_pfs = ifelse(runif(dplyr::n())>0.3, event_os, 1),
    time_efs = time_pfs / (1 + runif(dplyr::n())),
    event_efs = ifelse(runif(dplyr::n())>0.3, event_pfs, 1),
  )

#Right censoring
surv_list = list(
  "Overall survival"=c("time_os", "event_os"),
  "Progression-free survival"=c("time_pfs", "event_pfs"),
  "Event-free survival"=c("time_efs", "event_efs")
)
stack_surv(df_surv, surv_list) %>% 
  survfit2(formula=Surv(stacked_time, stacked_event) ~ stacked_survtype, 
           data = .) %>% 
  ggsurvfit() +
  add_confidence_interval() +
  add_risktable() +
  scale_ggsurvfit()

#Interval censoring (dummy)
surv_list2 = list(
  "Overall survival"=c("time_os", "time_pfs", "event_os"),
  "Progression-free survival"=c("time_os", "time_pfs", "event_pfs")
)
stack_surv(df_surv, surv_list2) %>% 
  survfit2(formula=Surv(stacked_time, stacked_event) ~ stacked_survtype, 
           data = .) %>% 
  ggsurvfit() +
  add_confidence_interval() +
  add_risktable() +
  scale_ggsurvfit()

Created on 2024-05-18 with reprex v2.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants