Add tutorial on how to estimate a Cox Proportional Hazards Model in glum #876
Conversation
"\n", | ||
"## 1. Equivalence Between the Cox Likelihood and a Profile Poisson Likelihood<a class=\"anchor\"></a>\n", | ||
"\n", | ||
"In the Cox model, the rate of event occurrence, $\\lambda(t,x_i)$, factorizes nicely into a linear predictor $\\eta_i=\\sum_k \\beta_k x_{ik}$ that depends on individual $i$'s characteristics but not on time $t$ and a baseline hazard $\\lambda_0$ that depends only on time, $\\lambda(t,x_i)=\\lambda_0(t)\\exp(\\eta_i)$ (the proportional hazards assumption). The partial log-likelihood of $\\eta_i$ is\n", |
"In the Cox model, the rate of event occurrence, $\\lambda(t,x_i)$, factorizes nicely into a linear predictor $\\eta_i=\\sum_k \\beta_k x_{ik}$ that depends on individual $i$'s characteristics but not on time $t$ and a baseline hazard $\\lambda_0$ that depends only on time, $\\lambda(t,x_i)=\\lambda_0(t)\\exp(\\eta_i)$ (the proportional hazards assumption). The partial log-likelihood of $\\eta_i$ is\n", | |
"In the Cox model, the rate of event occurrence, $\\lambda(t,x_i)$, factorizes nicely into a linear predictor $\\eta_i=\\sum_k \\beta_k x_{ik}$ that depends on individual $i$'s characteristics but not on time $t$, and a baseline hazard $\\lambda_0$ that depends only on time: $\\lambda(t,x_i)=\\lambda_0(t)\\exp(\\eta_i)$. This is known as the proportional hazards assumption). The partial log-likelihood of $\\eta_i$ is\n", |
"\n", | ||
"## 2. Estimating a Cox Model in Glum<a class=\"anchor\"></a>\n", | ||
"\n", | ||
"We now show that a Poisson approach in `glum` yields the same parameter estimates as a Cox model. For the latter, we use the [lifelines](https://github.com/CamDavidsonPilon/lifelines) library. We also take the dataset from lifelines, which is from an RCT on recidivism for 432 convicts released from Maryland state prisons with first arrest after release as event. We first load imports and the dataset. The dataset has one row per convict, with two outcome columns, the `week` until which the observation lasts and `arrest`, which indicates whether an arrest event happened or not (censoring)." |
"We now show that a Poisson approach in `glum` yields the same parameter estimates as a Cox model. For the latter, we use the [lifelines](https://github.com/CamDavidsonPilon/lifelines) library. We also take the dataset from lifelines, which is from an RCT on recidivism for 432 convicts released from Maryland state prisons with first arrest after release as event. We first load imports and the dataset. The dataset has one row per convict, with two outcome columns, the `week` until which the observation lasts and `arrest`, which indicates whether an arrest event happened or not (censoring)." | |
"We now show that a Poisson approach in `glum` yields the same parameter estimates as a Cox model. For the latter, we use the [lifelines](https://github.com/CamDavidsonPilon/lifelines) library. We also take the dataset from lifelines, which is from an RCT on recidivism for 432 convicts released from Maryland state prisons with first arrest after release as event. We first load imports and the dataset. The dataset has one row per convict, with two outcome columns: the `week` until which the observation lasts and `arrest`, which indicates whether an arrest event happened or not (censoring)." |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"One might, therefore, wonder if the Poisson approach is competitive in terms of estimation speed. For the dataset here, the Poisson approach, including the data transformation by `survival_split`, turns out to be faster than the Cox model. This speedup is likely aided by tabmat's optimizations for the high-dimensional `week` categorical." |
"One might, therefore, wonder if the Poisson approach is competitive in terms of estimation speed. For the dataset here, the Poisson approach, including the data transformation by `survival_split`, turns out to be faster than the Cox model. This speedup is likely aided by tabmat's optimizations for the high-dimensional `week` categorical." | |
"One might, therefore, wonder if the Poisson approach is competitive in terms of estimation speed. For the dataset here, the Poisson approach, including the data transformation by `survival_split`, turns out to be faster than the Cox model. This speedup is aided by tabmat's optimizations for the high-dimensional `week` categorical." |
I'm confident this is the case :)
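For what it's worth, a rough way to check the claim, reusing `df`, `stack_data`, and the imports from the sketch above; the actual numbers will vary by machine and library versions:

```python
# Rough timing comparison; reuses the objects from the earlier sketch.
import time

t0 = time.perf_counter()
CoxPHFitter().fit(df, duration_col="week", event_col="arrest")
cox_s = time.perf_counter() - t0

t0 = time.perf_counter()
stacked = stack_data(df)  # the tutorial's `survival_split` plays this role
X = stacked.drop(columns=["week", "arrest", "y"]).astype({"interval": "category"})
GeneralizedLinearRegressor(
    family="poisson", alpha=0, fit_intercept=False
).fit(X, stacked["y"])
poisson_s = time.perf_counter() - t0

print(f"Cox: {cox_s:.3f}s  Poisson incl. split: {poisson_s:.3f}s")
```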
"source": [ | ||
"## Footnotes\n", | ||
"<span id=\"fn1\"> 1</span>:\n", | ||
"The Cox model assumes that at most one individual has an event at any time (\"no ties\").\n", |
Maybe mention here that there are various ways to deal with this? E.g., exact (and expensive) calculations, the Efron approximation (reasonably accurate), and the Breslow approximation (fastest); see the sketch below. Also, in the case of many ties, an inherently discrete survival model might be the best option.
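To make that concrete, a small sketch using statsmodels' `PHReg`, which exposes both approximations via its `ties` argument (lifelines' `CoxPHFitter`, by contrast, implements Efron):

```python
# Tie-handling choices for the Cox model in statsmodels' PHReg,
# which supports both the Breslow and the Efron approximations.
import statsmodels.api as sm
from lifelines.datasets import load_rossi

df = load_rossi()
X = df.drop(columns=["week", "arrest"])

for ties in ("breslow", "efron"):
    res = sm.PHReg(df["week"], X, status=df["arrest"], ties=ties).fit()
    print(ties, res.params)
```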
"\n", | ||
"[2] Whitehead, J., 1980. Fitting Cox’s regression model to survival data using GLIM. _Journal of the Royal Statistical Society Series C: Applied Statistics_, 29(3), pp.268-275.\n", | ||
"\n", | ||
"[3] Zhong, C. and Tibshirani, R., 2019. Survival analysis as a classification problem. arXiv preprint arXiv:1909.11171." |
They do use the same dataset-transformation idea (which they call stacking), but they rely on a binomial model instead of a Poisson one and argue that it is a good enough approximation. So it is relevant, but I would not say it is the same thing.
Thank you Matthias, I like this tutorial a lot!
Maybe we could also mention that the model we estimate is conceptually a piecewise constant hazard model, but because we have a different constant for each event time, it becomes essentially non-parametric. As a corollary, anyone who wants to reduce the number of parameters can merge some categories and estimate fewer "constants"; see the sketch after this comment.
And: Add link to Matt Mills' tutorial on fitting penalized splines in glum.
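A hypothetical illustration of the "fewer constants" idea, reusing `stacked` and `GeneralizedLinearRegressor` from the earlier sketch:

```python
# Piecewise-constant hazard with fewer parameters: merge the per-event-time
# constants into 4-week bins. The rows themselves are unchanged, so no
# exposure offset is needed; only the hazard constants are shared per bin.
import pandas as pd

X_coarse = stacked.drop(columns=["week", "arrest", "y", "interval"])
X_coarse["interval4"] = pd.cut(stacked["interval"], bins=range(0, 53, 4))

GeneralizedLinearRegressor(
    family="poisson", alpha=0, fit_intercept=False
).fit(X_coarse, stacked["y"])
```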
Checklist
- CHANGELOG.rst entry