Add tutorial on how to estimate a Cox Proportional Hazards Model in glum #876

Open · wants to merge 19 commits into main

Conversation

@MatthiasSchmidtblaicherQC (Contributor) commented Nov 1, 2024

Also: adds a link to Matt Mills' tutorial on fitting penalized splines in glum.

Checklist

  • Added a CHANGELOG.rst entry

@MatthiasSchmidtblaicherQC linked an issue Nov 1, 2024 that may be closed by this pull request
"\n",
"## 1. Equivalence Between the Cox Likelihood and a Profile Poisson Likelihood<a class=\"anchor\"></a>\n",
"\n",
"In the Cox model, the rate of event occurrence, $\\lambda(t,x_i)$, factorizes nicely into a linear predictor $\\eta_i=\\sum_k \\beta_k x_{ik}$ that depends on individual $i$'s characteristics but not on time $t$ and a baseline hazard $\\lambda_0$ that depends only on time, $\\lambda(t,x_i)=\\lambda_0(t)\\exp(\\eta_i)$ (the proportional hazards assumption). The partial log-likelihood of $\\eta_i$ is\n",
Suggested change
"In the Cox model, the rate of event occurrence, $\\lambda(t,x_i)$, factorizes nicely into a linear predictor $\\eta_i=\\sum_k \\beta_k x_{ik}$ that depends on individual $i$'s characteristics but not on time $t$ and a baseline hazard $\\lambda_0$ that depends only on time, $\\lambda(t,x_i)=\\lambda_0(t)\\exp(\\eta_i)$ (the proportional hazards assumption). The partial log-likelihood of $\\eta_i$ is\n",
"In the Cox model, the rate of event occurrence, $\\lambda(t,x_i)$, factorizes nicely into a linear predictor $\\eta_i=\\sum_k \\beta_k x_{ik}$ that depends on individual $i$'s characteristics but not on time $t$, and a baseline hazard $\\lambda_0$ that depends only on time: $\\lambda(t,x_i)=\\lambda_0(t)\\exp(\\eta_i)$. This is known as the proportional hazards assumption). The partial log-likelihood of $\\eta_i$ is\n",

"\n",
"## 2. Estimating a Cox Model in Glum<a class=\"anchor\"></a>\n",
"\n",
"We now show that a Poisson approach in `glum` yields the same parameter estimates as a Cox model. For the latter, we use the [lifelines](https://github.com/CamDavidsonPilon/lifelines) library. We also take the dataset from lifelines, which is from an RCT on recidivism for 432 convicts released from Maryland state prisons with first arrest after release as event. We first load imports and the dataset. The dataset has one row per convict, with two outcome columns, the `week` until which the observation lasts and `arrest`, which indicates whether an arrest event happened or not (censoring)."
Suggested change
"We now show that a Poisson approach in `glum` yields the same parameter estimates as a Cox model. For the latter, we use the [lifelines](https://github.com/CamDavidsonPilon/lifelines) library. We also take the dataset from lifelines, which is from an RCT on recidivism for 432 convicts released from Maryland state prisons with first arrest after release as event. We first load imports and the dataset. The dataset has one row per convict, with two outcome columns, the `week` until which the observation lasts and `arrest`, which indicates whether an arrest event happened or not (censoring)."
"We now show that a Poisson approach in `glum` yields the same parameter estimates as a Cox model. For the latter, we use the [lifelines](https://github.com/CamDavidsonPilon/lifelines) library. We also take the dataset from lifelines, which is from an RCT on recidivism for 432 convicts released from Maryland state prisons with first arrest after release as event. We first load imports and the dataset. The dataset has one row per convict, with two outcome columns: the `week` until which the observation lasts and `arrest`, which indicates whether an arrest event happened or not (censoring)."

"cell_type": "markdown",
"metadata": {},
"source": [
"One might, therefore, wonder if the Poisson approach is competitive in terms of estimation speed. For the dataset here, the Poisson approach, including the data transformation by `survival_split`, turns out to be faster than the Cox model. This speedup is likely aided by tabmat's optimizations for the high-dimensional `week` categorical."
Suggested change
"One might, therefore, wonder if the Poisson approach is competitive in terms of estimation speed. For the dataset here, the Poisson approach, including the data transformation by `survival_split`, turns out to be faster than the Cox model. This speedup is likely aided by tabmat's optimizations for the high-dimensional `week` categorical."
"One might, therefore, wonder if the Poisson approach is competitive in terms of estimation speed. For the dataset here, the Poisson approach, including the data transformation by `survival_split`, turns out to be faster than the Cox model. This speedup is aided by tabmat's optimizations for the high-dimensional `week` categorical."

I'm confident this is the case :)
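For readers skimming the thread, a rough sketch of the Poisson route being timed here. `survival_split` itself is not shown in this excerpt, so the expansion below is a hypothetical stand-in: it stacks the data into one row per convict per week at risk, with the event indicator as the Poisson outcome and the week index as a high-cardinality categorical (the part tabmat can exploit):

```python
import pandas as pd
from glum import GeneralizedLinearRegressor
from lifelines.datasets import load_rossi

def survival_split_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the tutorial's `survival_split`."""
    # One row per subject per week at risk.
    long = df.loc[df.index.repeat(df["week"])].copy()
    long["t"] = long.groupby(level=0).cumcount() + 1
    # The event fires only in the final week, and only if not censored.
    long["y"] = ((long["t"] == long["week"]) & (long["arrest"] == 1)).astype(int)
    # One baseline-hazard constant per week of the study.
    long["t"] = long["t"].astype("category")
    return long

long = survival_split_sketch(load_rossi())

# No intercept: the `t` dummies act as the baseline hazard, so the
# model stays identified without dropping a category level.
X = long[["fin", "age", "prio", "t"]]
model = GeneralizedLinearRegressor(family="poisson", alpha=0, fit_intercept=False)
model.fit(X, long["y"])
```

The coefficients on the non-`t` columns should then line up with the Cox estimates from lifelines, up to the no-ties caveat discussed below.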

"source": [
"## Footnotes\n",
"<span id=\"fn1\"> 1</span>:\n",
"The Cox model assumes that at most one individual has an event at any time (\"no ties\").\n",
Maybe mention here that there are various ways to deal with this? (e.g., exact, and expensive, calculations; the Efron approximation, which is reasonably accurate; and the Breslow approximation, which is fastest). Also, in the case of many ties, an inherently discrete survival model might be the best option.
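For reference, the standard forms of the two approximations mentioned, with $D_i$ the set of $d_i$ individuals tied at event time $t_i$ and $R_i$ the corresponding risk set (textbook expressions, not taken from the tutorial):

Breslow:

$$
L_B(\beta) = \prod_i \frac{\exp\left(\sum_{j \in D_i} \eta_j\right)}{\left(\sum_{k \in R_i} \exp(\eta_k)\right)^{d_i}}
$$

Efron:

$$
L_E(\beta) = \prod_i \frac{\exp\left(\sum_{j \in D_i} \eta_j\right)}{\prod_{r=0}^{d_i - 1} \left(\sum_{k \in R_i} \exp(\eta_k) - \frac{r}{d_i} \sum_{j \in D_i} \exp(\eta_j)\right)}
$$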

"\n",
"[2] Whitehead, J., 1980. Fitting Cox’s regression model to survival data using GLIM. _Journal of the Royal Statistical Society Series C: Applied Statistics_, 29(3), pp.268-275.\n",
"\n",
"[3] Zhong, C. and Tibshirani, R., 2019. Survival analysis as a classification problem. arXiv preprint arXiv:1909.11171."

They do use the same dataset-transformation idea (which they call stacking), but they rely on a binomial model instead of a Poisson one and say that it is a good enough approximation. So it is relevant, but I would not say it is the same thing.

@stanmart left a comment

Thank you Matthias, I like this tutorial a lot!

Maybe we could also mention that the model we estimate is conceptually a piecewise-constant hazard model, but because we have a different constant for each event time, it becomes essentially non-parametric. As a corollary, if one wants to reduce the number of parameters, one can merge some categories and estimate fewer "constants".
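A minimal sketch of that coarsening idea, reusing the hypothetical stacked data (`long`) from the earlier sketch rather than the tutorial's actual code: bin the event weeks into coarser intervals before fitting, so the piecewise-constant baseline has fewer pieces.

```python
import pandas as pd
from glum import GeneralizedLinearRegressor

# Hypothetical coarsening: one baseline-hazard constant per 4-week bin
# instead of one per distinct event week.
long["t_coarse"] = pd.cut(long["t"].astype(int), bins=range(0, 56, 4))

X_coarse = long[["fin", "age", "prio", "t_coarse"]]
model_coarse = GeneralizedLinearRegressor(
    family="poisson", alpha=0, fit_intercept=False
)
model_coarse.fit(X_coarse, long["y"])
```

This trades some fidelity of the baseline hazard for fewer parameters, moving the model from essentially non-parametric toward an explicitly piecewise-constant one.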

Successfully merging this pull request may close these issues:

  • Add reference to blog post on penalized splines in docs
  • Add Cox Objective