add anchors and adjust format
MatthiasSchmidtblaicherQC committed Nov 1, 2024
1 parent fc692d4 commit ea9580b
Showing 1 changed file with 15 additions and 33 deletions.
48 changes: 15 additions & 33 deletions docs/tutorials/cox_model/cox_model.ipynb
@@ -4,36 +4,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Plan: \n",
"- Why Cox-PH-model is not a GLM.\n",
"- Cox-PH-model is a Poisson GLM with per-period-effects profiled out.\n",
"- Demonstrating the equivalence with a simple Python function.\n",
"- (Maybe) allow option for ties, see \n",
" https://github.com/CamDavidsonPilon/lifelines/blob/b8f017446e81c8f2fec75970e3afbc130f7131ec/lifelines/fitters/coxph_fitter.py#L1620\n",
" and\n",
" https://myweb.uiowa.edu/pbreheny/7210/f15/notes/11-5.pdf\n",
"- Speed comparison.\n",
"- Extensions:\n",
" - Stratification.\n",
" - Less volatile baseline hazard.\n",
"\n",
"TODO:\n",
"- (Why) do we need to split ties? And why does the current method not work?\n",
"- Are numeric differences tenable?\n",
"# Cox model in glum\n",
"\n",
"**Intro**\n",
"\n",
"First, we fit a standard Cox Proportional Hazards Model as a benchmark:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cox model in glum\n",
"This tutorial shows how the Cox proportional hazards model (from now on: Cox model), which cannot be represented as an Exponential Dispersion Model (EDM), can still be estimated in glum by a standard Poisson regression after a simple data transformation. The exposition here is mostly based on [1], but the approach has also been described elsewhere [2, 3].\n",
"\n",
"This notebook shows how the Cox proportional hazards model, which cannot be represented as an Exponential Dispersion Model (EDM), can still be estimated in glum by a standard Poisson regression after a simple data transformation. Importantly, the approach described here allows for more varied and flexible specifications than the Cox PH model. glum's efficient treatment of high-dimensional categoricals will come in handy. The exposition here is mostly based on [1], but the approach has also been described elsewhere [2, 3].\n",
"## Table of Contents\n",
"* [1. Equivalence between the Cox likelihood and a profile Poisson likelihood](#1.-Equivalence-between-the-Cox-likelihood-and-a-profile-Poisson-likelihood)\n",
"* [2. Estimating a Cox model in glum](#2.-Estimating-a-Cox-model-in-glum)\n",
"* [3. Benchmarking estimation speed](#3.-Benchmarking-estimation-speed)\n",
"\n",
"## The equivalence between the Cox-likelihood and a profile Poisson likelihood\n",
"## 1. Equivalence between the Cox likelihood and a profile Poisson likelihood<a class=\"anchor\"></a>\n",
"\n",
"In the Cox model, the rate of event occurrence, $\\lambda(t,x_i)$, factorizes nicely into a linear predictor $\\eta_i=\\sum_k \\beta_k x_{ik}$ that depends on individual $i$'s characteristics but not on time $t$ and a baseline hazard $\\lambda_0$ that depends only on time, $\\lambda(t,x_i)=\\lambda_0(t)\\exp(\\eta_i)$. The partial log-likelihood of $\\eta_i$ is\n",
"$$\n",
@@ -53,7 +35,7 @@
"$$\n",
"which is the same as the partial likelihood in the Cox model, apart from the $-1$, which drops out in estimation. In short, the Cox partial log-likelihood is equivalent to a Poisson log-likelihood with the estimates for the time-period effects fed back in (\"profiled out\"). This means that, to estimate the parameters of the Cox model, one can simply run a Poisson regression with time fixed effects $\alpha_t$.\n",
"\n",
"## Estimating a Cox model in `glum`\n",
"## 2. Estimating a Cox model in glum<a class=\"anchor\"></a>\n",
"\n",
"We now show that a Poisson model in `glum` yields the same parameter estimates as a Cox model. For the latter, we use the awesome [lifelines](https://github.com/CamDavidsonPilon/lifelines) library. We also take our dataset from lifelines: it comes from an RCT on recidivism covering 432 convicts released from Maryland state prisons, with the first arrest after release as the event of interest. We first import the required packages and load the dataset:"
]
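The data transformation behind this equivalence can be sketched in a few lines of plain Python. The `survival_split` below is a hypothetical, simplified stand-in for the helper used later in this notebook (whose implementation is collapsed in this diff): it expands each subject into one row per period at risk, with the Poisson response set to 1 only in the final period, and only for subjects who experienced the event.

```python
# Illustrative sketch of a person-period ("survival") split; the notebook's
# actual survival_split helper may differ in signature and details.
def survival_split(records, duration_key, event_key):
    """Expand each subject into one row per period at risk.

    The Poisson response y is 1 only in the subject's final period,
    and only if the event actually occurred then; otherwise 0.
    """
    long_rows = []
    for rec in records:
        duration = int(rec[duration_key])
        for t in range(1, duration + 1):
            row = dict(rec)  # covariates are carried over unchanged
            row["period"] = t
            row["y"] = int(t == duration and rec[event_key] == 1)
            long_rows.append(row)
    return long_rows

# Two toy subjects: one arrested in week 3, one censored after week 2.
subjects = [
    {"fin": 0, "week": 3, "arrest": 1},
    {"fin": 1, "week": 2, "arrest": 0},
]
long_rows = survival_split(subjects, duration_key="week", event_key="arrest")
print(len(long_rows))                  # 5 person-period rows (3 + 2)
print(sum(r["y"] for r in long_rows))  # exactly 1 event
```

A Poisson regression of `y` on the covariates plus dummies for `period` (the time fixed effects $\alpha_t$ above) then reproduces the Cox partial-likelihood estimates.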
@@ -194,7 +176,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@@ -246,7 +228,7 @@
" </tr>\n",
" <tr>\n",
" <th>time fit was run</th>\n",
" <td>2024-11-01 18:05:29 UTC</td>\n",
" <td>2024-11-01 18:20:17 UTC</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
@@ -499,7 +481,7 @@
" number of observations = 432\n",
"number of events observed = 114\n",
" partial log-likelihood = -656.25\n",
" time fit was run = 2024-11-01 18:05:29 UTC\n",
" time fit was run = 2024-11-01 18:20:17 UTC\n",
"\n",
"---\n",
" coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%\n",
@@ -562,7 +544,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
@@ -603,7 +585,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -796,7 +778,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Estimation speed\n",
"## 3. Benchmarking estimation speed<a class=\"anchor\"></a>\n",
"\n",
"Given that the Poisson model estimates many more parameters than the Cox model, one might wonder whether the Poisson approach is competitive in estimation time. The Poisson approach, including the data transformation by `survival_split`, turns out to be faster for the dataset here. This is likely aided by tabmat's optimizations for the high-dimensional `week` categorical."
]
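A timing comparison of this kind can be set up with the standard library's `timeit`. In the sketch below, `fit_cox` and `fit_poisson` are hypothetical stand-ins (so the pattern runs without lifelines or glum installed); in the notebook they would wrap `CoxPHFitter().fit(...)` and the glum Poisson fit on the split data, respectively.

```python
import timeit

def fit_cox():
    # stand-in for: CoxPHFitter().fit(df, duration_col="week", event_col="arrest")
    return sum(i * i for i in range(10_000))

def fit_poisson():
    # stand-in for: GeneralizedLinearRegressor(family="poisson", alpha=0).fit(X, y),
    # including the survival_split data transformation
    return sum(i * i for i in range(10_000))

N_RUNS = 20  # average over several runs to smooth out timer noise
cox_seconds = timeit.timeit(fit_cox, number=N_RUNS) / N_RUNS
poisson_seconds = timeit.timeit(fit_poisson, number=N_RUNS) / N_RUNS
print(f"Cox fit:     {cox_seconds:.6f} s per run")
print(f"Poisson fit: {poisson_seconds:.6f} s per run")
```

Passing a callable to `timeit.timeit` (rather than a statement string) keeps the timed code in the current namespace, which is convenient inside a notebook.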