SDID and DR-learner example formatting changes (#233)
* Changed header formatting on SDID notebook to be consistent with existing notebooks

* Changed header formatting on DR-learner notebook to be consistent with existing notebooks

* added outline and moved around headers

* updated sdid notebook headers and minor rearranging
SamWitty authored Aug 7, 2023
1 parent 19a395c commit c979555
Showing 2 changed files with 94 additions and 18 deletions.
49 changes: 41 additions & 8 deletions docs/source/dr_learner.ipynb
@@ -1,5 +1,33 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doubly robust estimation with Chirho"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outline\n",
"\n",
"- [Setup](#setup)\n",
"- [Overview: Robust causal inference with cut modules](#overview:-robust-causal-inference-with-cut-modules)\n",
"- [Example: Synthetic data generation from a high-dimensional generalized linear model](#example:-synthetic-data-generation-from-a-high-dimensional-generalized-linear-model)\n",
"- [Effect estimation using cut modules](#effect-estimation-using-cut-modules)\n",
"- [References](#references)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -27,11 +55,10 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doubly robust estimation with Chirho\n",
"## Overview: Robust causal inference with cut modules\n",
"\n",
"In this notebook, we implement a Bayesian analogue of the DR-Learner estimator in Kennedy (2022). The DR-Learner estimator is a doubly robust estimator for the conditional average treatment effect (CATE). It works by regressing a \"pseudo-outcome\" on covariates, where the \"pseudo-outcome\" is constructed by approximating the outcome and propensity score functions. The DR-Learner estimator is doubly robust in the sense that it is consistent if either the outcome or propensity score models are correctly specified. Moreover, as long as the outcome and propensity score models can be estimated at $O(N^{-1/4})$ rates, the DR-Learner estimator can estimate the CATE at the *parametric* $O(N^{-1/2})$ fast rate.\n",
"\n",
@@ -137,7 +164,7 @@
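As an aside, the pseudo-outcome construction that the cell above describes can be sketched in a few lines of NumPy. The function name `dr_pseudo_outcome` and its arguments are illustrative only, not part of the notebook, and this is the frequentist AIPW form from Kennedy (2022) rather than the notebook's Bayesian implementation:

```python
import numpy as np

def dr_pseudo_outcome(y, a, mu0, mu1, pi):
    """DR-learner / AIPW pseudo-outcome (Kennedy, 2022).

    y:   observed outcomes
    a:   binary treatment indicators
    mu0: outcome-model predictions E[Y | X, A=0]
    mu1: outcome-model predictions E[Y | X, A=1]
    pi:  propensity-score predictions P(A=1 | X)
    """
    mu_a = np.where(a == 1, mu1, mu0)
    # Inverse-propensity-weighted residual plus outcome-model contrast.
    return (a - pi) / (pi * (1 - pi)) * (y - mu_a) + mu1 - mu0
```

The CATE estimate then comes from a second-stage regression of this pseudo-outcome on the covariates; its average is an estimate of the ATE.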
"cell_type": "markdown",
"metadata": {},
"source": [
"### Synthetic data generation from a high-dimensional generalized linear model\n",
"## Example: Synthetic data generation from a high-dimensional generalized linear model\n",
"\n",
"We use the classes below to generate synthetic data from a high-dimensional generalized linear model. Further, we will use these classes to implement the standard outcome-regression approach to estimating the CATE. That is, we regress $Y$ on $X$ and $A$ to obtain an estimate of $E[Y | X, A=1] - E[Y | X, A=0]$. This approach is called the \"plug-in\" approach in Kennedy (2022)."
]
@@ -188,7 +215,7 @@
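For intuition, the plug-in approach just described (fit one outcome model per treatment arm, then difference the predictions) can be sketched with ordinary least squares standing in for the notebook's Bayesian GLM. All names here are hypothetical:

```python
import numpy as np

def plugin_cate(X, a, y, x_test):
    """Plug-in CATE sketch: fit a linear outcome model within each
    treatment arm by least squares, then return the difference of
    predictions E[Y | X, A=1] - E[Y | X, A=0] at x_test."""
    def fit_predict(mask):
        # Least-squares fit of y on [1, X] within one arm.
        Z = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(Z, y[mask], rcond=None)
        return np.column_stack([np.ones(len(x_test)), x_test]) @ beta
    return fit_predict(a == 1) - fit_predict(a == 0)
```

As Kennedy (2022) shows, this plug-in estimator can be badly behaved when the outcome models are estimated slowly, which is what motivates the pseudo-outcome construction.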
"cell_type": "markdown",
"metadata": {},
"source": [
"### Below we generate synthetic data as in Figure 4b of Kennedy (2022)."
"Below we generate synthetic data as in Figure 4b of Kennedy (2022)."
]
},
{
@@ -223,6 +250,13 @@
"D_test = {\"X\": X_test, \"A\": A_test, \"Y\": Y_test}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Effect estimation using cut modules"
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -350,7 +384,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Look at how well each method estimates average treatment effect"
"Next, we look at how well each method estimates the average treatment effect."
]
},
{
@@ -381,7 +415,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Because we use Bayesian inference, we also get uncertainity estimates for ATE\n"
"Because we use Bayesian inference, we also get uncertainty estimates for the ATE.\n"
]
},
{
@@ -433,15 +467,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# References\n",
"## References\n",
"\n",
"Kennedy, Edward. \"Towards optimal doubly robust estimation of heterogeneous causal effects\", 2022. https://arxiv.org/abs/2004.14497.\n",
"\n",
"Carmona, Chris U., Geoff K. Nicholls. \"Semi-Modular Inference: enhanced learning in multi-modular models by tempering the influence of components\", 2020. https://arxiv.org/abs/2003.06804.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": []
63 changes: 53 additions & 10 deletions docs/source/sdid.ipynb
@@ -1,5 +1,37 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "92303b22",
"metadata": {},
"source": [
"# Causal effect estimation in panel data"
]
},
{
"cell_type": "markdown",
"id": "49e2f7b6",
"metadata": {},
"source": [
"## Outline\n",
"\n",
"- [Setup](#setup)\n",
"- [Overview: Robust Causal Inference with Panel Data](#overview:-robust-causal-inference-with-panel-data)\n",
"- [Example: California Smoking Cessation](#example:-california-smoking-cessation)\n",
"- [Causal Query: Counterfactual Prediction](#causal-query:-counterfactual-prediction)\n",
"- [Effect estimation with ordinary Bayesian inference](#effect-estimation-with-ordinary-bayesian-inference)\n",
"- [Robust effect estimation with modular Bayesian inference](#robust-effect-estimation-with-modular-bayesian-inference)\n",
"- [References](#references)"
]
},
{
"cell_type": "markdown",
"id": "add55da8",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -32,10 +64,17 @@
"id": "39828795",
"metadata": {},
"source": [
"# Causal effect estimation in panel data\n",
"\n",
"In this notebook, we implement the synthetic difference-in-differences (SDID) estimator proposed in [1]. The SDID estimator combines the strengths of difference-in-differences and synthetic control methods through a two-stage weighted regression. \n",
"## Overview: Robust Causal Inference with Panel Data\n",
"\n",
"In this notebook, we implement the synthetic difference-in-differences (SDID) estimator proposed in [1]. The SDID estimator combines the strengths of difference-in-differences and synthetic control methods through a two-stage weighted regression. \n"
]
},
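Given unit and time weights, the SDID point estimate described in the overview cell reduces to a weighted difference-in-differences. The sketch below assumes the weights have already been fit (in [1] they come from the two-stage weighted regressions); the function and argument names are illustrative:

```python
import numpy as np

def sdid_estimate(Y_co, y_tr, omega, lam):
    """SDID point estimate given pre-fit weights (sketch).

    Y_co:  controls-by-time outcome matrix
    y_tr:  treated unit's outcome series
    omega: unit weights over control units (sum to 1)
    lam:   time weights over pre-treatment periods (sum to 1);
           columns beyond len(lam) are post-treatment.
    """
    T_pre = len(lam)
    # Weighted pre/post difference for the treated unit...
    treated_did = y_tr[T_pre:].mean() - lam @ y_tr[:T_pre]
    # ...minus the omega-weighted pre/post difference for controls.
    control_did = omega @ (Y_co[:, T_pre:].mean(axis=1) - Y_co[:, :T_pre] @ lam)
    return treated_did - control_did
```

With uniform weights over all controls and all pre-periods this collapses to the ordinary difference-in-differences estimator, which is the sense in which SDID interpolates between DiD and synthetic control.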
{
"cell_type": "markdown",
"id": "8e9171f3",
"metadata": {},
"source": [
"## Example: California Smoking Cessation\n",
"As in [1], we analyze the California Smoking Cessation dataset [2] to estimate the effect of cigarette taxes in California. Specifically, in 1989, California passed Proposition 99, which increased cigarette taxes; we will estimate the impact this policy had on cigarette consumption. The dataset consists of cigarette consumption for 39 states from 1970 to 2000, 38 of which are control units.\n",
"\n",
"We start by loading and visualizing the dataset."
@@ -245,14 +284,16 @@
"id": "66cb6c41",
"metadata": {},
"source": [
"### We would like to estimate a counterfactual: had California not raised cigarette taxes, what would have cigarette consumption been?\n",
"## Causal Query: Counterfactual Prediction\n",
"\n",
"In this setting, we would like to estimate a counterfactual: had California not raised cigarette taxes, what would cigarette consumption have been?\n",
"\n",
"To estimate this effect, we implement a Bayesian analogue of the Synthetic Difference-in-Differences (SDID) estimator proposed in [1]. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "d7566e08",
"metadata": {},
"outputs": [],
@@ -355,7 +396,7 @@
"id": "a10e36ed",
"metadata": {},
"source": [
"### Let's visualize our Bayesian SDID probabilistic model."
"Let's visualize our Bayesian SDID probabilistic model."
]
},
{
@@ -555,7 +596,9 @@
"id": "9741babe",
"metadata": {},
"source": [
"### First, we estimate $\\tau$ (the effect of Proposition 99) by performing joint Bayesian inference over all latents parameters in the model. We report the marginal approximate posterior over $\\tau$."
"## Effect estimation with ordinary Bayesian inference\n",
"\n",
"First, we estimate $\\tau$ (the effect of Proposition 99) by performing joint Bayesian inference over all latent parameters in the model. We report the marginal approximate posterior over $\\tau$."
]
},
{
@@ -717,7 +760,7 @@
"id": "5d72fef0",
"metadata": {},
"source": [
"### Robustification with Modular Bayesian Inference\n",
"## Robust effect estimation with modular Bayesian inference\n",
"\n",
"From the figure above, we see that the estimated synthetic control has non-trivial deviations from California during the pre-treatment period. To robustify our causal effect estimates, we use modular Bayesian inference and compute the \"cut posterior\" for $\\tau$ [3]. Specifically, we define \"module one\" as all observed and latent variables associated with the time and synthetic control weights. We define \"module two\" as the latent variables used to compute the response likelihood. \n",
"\n",
@@ -771,7 +814,7 @@
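The cut-posterior idea in this cell, letting module-one data alone determine the first-stage latents and then estimating the second-stage latents conditionally, with no feedback, can be illustrated with a toy conjugate model. This is a hypothetical normal-normal example, not the notebook's Pyro implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def cut_posterior_draws(x1, x2, n=1000):
    """Two-stage 'cut' sampling in a toy normal-normal model.

    Module one: x1_i ~ N(theta, 1), theta ~ N(0, 1).
    Module two: x2_i ~ N(theta + tau, 1), tau ~ N(0, 1).
    """
    # Stage 1: the posterior for theta uses ONLY module-one data,
    # so misspecification in module two cannot contaminate it.
    n1 = len(x1)
    post_var1 = 1.0 / (1.0 + n1)
    theta = rng.normal(post_var1 * x1.sum(), np.sqrt(post_var1), size=n)
    # Stage 2: for each theta draw, sample tau | theta, x2 (conjugate).
    n2 = len(x2)
    post_var2 = 1.0 / (1.0 + n2)
    tau = rng.normal(post_var2 * (x2.sum() - n2 * theta), np.sqrt(post_var2))
    return theta, tau
```

In the SDID notebook, module one plays the role of the time and synthetic-control weights, and module two plays the role of the response likelihood that yields $\tau$.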
"id": "1a20e980",
"metadata": {},
"source": [
"### Below we see that the synthetic control unit estimated from the cut posterior is a better fit to the treated unit (California) during the pre-treatment period "
"Below we see that the synthetic control unit estimated from the cut posterior is a better fit to the treated unit (California) during the pre-treatment period."
]
},
{
@@ -841,7 +884,7 @@
"id": "b681f030",
"metadata": {},
"source": [
"# References\n",
"## References\n",
"1. https://www.aeaweb.org/articles?id=10.1257/aer.20190159\n",
"2. https://www.tandfonline.com/doi/abs/10.1198/jasa.2009.ap08746\n",
"3. https://projecteuclid.org/journals/bayesian-analysis/volume-4/issue-1/Modularization-in-Bayesian-analysis-with-emphasis-on-analysis-of-computer/10.1214/09-BA404.full\n",