knitr::opts_chunk$set(fig.retina = 3,
                      fig.align = "center", out.width = "100%")

library(tidyverse)
library(ggdag)
library(dagitty)

update_geom_defaults(ggdag:::GeomDagText, list(family = "Noto Sans", face = "plain"))

In this project, we want to know the causal effect of anti-NGO laws on foreign aid. If a country passes a new anti-NGO law, or if the general environment for NGOS and civil society worsens in a year, do donor countries reduce their allocated aid in following years? More specifically, what is the effect of treatment (NGO laws or the civil society environment) in time \(t-1\) on foreign aid in time \(t\)?

Doing this is tricky though because (1) it involves time series cross sectional data, (2) treatment history matters, and (3) treatment can either be binary (new law vs. no new law) or continuous (total count of laws, or a continuous measure of civil society openness).

Ultimately, we’re using a MSM like this to estimate the causal effect. The treatment model accounts for a bunch of confounders across three broader categories: (1) human rights and politics (like corruption, level of democracy, etc.), (2) economics and development (like trade, mortality, etc.), and (3) unexpected shocks (like natural disasters). This is based on the following DAG:

simplerish_dag <- dagitty('
dag {
"`Human rights & politics`[t-1]" [adjusted,pos="5,1.5"]
"`Human rights & politics`[t-i]" [pos="1,1.5"]
"`Human rights & politics`[t]" [pos="9,1.5"]
"`Economics & development`[t-1]" [adjusted,pos="5,2.5"]
"`Economics & development`[t-i]" [pos="1,2.5"]
"`Economics & development`[t]" [pos="9,2.5"]
"Outcome[t-1]" [pos="6.5,1.75"]
"Outcome[t-i]" [adjusted,pos="2.5,1.75"]
"Outcome[t]" [outcome,pos="10.5,1.75"]
"Restrictions[t-1]" [exposure,pos="6,2.25"]
"Restrictions[t-i]" [adjusted,pos="2,2.25"]
"Restrictions[t]" [pos="10,2.25"]
"`Unexpected shocks`[t-1]" [adjusted,pos="5,2"]
"`Unexpected shocks`[t-i]" [pos="1,2"]
"`Unexpected shocks`[t]" [pos="9,2"]
"`Human rights & politics`[t-1]" -> "`Human rights & politics`[t]"
"`Human rights & politics`[t-1]" -> "Outcome[t-1]"
"`Human rights & politics`[t-1]" -> "Restrictions[t-1]"
"`Human rights & politics`[t-1]" -> "Restrictions[t]"
"`Human rights & politics`[t-i]" -> "`Human rights & politics`[t-1]"
"`Human rights & politics`[t-i]" -> "Outcome[t-i]"
"`Human rights & politics`[t-i]" -> "Restrictions[t-i]"
"`Human rights & politics`[t]" -> "Outcome[t]"
"`Human rights & politics`[t]" -> "Restrictions[t]"
"`Economics & development`[t-1]" -> "`Economics & development`[t]"
"`Economics & development`[t-1]" -> "Outcome[t-1]"
"`Economics & development`[t-1]" -> "Restrictions[t-1]"
"`Economics & development`[t-1]" -> "Restrictions[t]"
"`Economics & development`[t-i]" -> "`Economics & development`[t-1]"
"`Economics & development`[t-i]" -> "Outcome[t-i]"
"`Economics & development`[t-i]" -> "Restrictions[t-i]"
"`Economics & development`[t]" -> "Outcome[t]"
"`Economics & development`[t]" -> "Restrictions[t]"
"Outcome[t-1]" -> "Outcome[t]"
"Outcome[t-1]" -> "Restrictions[t]"
"Outcome[t-i]" -> "Outcome[t-1]"
"Outcome[t-i]" -> "Restrictions[t-1]"
"Restrictions[t]" -> "Outcome[t]"
"Restrictions[t-1]" -> "Outcome[t-1]"
"Restrictions[t-i]" -> "Outcome[t-i]"
"Restrictions[t-1]" -> "Outcome[t]"
"Restrictions[t-1]" -> "Restrictions[t]"
"Restrictions[t-i]" -> "Outcome[t-1]"
"Restrictions[t-i]" -> "Restrictions[t-1]"
"`Unexpected shocks`[t-1]" -> "Outcome[t-1]"
"`Unexpected shocks`[t-1]" -> "Restrictions[t-1]"
"`Unexpected shocks`[t-1]" -> "Restrictions[t]"
"`Unexpected shocks`[t-1]" -> "`Unexpected shocks`[t]"
"`Unexpected shocks`[t-i]" -> "Outcome[t-i]"
"`Unexpected shocks`[t-i]" -> "Restrictions[t-i]"
"`Unexpected shocks`[t-i]" -> "`Unexpected shocks`[t-1]"
"`Unexpected shocks`[t]" -> "Outcome[t]"
"`Unexpected shocks`[t]" -> "Restrictions[t]"
}
')

simplerish_dag_plot <- simplerish_dag %>% 
  tidy_dagitty() %>% 
  mutate(var_type = case_when(
    str_detect(name, "Outcome") ~ "Outcome",
    str_detect(name, "Restrictions") ~ "Restrictions",
    TRUE ~ "Z"
  )) %>% 
  mutate(time_period = case_when(
    str_detect(name, "t-1") ~ 2,
    str_detect(name, "t-i") ~ 1,
    TRUE ~ 3
  )) %>% 
  mutate(arrow_color = case_when(
    name == "Restrictions[t-1]" & to == "Outcome[t]" ~ "#FF4136",
    TRUE ~ "grey60"
  ))

ggplot(simplerish_dag_plot, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges(aes(edge_colour = arrow_color)) +
  geom_dag_point(aes(color = var_type, alpha = time_period), size = 12) +
  geom_dag_text(data = filter(simplerish_dag_plot, var_type == "Outcome"),
                color = "black", size = 3, parse = TRUE) +
  geom_dag_text(data = filter(simplerish_dag_plot, var_type == "Restrictions"),
                color = "black", size = 3, parse = TRUE) +
  geom_dag_text(data = filter(simplerish_dag_plot, var_type == "Z"),
                color = "black", size = 3, parse = TRUE) +
  scale_color_manual(values = c("#B10DC9", "#FF851B", "grey60")) +
  scale_y_reverse() +
  guides(alpha = "none", color = "none") +
  theme_dag()

Or more simply, collapsing all confounders into just \(Z\):

simple_dag <- dagitty('
dag {
"Outcome[t-i]" [pos="1.5,2"]
"Outcome[t-1]" [pos="3.5,2"]
"Outcome[t]" [outcome,pos="6,2"]
"Restrictions[t-i]" [pos="1,1"]
"Restrictions[t-1]" [exposure,pos="3,1"]
"Restrictions[t]" [pos="5,1"]
"Z[t-i]" [pos="1,3"]
"Z[t-1]" [pos="3,3"]
"Z[t]" [pos="5,3"]
"Outcome[t-1]" -> "Outcome[t]"
"Outcome[t-1]" -> "Restrictions[t]"
"Outcome[t-i]" -> "Outcome[t-1]"
"Outcome[t-i]" -> "Restrictions[t-1]"
"Restrictions[t]" -> "Outcome[t]"
"Restrictions[t-1]" -> "Outcome[t-1]"
"Restrictions[t-i]" -> "Outcome[t-i]"
"Restrictions[t-1]" -> "Outcome[t]"
"Restrictions[t-1]" -> "Restrictions[t]"
"Restrictions[t-i]" -> "Outcome[t-1]"
"Restrictions[t-i]" -> "Restrictions[t-1]"
"Z[t-1]" -> "Outcome[t-1]"
"Z[t-1]" -> "Restrictions[t-1]"
"Z[t-1]" -> "Restrictions[t]"
"Z[t-1]" -> "Z[t]"
"Z[t-i]" -> "Outcome[t-i]"
"Z[t-i]" -> "Restrictions[t-1]"
"Z[t-i]" -> "Restrictions[t-i]"
"Z[t-i]" -> "Z[t-1]"
"Z[t]" -> "Outcome[t]"
"Z[t]" -> "Restrictions[t]"
}
') 

simple_dag_plot <- simple_dag %>% 
  tidy_dagitty() %>% 
  mutate(var_type = case_when(
    str_detect(name, "Outcome") ~ "Outcome",
    str_detect(name, "Restrictions") ~ "Restrictions",
    str_detect(name, "Z") ~ "Z"
  )) %>% 
  mutate(time_period = case_when(
    str_detect(name, "t-1") ~ 2,
    str_detect(name, "t-i") ~ 1,
    TRUE ~ 3
  )) %>% 
  mutate(arrow_color = case_when(
    name == "Restrictions[t-1]" & to == "Outcome[t]" ~ "#FF4136",
    TRUE ~ "grey60"
  )) %>% 
  mutate(letter_only = case_when(
    str_detect(name, "Outcome") ~ str_replace(name, "Outcome", "Y"),
    str_detect(name, "Restrictions") ~ str_replace(name, "Restrictions", "X"),
    TRUE ~ name
  ))

simple_dag_out <- ggplot(simple_dag_plot, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges(aes(edge_colour = arrow_color)) +
  geom_dag_point(aes(color = var_type, alpha = time_period), size = 12) +
  geom_dag_text(data = filter(simple_dag_plot, var_type == "Outcome"),
                color = "black", size = 4, parse = TRUE, 
                nudge_y = 0.2, nudge_x = 0.3) +
  geom_dag_text(data = filter(simple_dag_plot, var_type == "Restrictions"),
                color = "black", size = 4, parse = TRUE, 
                nudge_y = -0.25) +
  geom_dag_text(data = filter(simple_dag_plot, var_type == "Z"),
                color = "black", size = 4, parse = TRUE,
                nudge_y = 0) +
  scale_color_manual(values = c("#B10DC9", "#FF851B", "grey60")) +
  guides(alpha = "none", color = "none") +
  theme_dag()
simple_dag_out

All our models work well so far (I think!), and we’re doing them Bayesianly to be extra cool (see here for the results).

However, figuring out exactly what causal effect we’re finding (and how to write it with causal math notation) has proven to be really tricky. Following Lundberg, Johnson, and Stewart (2021), we’re trying to use the following process to determine our causal estimand:

  1. Choose a theoretical estimand (\(\tau\)) and connect it to a general theory. It needs to describe a hypothetical intervention, a unit-specific quantity (like \(Y_i\) for a realized outcome or \(Y_i(1) - Y_i(0)\) for a potential outcome), and a target population (like \({\textstyle \frac{1}{n}}\ {\textstyle \sum_{i=1}^n}\))
  2. Choose an empirical estimand (\(\theta\)) that is linked to the theoretical estimand based on a set of identification assumptions from a DAG
  3. Choose an estimation strategy to learn the empirical estimand from the data. This is \(\hat{\theta}\).

Here’s what we’ve tried. We’re mostly stuck with the notation for continuous treatment estimands.

Quick background reference: Potential outcomes, \(\operatorname{do}(\cdot)\), and causal effects

Before getting into the different estimands, here’s some quick translation between definitions of causal effects in the worlds of potential outcomes and DAGs. This is helpful because (I think!) we can think of theoretical and empirical estimands in do-calculus terms like this:

In section 2.4 of Neal (2020) (p. 15), there’s a flowchart of estimands that looks a lot like the estimand framework in Lundberg, Johnson, and Stewart (2021), just with different names:

According to footnote 14 in that chapter, the causal estimand (which seems to map to the theoretical estimand) contains \(\operatorname{do}(\cdot)\), while the statistical estimand (which seems to map to the empirical estimand) does not.

Assuming that’s correct, we can translate do-calculus-style equations into potential outcomes-style equations like these general forms for binary and continuous average causal effects:

General forms for theoretical estimands

Binary treatments:

\[ \begin{aligned} & \textbf{Potential outcomes world:} \\ \tau\ =& {\textstyle \frac{1}{n} \sum_{i=1}^n} Y_i (1) - Y_i (0) \\ & \\ & \text{or alternatively with } \textbf{E} \\ \tau\ =& \textbf{E} [Y_i (1) - Y_i (0)] \\ & \\ & \textbf{Pearl world:} \\ \tau(X)\ =& \textbf{E}[Y_i \mid \operatorname{do}(X = 1) - Y_i \mid \operatorname{do}(X = 0)] \\ & \\ & \text{or more simply} \\ \tau(X)\ =& \textbf{E} [Y_i \mid \operatorname{do}(X)] \\ \end{aligned} \]

Continuous treatments:

\[ \begin{aligned} & \textbf{Potential outcomes world:} \\ \tau\ =& \textbf{E} [Y_i (t^\star) - Y_i (t)] \\ & \\ & \textbf{Pearl world:} \\ \tau(t) =& \textbf{E}[Y_i \mid \operatorname{do}(t^\star) - Y_i \mid \operatorname{do}(t)] \\ & \\ & \text{where } t, t^\star \in \mathcal{T} \text{ and } \mathcal{T} = [\min(t), \max(t)] \end{aligned} \]

(Note, Pearl also uses an explicit \(\operatorname{do}(x + 1)\) for the counterfactual on p. 2 here (Pearl 2019) rather than a more arbitrary \(\operatorname{do}(x^\star)\))

General forms for empirical estimands

Binary treatments:

\[ \begin{aligned} & \textbf{Potential outcomes world:} \\ \theta\ =& \frac{1}{n_{X = 1}} \sum_{i \in \mathcal{S}_{X = 1}} Y_i - \frac{1}{n_{X = 0}} \sum_{i \in \mathcal{S}_{X = 0}} Y_i \\ & \\ & \text{or alternatively with } \textbf{E} \\ \theta\ =& \textbf{E}[Y_i \mid X = 1)] - \textbf{E}[Y_i \mid X = 0] \\ & \\ & \textbf{Pearl world:} \\ \theta(X)\ =& \textbf{E}[Y_i \mid \operatorname{do}(X = 1)] - \textbf{E}[Y_i \mid \operatorname{do}(X = 0)] \\ & \\ & \text{assuming there are backdoor confounders } Z_i \text{ to remove } \operatorname{do}(X) \\ \theta(X)\ =& \sum_Z P(Y_i \mid X_i = 1, Z_i)\ P(Z_i) - \sum_Z P(Y_i \mid X_i = 0, Z_i)\ P(Z_i) \end{aligned} \]

Continuous treatments:

\[ \begin{aligned} & \textbf{Potential outcomes world:} \\ \theta\ =& \textbf{E} [Y_i (t^\star)] - \textbf{E}[Y_i (t)] \\ & \\ & \textbf{Pearl world:} \\ \theta(t) =& \textbf{E}[Y_i \mid \operatorname{do}(t^\star)] - \textbf{E}[Y_i \mid \operatorname{do}(t)] \\ \text{or just}\ & \textbf{E}[Y_i \mid \operatorname{do}(t)] \\ & \\ & \text{assuming there are backdoor confounders } Z_i \text{ to remove } \operatorname{do}(t) \\ \theta(t)\ =& \sum_Z P(Y_i \mid t, Z_i)\ P(Z_i) \\ & \\ & \text{where } t, t^\star \in \mathcal{T} \text{ and } \mathcal{T} = [\min(t), \max(t)] \end{aligned} \]

General forms for estimation strategy

There aren’t really any general forms for this. You figure out \(\hat{\theta}\) however is best. Regression, difference in means, RCTs, inverse probability weighting, whatever.

And actually now that I’ve written all those things out, I’m not 100% sold on the idea that theoretical estimands have \(\operatorname{do}(\cdot)\) and empirical estimands do not. Getting rid of the \(\operatorname{do}(\cdot)\) using do-calculus seems to be more of an estimation or a \(\hat{\theta}\) thing, not a \(\theta\) thing?? idk

1. Set the target: The theoretical estimand (\(\tau\))

First we have to define a theoretical estimand, which requires a substantive argument connected to theory. In this project, we care about the average difference in the potential outcome of foreign aid each aid-eligible country \(i\) would receive if that country increased its legal restrictions on NGOs in the previous year (or the past in general, depending on number of lags?) versus if it did not increase legal restrictions. Equation (1) is the theoretical estimand for a binary treatment (new anti-NGO law passed vs. not):

\[\begin{equation} \begin{aligned} \tau =& \underbrace{{\textstyle \frac{1}{n}}\ {\textstyle \sum_{i=1}^n}}_{\substack{\text{mean over} \\ \text{all countries} \\ \text{eligible for aid}}} \bigg[ \underbrace{Y_{it} (\text{New law}_{t-1})}_{\substack{\text{potential foreign aid} \\ \text{with change in} \\ \text{NGO law}}} - \underbrace{Y_{it} (\text{No new law}_{t-1})}_{\substack{\text{potential foreign aid} \\ \text{with no change in} \\ \text{NGO law}}} \bigg] & [\text{Theoretical estimand, binary treatment}] \end{aligned} \tag{1} \end{equation}\]

We can also look at a continuous treatment… somehow… and say that we care about the foreign aid that each aid-eligible country \(i\) would receive if the count of legal restrictions in the previous year took a particular value. Equation (2) shows this:

\[\begin{equation} \begin{aligned} \tau =& \underbrace{{\textstyle \frac{1}{n}}\ {\textstyle \sum_{i=1}^n}}_{\substack{\text{mean over} \\ \text{all countries} \\ \text{eligible for aid}}} \bigg[ \underbrace{Y_{it} (x^\prime_{t-1})}_{\substack{\text{potential foreign} \\ \text{with alternative} \\ \text{NGO legal regime}}} - \underbrace{Y_{it} (x_{t-1})}_{\substack{\text{potential foreign aid} \\ \text{with overall} \\ \text{NGO legal regime}}} \bigg] & [\text{Theoretical estimand, continuous treatment}] \end{aligned} \tag{2} \end{equation}\]

We thus have these key definitions for our theoretical estimand:

With this TSCS kind of data, though, it’s also a little more complicated than these equations. Marginal potential outcome represents the counterfactual level of foreign aid in country \(i\) if NGO laws run their natural course from the beginning of the data up to \(t - 2\) and then the last lag of NGO laws remains the same, with no changes (see p. 3 in Blackwell and Glynn (2018)). In other words, it would be the expected effect of a random country increasing its anti-NGO laws in period \(t-1\) on foreign aid in period \(t\).

In the biostats world, they write these potential outcome specifications with g-estimation-based models, like Equation (3):

\[\begin{equation} \begin{aligned} \tau =&\ g(x_{t-1}; \beta) - g(x^\prime_{t-1}; \beta) & [\text{g-estimation-based theoretical estimand}]\\ \end{aligned} \tag{3} \end{equation}\]

Because it is not possible to observe both potential outcomes for each country, the theoretical estimand \(\tau\) is unmeasurable. But that’s fine! Theoretical estimands don’t need to be measurable. That’s what the empirical estimand is for.

3. Learn from the data (\(\hat{\theta}\))

We find \(\hat{\theta}\) by defining an estimation strategy and providing statistical evidence. We estimate \(\hat{\theta}\) using a marginal structural model (MSM) and inverse probability of treatment weights (IPTW). There’s a whole literature in epidemiology and biostats about how these work and how they provide unbiased estimates of causal parameters through g-estimation. In summary (see equation 32 in Blackwell and Glynn (2018)), using stabilized inverse probability weights in a MSM makes it so that the expectation of \(Y_{it}\) conditional on \(X_{i, t-1}\) converges to the true theoretical estimand:

\[\begin{equation} \begin{aligned} \hat{\theta} =&\ \underbrace{\textbf{E}_\text{SW}[Y_{it} \mid X_{i, t-1} = x_{t-1}]}_{\substack{\text{observed average aid} \\ \text{given reweighted treatment}}} \xrightarrow{p} \underbrace{\textbf{E}[Y_{it} (x_{t-1})]}_{\substack{\text{this is kinda} \\ \text{like } \tau \text{ or } \theta?}} & [\text{Estimate of estimand, binary and continuous}] \end{aligned} \tag{8} \end{equation}\]

The biostats g-estimation way of writing this MSM looks like this:

\[\begin{equation} \begin{aligned} \hat{\theta} =&\ g(x_{t-1}; \beta) = \beta_0 + \underbrace{\beta_1}_{\substack{\text{average} \\ \text{causal} \\ \text{effect}}} x_{t-1} & [\text{Estimate of estimand, marginal structural model}] \end{aligned} \tag{9} \end{equation}\]

We can thus obtain an unbiased estimate of the causal parameter \(\beta_1\) of the MSM by fitting the model, by giving each country a time-specific weight, based on the confounders.

All estimands

Thus, here’s our Lundberg, Johnson, and Stewart (2021) process:

  1. Theoretical, unobservable estimand (\(\tau\)):

    \[\begin{equation} \begin{aligned} \tau =& \underbrace{{\textstyle \frac{1}{n}}\ {\textstyle \sum_{i=1}^n}}_{\substack{\text{mean over} \\ \text{all countries} \\ \text{eligible for aid}}} \bigg[ \underbrace{Y_{it} (x^\prime_{t-1})}_{\substack{\text{potential foreign} \\ \text{with alternative} \\ \text{NGO legal regime}}} - \underbrace{Y_{it} (x_{t-1})}_{\substack{\text{potential foreign aid} \\ \text{with overall} \\ \text{NGO legal regime}}} \bigg] & [\text{Theoretical estimand, continuous treatment}] \end{aligned} \tag{10} \end{equation}\]

  2. Empirical estimand (\(\theta\))—the one hypothetical extra law in \(X + 1\) is basically the average marginal effect:

    \[\begin{equation} \begin{aligned} \theta =&\ \underbrace{\textbf{E}[Y_{it} \mid X_{i, t-1} + 1]}_{\substack{\text{Observed mean aid given} \\ \text{total NGO laws in } t-1 \\ {\textbf{plus one hypothetical extra law}}}} - \underbrace{\textbf{E}[Y_{it} \mid X_{i, t-1}]}_{\substack{\text{Observed mean aid given} \\ \text{total NGO laws in } t-1}} & [\text{Empirical estimand, continuous treatment}] \end{aligned} \tag{11} \end{equation}\]

  3. Estimate of estimand (\(\hat{\theta}\)), or marginal structural model:

    \[\begin{equation} \begin{aligned} \hat{\theta} =&\ g(x_{t-1}; \beta) = \beta_0 + \underbrace{\beta_1}_{\substack{\text{average} \\ \text{causal} \\ \text{effect}}} x_{t-1} & [\text{Estimate of estimand, marginal structural model}] \end{aligned} \tag{12} \end{equation}\]

Hopefully this is somewhat remotely correct. Who even knows. Merging causal notation from the do-calculus world, potential outcomes world, and epidemiology world is really really hard.

References

Blackwell, Matthew, and Adam N. Glynn. 2018. “How to Make Causal Inferences with Time-Series Cross-Sectional Data Under Selection on Observables.” American Political Science Review 112 (4): 1067–82. https://doi.org/10.1017/s0003055418000357.
Hernán, Miguel A., Babette A. Brumback, and James M. Robins. 2002. “Estimating the Causal Effect of Zidovudine on Cd4 Count with a Marginal Structural Model for Repeated Measures.” Statistics in Medicine 21 (12): 1689–709. https://doi.org/10.1002/sim.1144.
Lundberg, Ian, Rebecca Johnson, and Brandon M. Stewart. 2021. “What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.” American Sociological Review. https://doi.org/10.1177/00031224211004187.
Neal, Brady. 2020. Introduction to Causal Inference from a Machine Learning Perspective. https://www.bradyneal.com/causal-inference-course.
Pearl, Judea. 2019. “On the Interpretation of Do(x).” Journal of Causal Inference 7 (1): 1–6. https://doi.org/10.1515/jci-2019-2002.