A visual primer to instrumental variables

By Kangho Suh

When assessing the possible efficacy or effectiveness of an intervention, the main objective is to attribute changes you see in the outcome to that intervention alone. That is why clinical trials have strict inclusion and exclusion criteria, and frequently use randomization to create “clean” populations with comparable disease severity and comorbidities. By randomizing, the treatment and control populations should match not only on observable (e.g., demographic) characteristics, but also on unobservable or unknown confounders. As such, the difference in results between the groups can be interpreted as the effect of the intervention alone and not some other factors. This avoids the problem of selection bias, which occurs when the exposure is related to observable and unobservable confounders, and which is endemic to observational studies.

In an ideal research setting (ethics aside), we could clone individuals and give one clone the new treatment and the other one a placebo or standard of care and assess the change in health outcomes. Or we could give an individual the new treatment, study the effect the treatment has, go back in time through a DeLorean and repeat the process with the same individual, only this time with a placebo or other control intervention. Obviously, neither of these are practical options. Currently, the best strategy is randomized controlled trials (RCTs), but these have their own limitations (e.g. financial, ethical, and time considerations) that limit the number of interventions that can be studied this way. Also, the exclusion criteria necessary to arrive at these “clean” study populations sometimes mean that they do not represent the real-world patients who will use these new interventions.

For these reasons, observational studies present an attractive option to RCTs by using electronic health records, registries, or administrative claims databases. Observational studies have their own drawbacks, such as selection bias detailed above. We try to address some of these issues by controlling for covariates in statistical models or by using propensity scores to create comparable study groups that have similar distributions of observable covariates (check out the blog entry of using propensity scores by my colleague Lauren Strand). Another method that has been gaining popularity in health services research is an econometric technique called instrumental variables (IV) estimation. In fact, two of my colleagues and the director of our program (Mark Bounthavong, Blythe Adamson, and Anirban Basu, respectively) wrote a primer on the use of IV here.

In their article Mark, Blythe, and Anirban explain the endogeneity issue when the treatment variable is associated with the error term in a regression model. For those of you who might still be confused (I certainly was for a long time!), I’ll use a simple figure1that I found in a textbook to explain how IVs work.

Instrumental Variables

1p. 147 from Kennedy, Peter. A Guide to Econometrics 6th Edition. Oxford: Blackwell Published, 2008. Print

The figure uses circles to represent the variation within variables that we are interested in: each circle represents the treatment variable (X), outcome variable (Y), or the instrumental variable (Z). First, focus on the treatment and outcome circles. We know that some amount of the variability in the outcome is explained by the treatment variable (i.e. treatment effect); this is indicated by the overlap between the two circles (red, blue, and purple). The remaining green section of the outcome variable represents the error (ϵ) obtained with a statistical model. However, if treatment and ϵ are not independent due to, for example, selection bias, some of the green spills over to the treatment circle, creating the red section. Our results are now biased, because a portion (red) of the variation in our outcome is attributed to both treatment and ϵ.

Enter in instrumental variable Z. It must meet two criteria: 1) be strongly correlated with treatment (large overlap of instrument and treatment) and 2) not be correlated with the error term (no overlap with red or green). In the first stage, we regress treatment on the instrument, and obtain the predicted values of treatment (orange and purple). We then regress the outcome on the predicted values of treatment to get the treatment effect (purple). Because we have only used the exogenous part of our treatment X to explain Y, our estimates are unbiased.

Now that you understand the benefit of IV estimators visually, maybe you can explicitly see some of the drawbacks as well. The information used to estimate the treatment effect became much smaller. It went from the overlap between treatment and outcome (red, blue, and purple) to just the purple area. As a result, while the IV estimator may be unbiased, it has more variance than a simple OLS estimator. One way to improve this limitation is to have an instrument that is highly correlated with treatment to make the purple area as large as possible.

A more concerning limitation with IV estimation is the interpretability of results, especially in the context of treatment effect heterogeneity. I will write another blog post about this issue and how it can be addressed if you have a continuous IV, using a method called person-centered treatment (PeT) effects that Anirban created.  Stay tuned!