Selection bias in instrumental variable analyses
Abstract
Participants in epidemiological and genetic studies are rarely true random samples of the populations they are intended to represent, and both known and unknown factors can influence participation in a study (known as selection into a study). The circumstances in which selection causes bias in an instrumental variable (IV) analysis are not widely understood by practitioners of IV analyses. We use directed acyclic graphs (DAGs) to depict assumptions about the selection mechanism (the factors affecting selection) and show how DAGs can be used to determine when a two-stage least squares (2SLS) IV analysis is biased under different selection mechanisms. Via simulations, we show that selection can result in a biased IV estimate with substantial confidence interval undercoverage, and that the level of bias can differ between instrument strengths, between a linear and a nonlinear exposure-instrument association, and between a causal and a noncausal exposure effect. We present an application from the UK Biobank study, which is known to be a selected sample of the general population. Of interest was the causal effect of education on the decision to smoke. The 2SLS exposure estimates differed markedly between the IV analysis that ignored selection and the IV analysis that adjusted for selection (e.g., 1.8 [95% confidence interval −1.5, 5.0] and −4.5 [−6.6, −2.4], respectively). We conclude that selection bias can have a major effect on an IV analysis and that statistical methods for estimating causal effects using data from nonrandom samples are needed.
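As a rough illustration of how selection can bias a 2SLS estimate, the following Python sketch simulates one possible selection mechanism, in which the probability of selection depends on both the exposure and the outcome. It is not the simulation design used in the study: the data-generating model, all coefficients, the sample size, and the function names `tsls_estimate` and `simulate_selection_bias` are assumptions made only for this example.

```python
import numpy as np


def tsls_estimate(z, x, y):
    """Two-stage least squares with a single instrument and an intercept."""
    design_z = np.column_stack([np.ones_like(z), z])
    # Stage 1: regress the exposure on the instrument and keep the fitted values.
    x_hat = design_z @ np.linalg.lstsq(design_z, x, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted exposure; return the slope.
    design_x = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(design_x, y, rcond=None)[0][1]


def simulate_selection_bias(n=200_000, true_effect=0.5, seed=0):
    """Return (full-sample estimate, selected-sample estimate).

    Hypothetical model: instrument Z, unmeasured confounder U, exposure X,
    outcome Y. All coefficients are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                        # instrument
    u = rng.normal(size=n)                        # unmeasured confounder
    x = 0.5 * z + u + rng.normal(size=n)          # exposure
    y = true_effect * x + u + rng.normal(size=n)  # outcome

    # Assumed selection mechanism: participation is more likely when the
    # exposure and the outcome are high (logistic selection probability).
    p_select = 1.0 / (1.0 + np.exp(-(x + y)))
    selected = rng.random(n) < p_select

    full = tsls_estimate(z, x, y)
    restricted = tsls_estimate(z[selected], x[selected], y[selected])
    return full, restricted


if __name__ == "__main__":
    full, restricted = simulate_selection_bias()
    print(f"true effect: 0.5, full sample: {full:.3f}, selected sample: {restricted:.3f}")
```

Under these assumptions, the full-sample estimate should lie close to the true effect, while the estimate from the selected sample should fall clearly below it. Conditioning on selection induces an association between the instrument and the unmeasured confounder, which violates the IV assumptions within the selected sample.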