Abstract
The increasing availability of passively observed data has spurred growing interest in “data fusion” methods, which merge data from observational and experimental sources to draw causal conclusions. Such methods often require a precarious tradeoff between the unknown bias in the observational dataset and the often-large variance in the experimental dataset. We propose an alternative approach that avoids this tradeoff: rather than using observational data for inference, we use it to design a more efficient experiment. We consider the case of a stratified experiment with a binary outcome and suppose that pilot estimates of the stratum-level potential outcome variances can be obtained from the observational study. We extend existing results to generate confidence sets for these variances while accounting for the possibility of unmeasured confounding. We then pose the experimental design problem as a regret minimization problem subject to the constraints imposed by our confidence sets. We show that this problem can be converted into a concave maximization problem and solved using conventional methods. Finally, we demonstrate the practical utility of our methods using data from the Women’s Health Initiative.