Statistical Power for the Comparative Regression Discontinuity Design With a Pretest No-Treatment Control Function: Theory and Evidence From the National Head Start Impact Study

2018, Vol. 42(1), pp. 71-110
Authors: Yang Tang, Thomas D. Cook

The basic regression discontinuity design (RDD) has less statistical power than a randomized control trial (RCT) with the same sample size. Adding a no-treatment comparison function to the basic RDD creates a comparative RDD (CRD), and when this function comes from the pretest value of the study outcome, a CRD-Pre design results. We use a within-study comparison (WSC) to examine the power of CRD-Pre relative to both the basic RDD and the RCT. We first build the theoretical foundation for power in CRD-Pre, then derive the relevant variance formulae, and finally compare them to the theoretical RCT variance. From this theoretical analysis, we conclude that (1) CRD-Pre's power gain depends on the partial correlation between the pretest and posttest measures after conditioning on the assignment variable, (2) CRD-Pre is less sensitive than the basic RDD to how the assignment variable is distributed and where the cutoff is located, and (3) under a variety of conditions, the efficiency of CRD-Pre is very close to that of the RCT. Data from the National Head Start Impact Study are then used to construct RCT, RDD, and CRD-Pre designs and to compare their power. The empirical results indicate (1) a high level of correspondence between the predicted and obtained power results for RDD and CRD-Pre relative to the RCT, and (2) power levels in CRD-Pre and the RCT that are very close. The study is unique among WSCs in focusing on the correspondence between RCT and observational-study standard errors rather than means.
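To make the precision mechanism concrete, the following minimal simulation compares empirical standard errors of the effect estimate for an RCT, a sharp RDD, and an RDD that additionally conditions on the pretest. It is only a sketch of the partial-correlation logic described above: the article's CRD-Pre enters the pretest as a no-treatment comparison function rather than as a covariate, and all sample sizes, effect sizes, and correlations below are hypothetical.

```python
# Monte Carlo sketch: empirical SEs of the effect estimate at the cutoff for an
# RCT, a sharp RDD, and a pretest-adjusted RDD. This illustrates the precision
# mechanism only; the paper's CRD-Pre uses the pretest as a no-treatment
# comparison function, which this sketch approximates by covariate adjustment.
# All parameter values below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, reps, tau, rho = 500, 2000, 0.3, 0.6   # rho: pretest-posttest correlation

def ols_coef(X, y, k):
    """Return coefficient k from an OLS fit (design matrix built by caller)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[k]

est = {"RCT": [], "RDD": [], "RDD+pretest": []}
for _ in range(reps):
    z = rng.normal(size=n)                    # assignment variable, cutoff at 0
    pre = rng.normal(size=n)                  # pretest measure
    err = rho * pre + np.sqrt(1 - rho**2) * rng.normal(size=n)
    t_rdd = (z >= 0).astype(float)            # sharp RDD assignment
    t_rct = rng.integers(0, 2, size=n).astype(float)

    y_rct = tau * t_rct + 0.5 * z + err
    y_rdd = tau * t_rdd + 0.5 * z + err

    ones = np.ones(n)
    est["RCT"].append(ols_coef(np.column_stack([ones, t_rct, z]), y_rct, 1))
    est["RDD"].append(ols_coef(np.column_stack([ones, t_rdd, z]), y_rdd, 1))
    est["RDD+pretest"].append(
        ols_coef(np.column_stack([ones, t_rdd, z, pre]), y_rdd, 1))

for name, draws in est.items():
    print(f"{name:12s} empirical SE = {np.std(draws):.4f}")
```

With these settings the pretest adjustment narrows, but does not close, the gap between the RDD and the RCT; higher pretest-posttest correlations close it further, which is the dependence on the partial correlation that the article's variance formulae formalize.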

2018, Vol. 42(1), pp. 111-143
Authors: Yasemin Kisbu-Sakarya, Thomas D. Cook, Yang Tang, M. H. Clark

Compared to the randomized experiment (RE), the regression discontinuity design (RDD) has three main limitations: (1) In expectation, its results are unbiased only at the treatment cutoff and not for the entire study population; (2) it is less efficient than the RE and so requires more cases for the same statistical power; and (3) it requires correctly specifying the functional form that relates the assignment and outcome variables. One way to overcome these limitations might be to add a no-treatment functional form to the basic RDD and include it in the outcome analysis as a comparison function rather than merely as a power-increasing covariate. Doing so creates a comparative regression discontinuity design (CRD) with three untreated regression lines: two in the untreated segment of the RDD (the usual RDD regression and the added untreated comparison function) and a third in the treated RDD segment. Also observed is the treated regression line in the treated segment. Recent studies comparing RE, RDD, and CRD causal estimates have found that CRD is more precise than RDD and produces valid causal estimates both at the treatment cutoff and along the rest of the assignment variable. The present study seeks to replicate these results with considerably smaller sample sizes. The power difference between RDD and CRD is replicated, but the bias results are not, either at the treatment cutoff or away from it. We conclude that CRD without large samples can be dangerous.
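The geometry of the design, with its three untreated lines and one treated line, can be illustrated on a small simulated example. The sketch below builds a counterfactual for treated units from the pretest line in the treated segment plus the pretest-to-posttest gap fitted on the untreated side; this is one common way of describing CRD identification, not necessarily the exact estimator used in the article, and all data and parameter values are simulated.

```python
# Minimal sketch of the CRD geometry: four fitted lines (pretest and posttest,
# on each side of the cutoff) and a counterfactual for treated units built from
# the pretest line plus the pretest-to-posttest gap estimated on the untreated
# side. Illustration only, not a reproduction of the article's estimator.
import numpy as np

rng = np.random.default_rng(1)
n, cutoff, tau = 400, 0.0, 0.4

z = rng.normal(size=n)                       # assignment variable
treated = z >= cutoff
pre = 1.0 + 0.5 * z + rng.normal(scale=0.5, size=n)            # untreated everywhere
post = pre + 0.2 + tau * treated + rng.normal(scale=0.5, size=n)

def fit_line(x, y):
    """Return (intercept, slope) from a simple OLS fit."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

lines = {
    ("pre", False):  fit_line(z[~treated], pre[~treated]),
    ("pre", True):   fit_line(z[treated],  pre[treated]),
    ("post", False): fit_line(z[~treated], post[~treated]),
    ("post", True):  fit_line(z[treated],  post[treated]),
}

def predict(key, x):
    a, b = lines[key]
    return a + b * x

def crd_effect(x):
    """Treated posttest line minus a pretest-based counterfactual at x >= cutoff."""
    growth = predict(("post", False), x) - predict(("pre", False), x)  # extrapolated
    counterfactual = predict(("pre", True), x) + growth
    return predict(("post", True), x) - counterfactual

print("effect at cutoff:", round(crd_effect(cutoff), 3))
print("effect at z = 1.5:", round(crd_effect(1.5), 3))
```

Because the true effect in the simulated data is constant at 0.4, both printed estimates should land near that value; with small samples the extrapolated components become noisy, which is the fragility the replication results above point to.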


2018, Vol. 42(2), pp. 214-247
Authors: Peter M. Steiner, Vivian C. Wong

In within-study comparison (WSC) designs, treatment effects from a nonexperimental design, such as an observational study or a regression discontinuity design, are compared to results obtained from a well-designed randomized control trial with the same target population. The goal of the WSC is to assess whether nonexperimental and experimental designs yield the same results in field settings. A common analytic challenge in WSCs, however, is the choice of appropriate criteria for determining whether nonexperimental and experimental results replicate. This article examines distance-based measures for assessing correspondence between experimental and nonexperimental estimates; such measures ask whether the difference in estimates is small enough to claim equivalence of methods. We use a simulation study to examine the statistical properties of common correspondence measures and recommend a new and straightforward approach that combines traditional significance testing and equivalence testing in the same framework. The article concludes with practical advice on assessing and interpreting results in WSC contexts.
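A minimal sketch of the combined logic, assuming both estimates are approximately normal with known standard errors, is shown below. The function name, the equivalence margin, and the illustrative numbers are all hypothetical; choosing a defensible margin is itself a substantive decision.

```python
# Sketch of a combined difference-and-equivalence check for a WSC, assuming the
# experimental and nonexperimental estimates are approximately normal with known
# standard errors. The equivalence margin `delta` must be justified substantively;
# the value used here is purely illustrative.
from math import sqrt
from scipy.stats import norm

def correspondence_test(b_exp, se_exp, b_nonexp, se_nonexp, delta, alpha=0.05):
    diff = b_nonexp - b_exp
    se = sqrt(se_exp**2 + se_nonexp**2)

    # (1) Traditional difference test: is the gap distinguishable from zero?
    p_diff = 2 * norm.sf(abs(diff) / se)

    # (2) TOST equivalence test: is the gap demonstrably inside (-delta, +delta)?
    p_lower = norm.sf((diff + delta) / se)        # H0: diff <= -delta
    p_upper = norm.cdf((diff - delta) / se)       # H0: diff >= +delta
    p_equiv = max(p_lower, p_upper)

    return {
        "difference detected": p_diff < alpha,
        "equivalence established": p_equiv < alpha,
        "p_diff": round(p_diff, 4),
        "p_equiv": round(p_equiv, 4),
    }

# Illustrative numbers only: RCT estimate 0.30 (SE 0.05), RDD estimate 0.34 (SE 0.08),
# with an equivalence margin of 0.15 outcome-standard-deviation units.
print(correspondence_test(0.30, 0.05, 0.34, 0.08, delta=0.15))
```

In this illustrative call the difference test is non-significant, but equivalence is not established either, which is exactly the ambiguous case a combined framework is meant to surface rather than hide.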


2011, Vol. 21(6), pp. 636-643
Author: William R. Shadish

This article reviews several decades of the author's meta-analytic and experimental research on the conditions under which nonrandomized experiments can approximate the results of randomized experiments (REs). Several studies make clear that we can expect accurate effect estimates from the regression discontinuity design, though its statistical power is lower, it estimates a different parameter than the RE, and its analysis is considerably more complex. For other nonrandomized designs, the picture is more mixed. They may yield accurate estimates if they are prospectively designed to include comprehensive and reliable measurement of the process by which participants selected into conditions, if they use large sample sizes, and if they carefully select control groups from the same location and with the same substantive characteristics. By contrast, there is little reason to think that nonrandomized experiments using archival data without comprehensive selection measures will yield accurate effect estimates.


2020, Vol. 110(11), pp. 3634-3660
Authors: Abel Brodeur, Nikolai Cook, Anthony Heyes

The credibility revolution in economics has promoted causal identification using randomized control trials (RCT), difference-in-differences (DID), instrumental variables (IV), and regression discontinuity designs (RDD). Applying multiple approaches to over 21,000 hypothesis tests published in 25 leading economics journals, we find that the extent of p-hacking and publication bias varies greatly by method: IV (and, to a lesser extent, DID) is particularly problematic. We find no evidence that (i) papers published in the Top 5 journals differ from others, (ii) the journal "revise and resubmit" process mitigates the problem, or (iii) things are improving over time. (JEL A14, C12, C52)
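One standard diagnostic in this literature is a caliper test for bunching of test statistics just above conventional significance thresholds. The sketch below illustrates the idea on randomly generated z-statistics; it is not the specific battery of tests applied in the article, and the data, caliper width, and threshold are placeholders.

```python
# Sketch of a caliper test, a standard diagnostic for bunching of test statistics
# just above the 1.96 significance threshold. Generic illustration only; the
# z-statistics below are randomly generated stand-ins.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(2)
z_stats = np.abs(rng.normal(loc=1.0, scale=1.2, size=5000))  # placeholder data

threshold, width = 1.96, 0.20
below = np.sum((z_stats >= threshold - width) & (z_stats < threshold))
above = np.sum((z_stats >= threshold) & (z_stats < threshold + width))

# Absent p-hacking and publication bias, the split just around the threshold
# should be roughly even; an excess of "above" observations is the warning sign.
result = binomtest(int(above), int(above + below), p=0.5, alternative="greater")
print(f"just below: {below}, just above: {above}, p-value: {result.pvalue:.3f}")
```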


Methodology, 2017, Vol. 13(1), pp. 9-22
Authors: Pablo Livacic-Rojas, Guillermo Vallejo, Paula Fernández, Ellián Tuero-Herrero

Abstract. In repeated-measures designs analyzed with univariate or multivariate analysis of variance (ANOVA), imprecise inferences are associated with non-normally distributed data, nonspherical covariance structures with freely varying variances and covariances, lack of knowledge of the error structure underlying the data, and the wrong choice of covariance structure among the available selectors. This study compares the statistical power of the Modified Brown-Forsythe (MBF) procedure with two mixed-model approaches (selection by Akaike's criterion and the correctly identified model [CIM]). Data for a split-plot design were generated by Monte Carlo simulation in SAS 9.2, with six manipulated variables. The results show that all procedures exhibit high statistical power for within-subjects and interaction effects, but only moderate to low power for between-groups effects under the conditions analyzed. For the latter, only the Modified Brown-Forsythe shows high power, mainly for groups of 30 cases and for unstructured (UN) and heterogeneous autoregressive (ARH) matrices. We therefore recommend this procedure, since it exhibits higher power for all effects and does not require specifying the covariance structure underlying the data. Future research should compare power with corrected selectors in single-level and multilevel designs with fixed and random effects.
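The core of such a study is a Monte Carlo power-estimation loop: simulate many data sets from a known split-plot model with a chosen covariance structure, apply a test, and record the rejection rate. The sketch below illustrates that loop only; the test used (a one-way ANOVA on subject means for the between-groups effect) is a simple stand-in rather than the Modified Brown-Forsythe or mixed-model procedures evaluated in the article, and the group sizes, effect sizes, and ARH(1) parameters are hypothetical.

```python
# Minimal sketch of Monte Carlo power estimation for a split-plot (groups x
# repeated measures) design under a heterogeneous AR(1) covariance structure.
# The between-groups test here (one-way ANOVA on subject means) is a simple
# stand-in for the procedures compared in the article; all settings are hypothetical.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
n_groups, n_per_group, n_times = 3, 30, 4
group_means = np.array([0.0, 0.25, 0.5])        # between-groups effect to detect
rho, sds = 0.6, np.array([1.0, 1.2, 1.4, 1.6])  # ARH(1): common rho, varying SDs

# Build the ARH(1) covariance matrix for the repeated measures.
corr = rho ** np.abs(np.subtract.outer(np.arange(n_times), np.arange(n_times)))
cov = np.outer(sds, sds) * corr

def one_replication():
    subject_means = []
    for g in range(n_groups):
        errors = rng.multivariate_normal(np.zeros(n_times), cov, size=n_per_group)
        scores = group_means[g] + errors         # n_per_group x n_times
        subject_means.append(scores.mean(axis=1))
    _, p = f_oneway(*subject_means)              # between-groups test on subject means
    return p < 0.05

reps = 2000
power = np.mean([one_replication() for _ in range(reps)])
print(f"estimated power for the between-groups effect: {power:.3f}")
```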

