The puzzling relationship between multi-lab replications and meta-analyses of the rest of the literature

2020
Author(s): Molly Lewis, Maya B. Mathur, Tyler VanderWeele, Michael C. Frank

What is the best way to estimate the size of important effects? Should we aggregate across disparate findings using statistical meta-analysis, or instead run large, multi-lab replications (MLR)? A recent paper by Kvarven, Strømland, and Johannesson (2020) compared effect size estimates derived from these two different methods for 15 different psychological phenomena. The authors report that, for the same phenomenon, the meta-analytic estimate tends to be about three times larger than the MLR estimate. These results pose an important puzzle: What is the relationship between these two estimates? Kvarven et al. suggest that their results undermine the value of meta-analysis. In contrast, we argue that both meta-analysis and MLR are informative, and that the discrepancy between estimates obtained via the two methods is in fact still unexplained. Informed by re-analyses of Kvarven et al.’s data and by other empirical evidence, we discuss possible sources of this discrepancy and argue that understanding the relationship between estimates obtained from these two methods is an important puzzle for future meta-scientific research.
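As a rough illustration of the two estimators being contrasted here, the sketch below (Python, with entirely hypothetical effect sizes and variances) pools a set of literature effects and a set of MLR lab effects using the DerSimonian-Laird random-effects method, one common choice for such aggregation. It is not the authors' code, and Kvarven et al. may have used different estimators.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate via the DerSimonian-Laird method."""
    effects, variances = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / variances                               # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)            # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)     # between-study variance
    w_re = 1.0 / (variances + tau2)                   # random-effects weights
    est = np.sum(w_re * effects) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return est, se

# Hypothetical inputs: a published literature vs. a set of MLR labs
lit_d, lit_v = [0.45, 0.60, 0.38, 0.52], [0.04, 0.06, 0.05, 0.03]
mlr_d, mlr_v = [0.12, 0.20, 0.08, 0.15], [0.01, 0.01, 0.01, 0.01]

meta_est, _ = dersimonian_laird(lit_d, lit_v)
mlr_est, _ = dersimonian_laird(mlr_d, mlr_v)
print(f"meta-analytic estimate: {meta_est:.2f} vs. MLR estimate: {mlr_est:.2f}")
```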

2013
Vol 2013, pp. 1-9
Author(s): Liansheng Larry Tang, Michael Caudy, Faye Taxman

Multiple meta-analyses may use similar search criteria and focus on the same topic of interest, but they may yield different or sometimes discordant results. The lack of statistical methods for synthesizing these findings makes it challenging to properly interpret the results from multiple meta-analyses, especially when their results are conflicting. In this paper, we first introduce a method to synthesize the meta-analytic results when multiple meta-analyses use the same type of summary effect estimates. When meta-analyses use different types of effect sizes, the meta-analysis results cannot be directly combined. We propose a two-step frequentist procedure to first convert the effect size estimates to the same metric and then summarize them with a weighted mean estimate. Our proposed method offers several advantages over existing methods by Hemming et al. (2012). First, different types of summary effect sizes are considered. Second, our method provides the same overall effect size as conducting a meta-analysis on all individual studies from multiple meta-analyses. We illustrate the application of the proposed methods in two examples and discuss their implications for the field of meta-analysis.
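The two-step idea can be sketched as follows (Python; the numbers and the choice of Cohen's d as the common metric are illustrative assumptions, not the authors' exact procedure): first convert each summary effect to a common metric, then combine the converted estimates with an inverse-variance weighted mean.

```python
import math

def log_or_to_d(log_or, var_log_or):
    """Convert a log odds ratio to Cohen's d (logistic approximation,
    as in Borenstein et al., 2009)."""
    return log_or * math.sqrt(3) / math.pi, var_log_or * 3 / math.pi ** 2

def weighted_mean(estimates, variances):
    """Step 2: inverse-variance weighted mean and its standard error."""
    weights = [1.0 / v for v in variances]
    est = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return est, math.sqrt(1.0 / sum(weights))

# Hypothetical summaries: one meta-analysis reports d, another an odds ratio
d1, v1 = 0.30, 0.005
d2, v2 = log_or_to_d(math.log(1.8), 0.02)   # step 1: OR = 1.8 converted to d

est, se = weighted_mean([d1, d2], [v1, v2])
print(f"combined d = {est:.2f} (SE = {se:.3f})")
```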


2020
Author(s): Michel-Pierre Coll, Hannah Hobson, Jennifer Murphy

The Heartbeat Evoked Potential (HEP) has been proposed as a neurophysiological marker of interoceptive processing. Despite its use to validate interoceptive measures and to assess interoceptive functioning in clinical groups, the empirical evidence for a relationship between HEP amplitude and interoceptive processing, including measures of such processing, is scattered across several studies with varied designs. The aim of this systematic review and meta-analysis was to examine the body of HEP-interoception research and consider the associations the HEP shows with various direct and indirect measures of interoception, and how it is affected by manipulations of interoceptive processing. Specifically, we assessed the effects on HEP amplitude of manipulating attention to the heartbeat and of manipulating participants’ arousal; the association between the HEP and behavioural measures of cardiac interoception; and comparisons between healthy and clinical groups. Following database searches and screening, 45 studies were included in the systematic review and 42 in the meta-analyses. We noted variations in the ways individual studies have attempted to address key confounds, particularly the cardiac field artefact. Meta-analytic summaries indicated moderate to large effects of attention, arousal, and clinical status on the HEP, and a moderate association between HEP amplitude and behavioural measures of interoception. Problematically, the reliability of the meta-analytic effects documented here remains unknown, given the lack of standardised protocols for measuring the HEP. Thus, it is possible that these effects are driven by confounds such as cardiac factors or somatosensory effects.


2021
Author(s): Anton Olsson-Collentine, Robbie Cornelis Maria van Aert, Marjan Bakker, Jelte M. Wicherts

There are arbitrary decisions to be made (i.e., researcher degrees of freedom) in the execution and reporting of most research. These decisions allow for many possible outcomes from a single study. Selective reporting of results from this ‘multiverse’ of outcomes, whether intentional (p-hacking) or not, can lead to inflated effect size estimates and false positive results in the literature. In this study, we examine and illustrate the consequences of researcher degrees of freedom in primary research, both for primary outcomes and for subsequent meta-analyses. We used a set of 10 preregistered multi-lab direct replication projects from psychology (Registered Replication Reports) with a total of 14 primary outcome variables, 236 labs and 37,602 participants. By exploiting researcher degrees of freedom in each project, we were able to compute between 3,840 and 2,621,440 outcomes per lab. We show that researcher degrees of freedom in primary research can cause substantial variability in effect size that we denote the Underlying Multiverse Variability (UMV). In our data, the median UMV across labs was 0.1 standard deviations (interquartile range = 0.09-0.15). In one extreme case, the effect size estimate could change by d = 1.27, evidence that p-hacking in some (rare) cases can provide support for almost any conclusion. We also show that researcher degrees of freedom in primary research provide another source of uncertainty in meta-analysis beyond those usually estimated. This would not be a large concern for meta-analysis if researchers made all arbitrary decisions at random. However, emulating selective reporting of lab results led to inflation of meta-analytic average effect size estimates in our data by as much as 0.1-0.48 standard deviations, depending to a large degree on the number of possible outcomes at the lab level (i.e., multiverse size). Our results illustrate the importance of making research decisions transparent (e.g., through preregistration and multiverse analysis), evaluating studies for selective reporting, and, whenever feasible, making raw data available.
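A toy version of such a lab-level multiverse might look like the following (Python, simulated data; the decision grid and the use of the range of d as the variability summary are illustrative assumptions rather than the authors' pipeline): each combination of arbitrary analytic choices yields its own effect size estimate.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(1)
treat = rng.normal(0.2, 1.0, 100)   # simulated treatment group
ctrl = rng.normal(0.0, 1.0, 100)    # simulated control group

def cohens_d(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# A small grid of arbitrary analytic decisions (hypothetical)
outlier_cutoffs = [None, 2.0, 2.5, 3.0]   # SD-based exclusion rules
transforms = [lambda x: x, np.tanh]       # raw vs. compressed DV

multiverse = []
for cutoff, f in product(outlier_cutoffs, transforms):
    a, b = f(treat), f(ctrl)
    if cutoff is not None:                # apply the exclusion rule
        a = a[np.abs(a - a.mean()) < cutoff * a.std(ddof=1)]
        b = b[np.abs(b - b.mean()) < cutoff * b.std(ddof=1)]
    multiverse.append(cohens_d(a, b))

# One way to summarise the spread of outcomes across the multiverse
print(f"d ranges from {min(multiverse):.3f} to {max(multiverse):.3f} "
      f"(spread = {max(multiverse) - min(multiverse):.3f})")
```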


2008
Vol 30 (4), pp. 392-410
Author(s): Bradley M. Wipfli, Chad D. Rethorst, Daniel M. Landers

A meta-analysis was conducted to examine the effects of exercise on anxiety. Because previous meta-analyses in the area included studies of varying quality, only randomized, controlled trials were included in the present analysis. Results from 49 studies show an overall effect size of -0.48, indicating larger reductions in anxiety among exercise groups than no-treatment control groups. Exercise groups also showed greater reductions in anxiety compared with groups that received other forms of anxiety-reducing treatment (effect size = -0.19). Because only randomized, controlled trials were examined, these results provide Level 1, Grade A evidence for using exercise in the treatment of anxiety. In addition, exercise dose data were calculated to examine the relationship between dose of exercise and the corresponding magnitude of effect size.
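For readers unfamiliar with the metric, the sketch below (Python, hypothetical group statistics) computes the kind of standardized mean difference behind estimates like the -0.48 reported here; negative values indicate lower post-treatment anxiety in the exercise group than in the control group.

```python
import math

def smd(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (Cohen's d) with pooled SD."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical post-treatment anxiety scores (lower = less anxious)
d = smd(mean_t=18.2, sd_t=6.0, n_t=40, mean_c=21.1, sd_c=6.2, n_c=40)
print(f"d = {d:.2f}")   # negative: exercise group lower in anxiety
```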


2020
Author(s): Malte Friese, Julius Frankenbach

Science depends on trustworthy evidence. A biased scientific record impedes scientific progress and leaves the public advised on the basis of unreliable evidence, which can have far-reaching detrimental consequences. Meta-analysis is a valid and reliable technique for summarizing research evidence. However, meta-analytic effect size estimates may themselves be biased, threatening the validity and usefulness of meta-analyses for promoting scientific progress. Here, we offer a large-scale simulation study to elucidate how p-hacking and publication bias distort meta-analytic effect size estimates under a broad array of circumstances reflecting the realities of a variety of research areas. The results revealed that, first, very high levels of publication bias can severely distort the cumulative evidence. Second, p-hacking and publication bias interact: at relatively high and low levels of publication bias, p-hacking does comparatively little harm, but at medium levels of publication bias, p-hacking can contribute considerably to bias, especially when the true effects are very small or approach zero. Third, p-hacking can severely increase the rate of false positives. A key implication is that, in addition to preventing p-hacking, policies in research institutions, funding agencies, and scientific journals need to make the prevention of publication bias a top priority to ensure a trustworthy base of evidence.
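A stripped-down version of the publication-bias mechanism (Python; this is an illustrative simulation, not the authors' code or parameter grid) shows how selecting on significance inflates the naive average of published effects when the true effect is tiny:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n_per_group, n_studies = 0.05, 30, 2000   # tiny true effect

published = []
for _ in range(n_studies):
    t = rng.normal(true_d, 1.0, n_per_group)
    c = rng.normal(0.0, 1.0, n_per_group)
    _, p = stats.ttest_ind(t, c)
    d = (t.mean() - c.mean()) / np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    if p < 0.05 and d > 0:    # extreme filter: only significant, positive results
        published.append(d)

print(f"true d = {true_d}; mean published d = {np.mean(published):.2f} "
      f"({len(published)} of {n_studies} studies published)")
```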


2011
Vol 25 (11), pp. 1573-1577
Author(s): Eleanor M. Taylor, Natasha M. P. Greene, Celia J. A. Morgan, Marcus R. Munafò

Studies of the chronic effects of MDMA, or ‘ecstasy’, in humans have been largely inconsistent. We explored whether study-level characteristics are associated with the effect size estimate reported. We based our analyses on the recent systematic review by Rogers and colleagues, focusing on those meta-analyses within this report where a relatively large number of studies contributed to each individual meta-analysis. Linear regression was used to investigate the association between study-level variables and effect size estimates, weighted by the inverse of the SE of each estimate, with cluster correction for studies that contributed multiple estimates. This indicated an association between effect size estimate and user group (smaller estimates among studies recruiting former users than among those recruiting current users) and control group (smaller estimates among studies recruiting polydrug-user controls than among those recruiting drug-naïve controls). In addition, later year of publication was associated with smaller effect size estimates, and there was a trend-level association with prevalence of ecstasy use, with smaller estimates among studies conducted in countries with higher prevalence of ecstasy use. Our data suggest a number of study-level characteristics that appear to influence individual study effect size estimates. These should be considered when designing future studies, and also when interpreting the ecstasy literature as a whole.
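The weighted regression described here can be sketched as follows (Python, hypothetical study-level data; the cluster correction the authors applied is omitted for brevity):

```python
import numpy as np

# Hypothetical study-level data
effect = np.array([0.80, 0.65, 0.55, 0.40, 0.30, 0.25])  # effect size estimates
se = np.array([0.20, 0.15, 0.18, 0.12, 0.10, 0.14])      # their standard errors
year = np.array([2000.0, 2002, 2004, 2006, 2008, 2010])  # study-level moderator

w = 1.0 / se                        # inverse-SE weights, as described above
X = np.column_stack([np.ones_like(year), year - year.mean()])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ effect)    # weighted LS solution
print(f"intercept = {beta[0]:.3f}, change in effect size per year = {beta[1]:.4f}")
```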


2013
Vol 12 (4), pp. 157-169
Author(s): Philip L. Roth, Allen I. Huffcutt

The topic of what interviews measure has received a great deal of attention over the years. One line of research has investigated the relationship between interviews and the construct of cognitive ability. A previous meta-analysis reported an overall corrected correlation of .40 (Huffcutt, Roth, & McDaniel, 1996). A more recent meta-analysis reported a noticeably lower corrected correlation of .27 (Berry, Sackett, & Landers, 2007). After reviewing both meta-analyses, it appears that the two studies posed different research questions. Further, there were a number of coding judgments in Berry et al. that merit review, and there was no moderator analysis for educational versus employment interviews. As a result, we reanalyzed the work by Berry et al. and found a corrected correlation of .42 for employment interviews (.15 higher than Berry et al., a 56% increase). Further, educational interviews were associated with a corrected correlation of .21, supporting their influence as a moderator. We suggest a better estimate of the correlation between employment interviews and cognitive ability is .42, and this takes us “back to the future” in that the better overall estimate of the employment interview-cognitive ability relationship is roughly .40. This difference has implications for what is being measured by interviews and their incremental validity.
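The "corrected correlations" at issue are typically produced by psychometric artifact corrections of roughly the following form (Python; the artifact values are hypothetical, and the specific corrections and their order in the two meta-analyses may differ):

```python
import math

def disattenuate(r_obs, ryy):
    """Correct an observed correlation for criterion unreliability."""
    return r_obs / math.sqrt(ryy)

def correct_range_restriction(r, u):
    """Thorndike Case II correction; u = restricted SD / unrestricted SD."""
    return r / (u * math.sqrt(1.0 + r ** 2 * (1.0 / u ** 2 - 1.0)))

r_obs = 0.25                             # hypothetical observed correlation
r_rel = disattenuate(r_obs, ryy=0.85)    # unreliability correction
r_corr = correct_range_restriction(r_rel, u=0.7)
print(f"observed r = {r_obs:.2f} -> corrected r = {r_corr:.2f}")
```

Differences in which artifacts are corrected, and with what assumed values, are one reason two meta-analyses of overlapping literatures can report corrected correlations as far apart as .27 and .42.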


2018
Vol 49 (5), pp. 303-309
Author(s): Jedidiah Siev, Shelby E. Zuckerman, Joseph J. Siev

In a widely publicized set of studies, participants who were primed to consider unethical events preferred cleansing products more than did those primed with ethical events (Zhong & Liljenquist, 2006). This tendency to respond to moral threat with physical cleansing is known as the Macbeth Effect. Several subsequent efforts, however, did not replicate this relationship. The present manuscript reports the results of a meta-analysis of 15 studies testing this relationship. The weighted mean effect size was small across all studies (g = 0.17, 95% CI [0.04, 0.31]), and nonsignificant across studies conducted in independent laboratories (g = 0.07, 95% CI [−0.04, 0.19]). We conclude that there is little evidence for an overall Macbeth Effect; however, there may be a Macbeth Effect under certain conditions.


2021
pp. 089484532110124
Author(s): Graham B. Stead, Lindsey M. LaVeck, Sandra M. Hurtado Rúa

The relationship between career adaptability and career decision self-efficacy was examined because of its importance for clients in the career development and career decision-making process. Multivariate meta-analyses were conducted using 18 studies with a total of 6,339 participants. Moderator variables important to this relationship were participants' country, mean age, and the career adaptability measure used. Estimated correlations between career adaptability subscales and career decision self-efficacy measures ranged from .36 to .44. Findings are discussed in relation to career research and counseling.


2021
pp. 152483802110216
Author(s): Brooke N. Lombardi, Todd M. Jensen, Anna B. Parisi, Melissa Jenkins, Sarah E. Bledsoe

Background: The association between a lifetime history of sexual victimization and the well-being of women during the perinatal period has received increasing attention. However, research investigating this relationship has yet to be systematically reviewed or quantitatively synthesized. Aim: This systematic review and meta-analysis aims to calculate the pooled effect size estimate of the statistical association between a lifetime history of sexual victimization and perinatal depression (PND). Method: Four bibliographic databases were systematically searched, and reference harvesting was conducted to identify peer-reviewed articles that empirically examined associations between a lifetime history of sexual victimization and PND. A random effects model was used to ascertain an overall pooled effect size estimate in the form of an odds ratio and corresponding 95% confidence intervals (CIs). Subgroup analyses were also conducted to assess whether particular study features and sample characteristics (e.g., race and ethnicity) influenced the magnitude of effect size estimates. Results: This review included 36 studies, with 45 effect size estimates available for meta-analysis. Women with a lifetime history of sexual victimization had 51% greater odds of experiencing PND relative to women with no history of sexual victimization (OR = 1.51, 95% CI [1.35, 1.67]). Effect size estimates varied considerably according to the PND instrument used in each study and the racial/ethnic composition of each sample. Conclusion: Findings provide compelling evidence for an association between a lifetime history of sexual victimization and PND. Future research should focus on screening practices and interventions that identify and support survivors of sexual victimization during the perinatal period.
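Pooling odds ratios of this kind is conventionally done on the log scale. The sketch below (Python, hypothetical study data; a fixed-effect rather than the random-effects model the review used, for brevity) shows the basic computation and how a pooled OR of 1.51 translates into "51% greater odds":

```python
import math

# Hypothetical per-study odds ratios and variances of their logs
odds_ratios = [1.4, 1.7, 1.3, 1.6]
log_or_vars = [0.02, 0.05, 0.03, 0.04]

log_ors = [math.log(o) for o in odds_ratios]
weights = [1.0 / v for v in log_or_vars]            # inverse-variance weights

pooled_log = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
se = math.sqrt(1.0 / sum(weights))

pooled_or = math.exp(pooled_log)
lo, hi = math.exp(pooled_log - 1.96 * se), math.exp(pooled_log + 1.96 * se)
print(f"pooled OR = {pooled_or:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# A pooled OR of 1.51 reads as 51% greater odds of PND among survivors.
```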

