Null hypothesis significance testing and effect sizes: can we ‘effect’ everything … or … anything?

2020 ◽  
Vol 51 ◽  
pp. 68-77 ◽  
Author(s):  
David P Lovell


2009 ◽
Vol 217 (1) ◽  
pp. 15-26 ◽  
Author(s):  
Geoff Cumming ◽  
Fiona Fidler

Most questions across science call for quantitative answers: ideally, a single best estimate plus information about the precision of that estimate. A confidence interval (CI) expresses both efficiently. Early experimental psychologists sought quantitative answers, but for the last half century psychology has been dominated by the nonquantitative, dichotomous thinking of null hypothesis significance testing (NHST). The authors argue that psychology should rejoin mainstream science by asking better questions – those that demand quantitative answers – and using CIs to answer them. They explain CIs and a range of ways to think about them and use them to interpret data, especially by considering CIs as prediction intervals, which provide information about replication. They explain how to calculate CIs on means, proportions, correlations, and standardized effect sizes, and illustrate symmetric and asymmetric CIs. They also argue that information provided by CIs is more useful than that provided by p values, or by values of Killeen’s p_rep, the probability of replication.
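As an illustrative sketch of the estimation approach the abstract describes, the following computes a symmetric 95% CI on a mean and an asymmetric CI on a correlation via the Fisher z transform. The data, the r value, and the sample sizes are hypothetical; the intervals are large-sample (z-based), so for small samples a t critical value would replace z for the mean.

```python
from statistics import NormalDist, mean, stdev
from math import sqrt, tanh, atanh

def mean_ci(xs, level=0.95):
    """Large-sample (z-based) CI on a mean; symmetric about the estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m, se = mean(xs), stdev(xs) / sqrt(len(xs))
    return m - z * se, m + z * se

def correlation_ci(r, n, level=0.95):
    """CI on a Pearson r via Fisher's z transform; asymmetric on the r scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    zr, se = atanh(r), 1 / sqrt(n - 3)
    return tanh(zr - z * se), tanh(zr + z * se)

scores = [101, 97, 104, 99, 102, 95, 103, 100, 98, 106]
lo, hi = mean_ci(scores)
print(f"mean CI: [{lo:.2f}, {hi:.2f}]")

lo_r, hi_r = correlation_ci(0.40, 50)
print(f"r CI: [{lo_r:.2f}, {hi_r:.2f}]")  # the lower arm is longer: asymmetric
```

The correlation interval shows the asymmetry the authors illustrate: back-transforming from the z scale stretches the arm nearer zero more than the arm nearer ±1.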


2015 ◽  
Vol 37 (4) ◽  
pp. 449-461 ◽  
Author(s):  
Andreas Ivarsson ◽  
Mark B. Andersen ◽  
Andreas Stenling ◽  
Urban Johnson ◽  
Magnus Lindwall

Null hypothesis significance testing (NHST) is like an immortal horse that some researchers have been trying to beat to death for over 50 years but without any success. In this article we discuss the flaws in NHST, the historical background in relation to both Fisher’s and Neyman and Pearson’s statistical ideas, the common misunderstandings of what p < .05 actually means, and the 2010 APA publication manual’s clear, but most often ignored, instructions to report effect sizes and to interpret what they all mean in the real world. In addition, we discuss how Bayesian statistics can be used to overcome some of the problems with NHST. We then analyze quantitative articles published over the past three years (2012–2014) in two top-rated sport and exercise psychology journals to determine whether we have learned what we should have learned decades ago about our use and meaningful interpretations of statistics.


Author(s):  
Freddy A. Paniagua

Ferguson (2015) observed that the proportion of studies supporting the experimental hypothesis and rejecting the null hypothesis is very high. This paper argues that the reason for this scenario is that researchers in the behavioral sciences have learned that the null hypothesis can always be rejected if one knows the statistical tricks to reject it (e.g., the probability of rejecting the null hypothesis increases with p = 0.05 compared to p = 0.01). Examples of the advancement of science without the need to formulate the null hypothesis are also discussed, as well as alternatives to null hypothesis significance testing (NHST), such as effect sizes, and the importance of distinguishing the statistical significance from the practical significance of results.


2010 ◽  
Vol 3 (2) ◽  
pp. 106-112 ◽  
Author(s):  
Matthew J. Rinella ◽  
Jeremy J. James

Null hypothesis significance testing (NHST) forms the backbone of statistical inference in invasive plant science. Over 95% of research articles in Invasive Plant Science and Management report NHST results such as P-values or statistics closely related to P-values such as least significant differences. Unfortunately, NHST results are less informative than their ubiquity implies. P-values are hard to interpret and are regularly misinterpreted. Also, P-values do not provide estimates of the magnitudes and uncertainties of studied effects, and these effect size estimates are what invasive plant scientists care about most. In this paper, we reanalyze four datasets (two of our own and two of our colleagues’; studies put forth as examples in this paper are used with permission of their authors) to illustrate limitations of NHST. The re-analyses are used to build a case for confidence intervals as preferable alternatives to P-values. Confidence intervals indicate effect sizes, and compared to P-values, confidence intervals provide more complete, intuitively appealing information on what data do/do not indicate.
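The contrast the abstract draws can be sketched as follows: instead of reporting only a P-value for a difference between invaded and control plots, report the estimated difference with a CI, which conveys both magnitude and precision. The plant-cover data are hypothetical, and the interval is a large-sample z-based sketch (a t critical value would be used in practice with groups this small).

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

def diff_ci(a, b, level=0.95):
    """Large-sample CI on the difference of two group means (z-based sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    d = mean(a) - mean(b)
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return d, d - z * se, d + z * se

# Hypothetical percent cover of a native species in invaded vs. control plots
invaded = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2, 11.7, 12.4]
control = [8.9, 9.5, 7.8, 10.1, 8.4, 9.9, 8.1, 9.2]

d, lo, hi = diff_ci(invaded, control)
print(f"difference = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A reader sees at a glance how large the effect is and how precisely it is estimated; a bare P-value would convey neither.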


2021 ◽  
pp. 174569162097055
Author(s):  
Nick J. Broers

One particular weakness of psychology that was left implicit by Meehl (1978) is the fact that psychological theories tend to be verbal theories, permitting at best ordinal predictions. Such predictions do not enable the high-risk tests that would strengthen our belief in the verisimilitude of theories but instead lead to the practice of null-hypothesis significance testing, a practice Meehl believed to be a major reason for the slow theoretical progress of soft psychology. The rising popularity of meta-analysis has led some to argue that we should move away from significance testing and focus on the size and stability of effects instead. Proponents of this reform assume that a greater emphasis on quantity can help psychology to develop a cumulative body of knowledge. The crucial question in this endeavor is whether the resulting numbers really have theoretical meaning. Psychological science lacks an undisputed, preexisting domain of observations analogous to the observations in the space-time continuum in physics. It is argued that, for this reason, effect sizes do not really exist independently of the adopted research design that led to their manifestation. Consequently, they can have no bearing on the verisimilitude of a theory.


2019 ◽  
Author(s):  
Felipe Romero ◽  
Jan Sprenger

The enduring replication crisis in many scientific disciplines casts doubt on the ability of science to self-correct its findings and to produce reliable knowledge. Amongst a variety of possible methodological, social, and statistical reforms to address the crisis, we focus on replacing null hypothesis significance testing (NHST) with Bayesian inference. On the basis of a simulation study for meta-analytic aggregation of effect sizes, we study the relative advantages of this Bayesian reform, and its interaction with widespread limitations in experimental research. Moving to Bayesian statistics will not solve the replication crisis single-handedly, but it would eliminate important sources of effect size overestimation for the conditions we study.


2021 ◽  
pp. 104973152110082
Author(s):  
Daniel J. Dunleavy ◽  
Jeffrey R. Lacasse

In this article, we offer a primer on “classical” frequentist statistics. In doing so, we aim to (1) provide social workers with a nuanced overview of common statistical concepts and tools, (2) clarify ways in which these ideas have oft been misused or misinterpreted in research and practice, and (3) help social workers better understand what frequentist statistics can and cannot offer. We begin broadly, starting with foundational issues in the philosophy of statistics. Then, we outline the Fisherian and Neyman–Pearson approaches to statistical inference and the practice of null hypothesis significance testing. We then discuss key statistical concepts including α, power, p values, effect sizes, and confidence intervals, exploring several common misconceptions about their use and interpretation. We close by considering some limitations of frequentist statistics and by offering an opinionated discussion on how social workers may promote more fruitful, responsible, and thoughtful statistical practice.


2017 ◽  
Vol 19 (1) ◽  
pp. 70-80 ◽  
Author(s):  
Michael Perdices

There has been controversy over Null Hypothesis Significance Testing (NHST) since the first quarter of the 20th century, and misconceptions about it still abound. The first section of this paper briefly discusses some of the problems and limitations of NHST. Overwhelmingly, the ‘holy grail’ of researchers has been to obtain significant p-values. In 1999 the American Psychological Association (APA) recommended that if NHST was used in data analysis, then researchers should report effect sizes (ESs) and their confidence intervals (CIs) as well as p-values. The APA recommendations are summarised in the next section of the paper. But for neuropsychological rehabilitation clinicians, the primary interest is (or should be) to determine whether or not the effect of an intervention is clinically important, not just statistically significant. In this context, ESs and their CIs provide information relevant to clinicians. The next section of the paper reviews common ESs, and worked examples are provided for the calculation of three commonly used ESs (Cohen’s d, Hedges’ g and Glass’s delta). Web-based resources for calculating other ESs and their CIs are also reviewed.
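A minimal sketch of the three effect sizes the paper works through, using standard textbook formulas and hypothetical post-intervention scores (these are not the paper’s own worked examples):

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(treat, ctrl):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(treat), len(ctrl)
    s_pooled = sqrt(((n1 - 1) * stdev(treat) ** 2 + (n2 - 1) * stdev(ctrl) ** 2)
                    / (n1 + n2 - 2))
    return (mean(treat) - mean(ctrl)) / s_pooled

def hedges_g(treat, ctrl):
    """Hedges' g: Cohen's d with the small-sample bias correction."""
    df = len(treat) + len(ctrl) - 2
    return cohens_d(treat, ctrl) * (1 - 3 / (4 * df - 1))

def glass_delta(treat, ctrl):
    """Glass's delta: mean difference scaled by the control-group SD only."""
    return (mean(treat) - mean(ctrl)) / stdev(ctrl)

post_treatment = [24, 27, 22, 30, 26, 25, 28, 23]  # hypothetical scores
post_control = [20, 22, 19, 24, 21, 18, 23, 20]

print(f"d = {cohens_d(post_treatment, post_control):.2f}")
print(f"g = {hedges_g(post_treatment, post_control):.2f}")
print(f"delta = {glass_delta(post_treatment, post_control):.2f}")
```

Note the design choices the three measures embody: d pools both groups’ variability, g shrinks d slightly to correct small-sample bias, and Glass’s delta uses only the control SD, which is preferred when an intervention may change the variance of the treated group.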



1999 ◽  
Vol 8 (5) ◽  
pp. 291-296 ◽  
Author(s):  
DN Glaser

The current debate about the merits of null hypothesis significance testing, even though provocative, is not particularly novel. The significance testing approach has had defenders and opponents for decades, especially within the social sciences, where reliance on the use of significance testing has historically been heavy. The primary concerns have been (1) the misuse of significance testing, (2) the misinterpretation of P values, and (3) the lack of accompanying statistics, such as effect sizes and confidence intervals, that would provide a broader picture of the researcher's data analysis and interpretation. This article presents the current thinking, both in favor and against, on significance testing, the virtually unanimous support for reporting effect sizes alongside P values, and the overall implications for practice and application.

