Conflicts in Bayesian Statistics Between Inference Based on Credible Intervals and Bayes Factors

2020 ◽  
Vol 18 (1) ◽  
pp. 2-27
Author(s):  
Miodrag M. Lovric

In frequentist statistics, point-null hypothesis testing based on significance tests and confidence intervals are harmonious procedures that lead to the same conclusion. This is not the case in the Bayesian framework: an inference about a point null based on the Bayes factor may be the opposite of one based on the Bayesian credible interval. Bayesian suggestions to test point nulls using credible intervals are therefore misleading and should be dismissed. A hypothesized null value may lie outside a credible interval yet be supported by the Bayes factor (a Type I conflict), or, conversely, the null value may lie inside a credible interval yet not be supported by the Bayes factor (a Type II conflict). Two computer programs in R were developed that confirm the existence of a countably infinite number of cases for which Bayesian credible intervals are not compatible with Bayesian hypothesis testing.
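A Type I conflict of the kind described above can be reproduced in a few lines. The following sketch (in Python, not the paper's R programs; the sample size and counts are illustrative assumptions) uses binomial data with a uniform Beta(1, 1) prior: the 95% credible interval excludes the null value θ = 1/2, yet the Bayes factor, computed via the Savage-Dickey density ratio, favors the point null.

```python
# Illustrative sketch (assumed data, not the paper's R code): a "Type I
# conflict" where the 95% credible interval excludes theta = 0.5 while
# the Bayes factor BF_01 favors H0. Uniform Beta(1, 1) prior; BF_01 is
# the Savage-Dickey ratio of posterior to prior density at the null.
from scipy import stats

n, k = 10_000, 5_100                        # 51% successes in a large sample
posterior = stats.beta(k + 1, n - k + 1)    # Beta posterior under a Beta(1, 1) prior

ci = posterior.ppf([0.025, 0.975])          # 95% equal-tailed credible interval
bf01 = posterior.pdf(0.5) / 1.0             # prior density at 0.5 equals 1

print(f"95% credible interval: [{ci[0]:.4f}, {ci[1]:.4f}]")  # excludes 0.5
print(f"BF_01 = {bf01:.1f}")                # > 1: the data favor the point null
```

This is the Jeffreys-Lindley phenomenon: with a large sample, an estimate only slightly off the null can be "significant" by interval reasoning while the Bayes factor still supports H0.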

Econometrics ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 21
Author(s):  
Jae H. Kim ◽  
Andrew P. Robinson

This paper presents a brief review of interval-based hypothesis testing, widely used in biostatistics, medical science, and psychology, namely, tests for minimum effect, equivalence, and non-inferiority. We present the methods in the contexts of a one-sample t-test and a test for linear restrictions in a regression. We present applications in testing for market efficiency, validity of asset-pricing models, and persistence of economic time series. We argue that, from the point of view of economics and finance, interval-based hypothesis testing provides more sensible inferential outcomes than those based on point-null hypothesis testing. We propose that interval-based tests be routinely employed in empirical research in business, as an alternative to point-null hypothesis testing, especially in the new era of big data.
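One common interval-based procedure in the one-sample setting is the equivalence test via two one-sided t-tests (TOST). The sketch below (simulated data and an equivalence margin chosen purely for illustration, not the paper's application) declares the mean practically equivalent to zero when both one-sided tests reject, i.e. when the larger of the two one-sided p-values falls below α.

```python
# Hedged sketch of a TOST equivalence test for a mean (simulated data;
# the margin delta = 0.3 is an arbitrary illustrative choice).
# H0: |mu| >= delta; equivalence is declared when max(p_lower, p_upper) < alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.02, scale=1.0, size=200)  # data with a negligible true mean
delta = 0.3                                    # equivalence margin (assumption)

n = x.size
se = x.std(ddof=1) / np.sqrt(n)
t_lower = (x.mean() + delta) / se              # tests H0: mu <= -delta
t_upper = (x.mean() - delta) / se              # tests H0: mu >= +delta
p_lower = stats.t.sf(t_lower, df=n - 1)
p_upper = stats.t.cdf(t_upper, df=n - 1)
p_tost = max(p_lower, p_upper)                 # overall TOST p-value

print(f"TOST p-value: {p_tost:.4f}")           # small: mean is within the margin
```

Note the reversal of roles relative to a point-null test: here a small p-value is evidence that the effect is negligibly small, not that it exists.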


Author(s):  
Richard McCleary ◽  
David McDowall ◽  
Bradley J. Bartos

Chapter 6 addresses the sub-category of internal validity defined by Shadish et al. as statistical conclusion validity, or “validity of inferences about the correlation (covariance) between treatment and outcome.” The common threats to statistical conclusion validity can arise, or become plausible, through either model misspecification or hypothesis testing. The risk of a serious model misspecification is inversely proportional to the length of the time series, for example, and so is the risk of misstating the Type I and Type II error rates. Threats to statistical conclusion validity arise from both the classical and the modern hybrid significance-testing structures; the serious threats that weigh heavily in p-value tests are shown to be undefined in Bayesian tests. While the particularly vexing threats raised by modern null hypothesis testing are resolved through the elimination of the modern null hypothesis test, threats to statistical conclusion validity would inevitably persist and new threats would arise.


2019 ◽  
Author(s):  
Eric-Jan Wagenmakers ◽  
Michael David Lee ◽  
Jeffrey N. Rouder ◽  
Richard Donald Morey

The principle of predictive irrelevance states that when two competing models predict a data set equally well, that data set cannot be used to discriminate the models and, for that specific purpose, the data set is evidentially irrelevant. To highlight the ramifications of the principle, we first show how a single binomial observation can be irrelevant in the sense that it carries no evidential value for discriminating the null hypothesis $\theta = 1/2$ from a broad class of alternative hypotheses that allow $\theta$ to be between 0 and 1. In contrast, the Bayesian credible interval suggests that a single binomial observation does provide some evidence against the null hypothesis. We then generalize this paradoxical result to infinitely long data sequences that are predictively irrelevant throughout. Examples feature a test of a binomial rate and a test of a normal mean. These maximally uninformative data (MUD) sequences yield credible intervals and confidence intervals that are certain to exclude the tested point value as the sequence lengthens. The resolution of this paradox requires the insight that interval estimation methods, and consequently p-values, may not be used for model comparison involving a point-null hypothesis.


Epilepsy ◽  
2011 ◽  
pp. 241-248
Author(s):  
Ralph Andrzejak ◽  
Daniel Chicharro ◽  
Florian Mormann

2018 ◽  
Vol 1 (2) ◽  
pp. 281-295 ◽  
Author(s):  
Alexander Etz ◽  
Julia M. Haaf ◽  
Jeffrey N. Rouder ◽  
Joachim Vandekerckhove

Hypothesis testing is a special form of model selection. Once a pair of competing models is fully defined, their definition immediately leads to a measure of how strongly each model supports the data. The ratio of their support is often called the likelihood ratio or the Bayes factor. Critical in the model-selection endeavor is the specification of the models. In the case of hypothesis testing, it is of the greatest importance that the researcher specify exactly what is meant by a “null” hypothesis as well as the alternative to which it is contrasted, and that these are suitable instantiations of theoretical positions. Here, we provide an overview of different instantiations of null and alternative hypotheses that can be useful in practice, but in all cases the inferential procedure is based on the same underlying method of likelihood comparison. An associated app can be found at https://osf.io/mvp53/. This article is the work of the authors and is reformatted from the original, which was published under a CC-By Attribution 4.0 International license and is available at https://psyarxiv.com/wmf3r/.
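The underlying method of likelihood comparison can be illustrated concretely. In the sketch below (hypothetical binomial data, not the authors' app), a pair of point hypotheses yields an ordinary likelihood ratio, while a composite alternative requires averaging the likelihood over its prior, which yields a Bayes factor; both are instances of the same ratio-of-support idea.

```python
# Hedged sketch (hypothetical data, not the authors' app code): likelihood
# comparison for two instantiations of the alternative hypothesis.
from scipy import stats
from scipy.integrate import quad

k, n = 7, 10    # illustrative data: 7 successes in 10 trials

# Two point hypotheses (theta = 0.7 vs theta = 0.5): a plain likelihood ratio.
lr = stats.binom.pmf(k, n, 0.7) / stats.binom.pmf(k, n, 0.5)

# Point null vs composite alternative theta ~ Beta(1, 1): a Bayes factor,
# with the alternative's likelihood averaged over its prior.
marginal_h1, _ = quad(lambda t: stats.binom.pmf(k, n, t) * stats.beta(1, 1).pdf(t), 0, 1)
bf10 = marginal_h1 / stats.binom.pmf(k, n, 0.5)

print(f"likelihood ratio (0.7 vs 0.5): {lr:.2f}")
print(f"Bayes factor BF_10: {bf10:.2f}")
```

The contrast is instructive: the point alternative θ = 0.7 is supported over the null here, while the diffuse composite alternative is not, because averaging spreads its predictions over many values of θ the data do not favor.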


2017 ◽  
Author(s):  
Ivan Flis

The goal of the study was to descriptively analyze Croatian psychology students' understanding of null hypothesis significance testing as it is usually presented in textbooks, a presentation that is subject to Bayesian and interpretative criticism. The thesis also provides a short overview of the debates on the meaning of significance testing and how it is taught to students. There were 350 participants from undergraduate and graduate programs at five faculties in Croatia (Zagreb: Centre for Croatian Studies and Faculty of Humanities and Social Sciences; Rijeka; Zadar; Osijek). A further goal was to ascertain whether psychology students' understanding of null hypothesis testing can be predicted by their grades, attitudes, and interests. The level of understanding of null hypothesis testing was measured by the Test of Statistical Significance Misinterpretations (NHST test; Oakes, 1986; Haller and Krauss, 2002). Attitudes toward null hypothesis significance testing were measured by a questionnaire constructed for this study. Grades were operationalized as the grade average of courses taken during undergraduate studies and as a separate grade average of methodological courses taken during undergraduate and graduate studies. The students showed limited understanding of null hypothesis testing: the percentage of correct answers on the NHST test was not higher than 56% for any of the six items. Croatian students also showed poorer understanding on each item than the German students in Haller and Krauss's (2002) study. None of the variables (general grade average, average in the methodological courses, two variables measuring attitudes toward null hypothesis significance testing, failing at least one methodological course, and main interest in psychology) predicted the odds of answering the NHST test items correctly. The study concludes that education practices in teaching the meaning and interpretation of null hypothesis significance testing need to be reconsidered at Croatian psychology departments.
