Determining sexual dimorphism in frog measurement data: integration of statistical significance, measurement error, effect size and biological significance

2005, Vol 77 (1), pp. 45-76
Author(s): Lee-Ann C. Hayek, W. Ronald Heyer

Several analytic techniques have been used to determine sexual dimorphism in vertebrate morphological measurement data with no emergent consensus on which technique is superior. A further confounding problem for frog data is the existence of considerable measurement error. To determine dimorphism, we examine a single hypothesis (H0: equal means) for two groups (females and males). We demonstrate that frog measurement data meet assumptions for clearly defined statistical hypothesis testing with statistical linear models rather than those of exploratory multivariate techniques such as principal components, correlation or correspondence analysis. In order to distinguish biological from statistical significance of hypotheses, we propose a new protocol that incorporates measurement error and effect size. Measurement error is evaluated with a novel measurement error index. Effect size, widely used in the behavioral sciences and in meta-analysis studies in biology, proves to be the most useful single metric to evaluate whether statistically significant results are biologically meaningful. Definitions for a range of small, medium, and large effect sizes specifically for frog measurement data are provided. Examples with measurement data for species of the frog genus Leptodactylus are presented. The new protocol is recommended not only for evaluating sexual dimorphism in frog data but for any animal measurement data for which the measurement error index and observed or a priori effect sizes can be calculated.
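As a rough illustration of the significance-plus-effect-size step described above, the sketch below pairs a Welch t-test of the equal-means hypothesis with Cohen's d for a single morphometric variable. The data are hypothetical snout-vent lengths, and the paper's own measurement error index is not reproduced here.

```python
import numpy as np
from scipy import stats

def dimorphism_summary(females, males):
    """Welch t-test of the equal-means hypothesis plus Cohen's d for one
    morphometric variable (generic sketch; the paper's measurement error
    index is not reproduced here)."""
    f, m = np.asarray(females, float), np.asarray(males, float)
    t, p = stats.ttest_ind(f, m, equal_var=False)
    n1, n2 = len(f), len(m)
    pooled_sd = np.sqrt(((n1 - 1) * f.var(ddof=1) +
                         (n2 - 1) * m.var(ddof=1)) / (n1 + n2 - 2))
    d = (f.mean() - m.mean()) / pooled_sd       # standardized mean difference
    return {"t": t, "p": p, "cohen_d": d}

# Hypothetical snout-vent lengths (mm) for a Leptodactylus-like sample
rng = np.random.default_rng(0)
print(dimorphism_summary(rng.normal(52, 4, 30), rng.normal(48, 4, 30)))
```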

Author(s): H. S. Styn, S. M. Ellis

The determination of significance of differences in means and of relationships between variables is of importance in many empirical studies. Usually only statistical significance is reported, which does not necessarily indicate an important (practically significant) difference or relationship. With studies based on probability samples, effect size indices should be reported in addition to statistical significance tests in order to comment on practical significance. Where complete populations or convenience samples are worked with, the determination of statistical significance is, strictly speaking, no longer relevant, while the effect size indices can be used as a basis to judge significance. In this article attention is paid to the use of effect size indices in order to establish practical significance. It is also shown how these indices are utilized in a few fields of statistical application and how they receive attention in the statistical literature and in computer packages. The use of effect sizes is illustrated by a few examples from the research literature.
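A minimal sketch of judging practical significance from an effect size index alone, assuming Cohen's d and Cohen's conventional 0.2/0.5/0.8 benchmarks rather than the article's own indices:

```python
import numpy as np

def cohen_d(x, y):
    """Standardized mean difference with pooled SD; one common effect size index."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    sp = np.sqrt(((nx - 1) * x.var(ddof=1) +
                  (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / sp

def label(d):
    """Judge |d| against Cohen's conventional 0.2 / 0.5 / 0.8 benchmarks."""
    d = abs(d)
    return ("large" if d >= 0.8 else
            "medium" if d >= 0.5 else
            "small" if d >= 0.2 else "negligible")

# With a complete population or a convenience sample, the effect size alone
# carries the judgement; no significance test is needed.
rng = np.random.default_rng(1)
d = cohen_d(rng.normal(100, 15, 500), rng.normal(106, 15, 500))
print(round(d, 2), label(d))
```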


1990, Vol 24 (3), pp. 405-415
Author(s): Nathaniel McConaghy

Meta-analysis replaced statistical significance with effect size in the hope of resolving controversy concerning evaluation of treatment effects. Statistical significance measured the reliability of the effect of treatment, not its efficacy, and was strongly influenced by the number of subjects investigated. Effect size, as originally assessed, eliminated this influence, but by standardizing the size of the treatment effect it could distort it. Meta-analyses which combine the results of studies that employ different subject types, outcome measures, treatment aims, no-treatment rather than placebo controls, or therapists with varying experience can be misleading. To ensure discussion of these variables, meta-analyses should be used as an aid to, rather than a substitute for, literature review. When meta-analyses produce contradictory findings, it seems unwise to rely on the conclusions of an individual analysis. Their consistent finding that placebo treatments obtain markedly higher effect sizes than no treatment will, it is hoped, render the use of untreated control groups obsolete.
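The standardization step the abstract refers to can be made concrete with a textbook inverse-variance pooling of standardized mean differences. This is a generic fixed-effect sketch, not the procedure of any particular meta-analysis discussed; as the abstract warns, real syntheses must also weigh subject types, outcome measures, control conditions and heterogeneity.

```python
import numpy as np

def pooled_smd(d, n1, n2):
    """Fixed-effect, inverse-variance pooling of standardized mean differences
    (textbook sketch only)."""
    d, n1, n2 = (np.asarray(a, float) for a in (d, n1, n2))
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))  # large-sample variance of d
    w = 1.0 / var_d
    pooled = np.sum(w * d) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Three hypothetical treatment-vs-control studies with different outcome scales
print(pooled_smd(d=[0.4, 0.6, 0.2], n1=[25, 40, 30], n2=[25, 38, 32]))
```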


2012, Vol 33 (2), pp. 171-183
Author(s): Anna Rita Di Cerbo, Carlo M. Biancardi

In this study, we explored the level and pattern of sexual size dimorphism and sexual shape dimorphism in two closely related Bombina species that have low levels of sexual dimorphism in body size and shape. We applied an experimental protocol to explore sexual variation in morphological traits, including a preliminary evaluation of the measurement error. Mean measurement error (MME) and a measurement error index (MEI) were estimated for each of the eleven morphometric variables to exclude any possible subjective factor in measuring and to perform, for the first time, an objective functional and statistical evaluation of sexual size differences in the two species. Even if statistically significant, a difference that lies below the level of uncertainty of the measurement cannot be considered reliable. Therefore, statistically significant differences in head shape were rejected because the average difference between males and females was smaller than the possible MME. We detected significantly longer distal segments of the hind limbs in males, which could account for their use in mating behaviour (e.g. scramble competition, water-wave communication). However, the strongest and most reliable evidence of sexual dimorphism was found in forelimb measures (MEI > 1), in particular humerus length and amplexus, which are significantly larger in males than in females. These results indicate mating-related sexual dimorphism, where larger and stronger forelimbs can give an advantage during coupling as well as during male-male fights. The mean measurement error values and formulas provided in this work could be applied to future morphometric studies on Bombina species.
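The abstract does not give the MME and MEI formulas, so the sketch below uses assumed definitions (mean absolute difference between repeated measuring sessions, and the sex difference expressed in units of that error) purely to illustrate the MEI > 1 reliability criterion.

```python
import numpy as np

def mean_measurement_error(first_pass, second_pass):
    """Mean absolute difference between two measuring sessions on the same
    specimens; an assumed stand-in for the paper's MME."""
    a, b = np.asarray(first_pass, float), np.asarray(second_pass, float)
    return np.mean(np.abs(a - b))

def measurement_error_index(mean_f, mean_m, mme):
    """Assumed MEI: the observed sex difference in units of measurement error.
    Differences with MEI <= 1 fall within the uncertainty of the measure."""
    return abs(mean_f - mean_m) / mme

# Hypothetical humerus lengths (mm) measured twice on the same ten specimens
rng = np.random.default_rng(2)
true = rng.normal(11.0, 0.8, 10)
mme = mean_measurement_error(true + rng.normal(0, 0.05, 10),
                             true + rng.normal(0, 0.05, 10))
print(round(mme, 3), round(measurement_error_index(11.4, 11.0, mme), 1))
```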


2019
Author(s): Miguel Alejandro Silan

One of the main criticisms of NHST is that statistical significance is not practical significance. This evaluation of the practical significance of effects often takes an implicit but consequential form in the field, from informal conversations among researchers when evaluating findings to peer reviewers deciding the importance of an article. This primer seeks to make explicit what we mean when we talk about practical significance, to organize what we know of it, and to assert a framework for how we can evaluate and establish it. The practical significance of effects is appraised by analyzing them (i) along different levels of analysis, (ii) across different outcomes, (iii) across time, and (iv) across relevant moderators; these dimensions also underlie the conditions under which small effect sizes can be consequential. Practical significance is contrasted with often conflated terms, including statistical significance, effect size, effect size benchmarks, and theoretical significance. Promising directions are then presented.


2021
Author(s): Kleber Neves, Pedro Batista Tan, Olavo Bohrer Amaral

Diagnostic screening models for the interpretation of null hypothesis significance test (NHST) results have been influential in highlighting the effect of selective publication on the reproducibility of the published literature, leading to John Ioannidis' much-cited claim that most published research findings are false. These models, however, are typically based on the assumption that hypotheses are dichotomously true or false, without considering that effect sizes for different hypotheses are not the same. To address this limitation, we develop a simulation model that represents effect sizes explicitly, using different continuous distributions, while retaining other aspects of previous models such as publication bias and the pursuit of statistical significance. Our results show that the combination of selective publication, bias, low statistical power and unlikely hypotheses consistently leads to high proportions of false positives, irrespective of the effect size distribution assumed. Using continuous effect sizes also allows us to evaluate the degree of effect size overestimation and the prevalence of estimates with the wrong sign in the literature, showing that the same factors that drive false-positive results also lead to errors in estimating effect size direction and magnitude. Nevertheless, the relative influence of these factors on different metrics varies depending on the distribution assumed for effect sizes. The model is made available as an R ShinyApp interface, allowing one to explore features of the literature under various scenarios.
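A toy version of such a screening simulation, with illustrative parameter names and values that are not those of the authors' R ShinyApp, might look like this:

```python
import numpy as np
from scipy import stats

def simulate_literature(n_studies=20_000, n_per_group=20, prop_null=0.8,
                        effect_scale=0.3, pub_bias=0.9, alpha=0.05, seed=0):
    """Toy screening model with continuous effect sizes.

    True effects are zero with probability `prop_null`, otherwise drawn from a
    normal distribution (one of several plausible choices). Significant results
    are always 'published'; non-significant ones survive with probability
    1 - pub_bias. All parameter names and values are illustrative only."""
    rng = np.random.default_rng(seed)
    true = np.where(rng.random(n_studies) < prop_null, 0.0,
                    rng.normal(0.0, effect_scale, n_studies))
    control = rng.normal(0.0, 1.0, (n_studies, n_per_group))
    treated = rng.normal(true[:, None], 1.0, (n_studies, n_per_group))
    t, p = stats.ttest_ind(treated, control, axis=1)
    est = treated.mean(axis=1) - control.mean(axis=1)
    sig = p < alpha
    published = sig | (rng.random(n_studies) > pub_bias)
    pub_sig = published & sig                      # published 'positive' findings
    nonnull = pub_sig & (true != 0.0)
    return {
        "false_positive_rate": float(np.mean(true[pub_sig] == 0.0)),
        "wrong_sign_rate": float(np.mean(np.sign(est[nonnull]) != np.sign(true[nonnull]))),
        "median_overestimation": float(np.median(np.abs(est[nonnull]) / np.abs(true[nonnull]))),
    }

print(simulate_literature())
```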


PLoS ONE, 2021, Vol 16 (9), pp. e0257535
Author(s): Max M. Owens, Alexandra Potter, Courtland S. Hyatt, Matthew Albaugh, Wesley K. Thompson, ...

Effect sizes are commonly interpreted using heuristics established by Cohen (e.g., small: r = .1, medium: r = .3, large: r = .5), despite mounting evidence that these guidelines are miscalibrated to the effects typically found in psychological research. This study's aims were to 1) describe the distribution of effect sizes across multiple instruments, 2) consider factors qualifying the effect size distribution, and 3) identify examples as benchmarks for various effect sizes. For aim one, effect size distributions were illustrated from a large, diverse sample of 9- to 10-year-old children. This was done by conducting Pearson's correlations among 161 variables representing constructs from all questionnaires and tasks in the Adolescent Brain Cognitive Development (ABCD) Study® baseline data. To achieve aim two, factors qualifying this distribution were tested by comparing the distributions of effect size among various modifications of the aim one analyses. These modified analytic strategies included comparisons of effect size distributions for different types of variables, for analyses using statistical thresholds, and for analyses using several covariate strategies. In aim one analyses, the median in-sample effect size was .03, and values at the first and third quartiles were .01 and .07. In aim two analyses, effects were smaller for associations across instruments, content domains, and reporters, as well as when covarying for sociodemographic factors. Effect sizes were larger when thresholding for statistical significance. In analyses intended to mimic conditions used in "real-world" analysis of ABCD data, the median in-sample effect size was .05, and values at the first and third quartiles were .03 and .09. To achieve aim three, examples for varying effect sizes are reported from the ABCD dataset as benchmarks for future work in the dataset. In summary, this report finds that empirically determined effect sizes from a notably large dataset are smaller than would be expected based on existing heuristics.
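The general approach of aim one, tabulating the distribution of pairwise in-sample correlations across many measures and reporting its quartiles, can be sketched as follows with illustrative variables (this is not the ABCD analysis pipeline itself):

```python
import numpy as np
import pandas as pd

def effect_size_distribution(df):
    """Quartiles of absolute pairwise Pearson correlations across all columns;
    mimics the idea of tabulating in-sample effect sizes across many measures."""
    r = df.corr(method="pearson").to_numpy()
    upper = np.abs(r[np.triu_indices_from(r, k=1)])   # each variable pair once
    q1, median, q3 = np.percentile(upper, [25, 50, 75])
    return {"q1": round(q1, 3), "median": round(median, 3),
            "q3": round(q3, 3), "n_pairs": int(upper.size)}

# Illustrative data: 1,000 'participants' by 50 weakly related measures
rng = np.random.default_rng(3)
latent = rng.normal(size=(1000, 1))
data = pd.DataFrame(0.15 * latent + rng.normal(size=(1000, 50)))
print(effect_size_distribution(data))
```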


2016, Vol 51 (12), pp. 1045-1048
Author(s): Monica Lininger, Bryan L. Riemann

Objective: To describe confidence intervals (CIs) and effect sizes and provide practical examples to assist clinicians in assessing clinical meaningfulness.
Background: As discussed in our first article in 2015, which addressed the difference between statistical significance and clinical meaningfulness, evaluating the clinical meaningfulness of a research study remains a challenge to many readers. In this paper, we will build on this topic by examining CIs and effect sizes.
Description: A CI is a range estimated from sample data (the data we collect) that is likely to include the population parameter (value) of interest. Conceptually, this constitutes the lower and upper limits of the sample data, which would likely include, for example, the mean from the unknown population. An effect size is the magnitude of difference between 2 means. When a statistically significant difference exists between 2 means, effect size is used to describe how large or small that difference actually is. Confidence intervals and effect sizes enhance the practical interpretation of research results.
Recommendations: Along with statistical significance, the CI and effect size can assist practitioners in better understanding the clinical meaningfulness of a research study.
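A minimal sketch of the two quantities the article describes, a Welch confidence interval for the difference between two means plus Cohen's d, assuming hypothetical clinical outcome scores:

```python
import numpy as np
from scipy import stats

def mean_diff_ci_and_d(a, b, conf=0.95):
    """Welch confidence interval for the difference between two means,
    plus Cohen's d for the same comparison."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    va, vb = a.var(ddof=1), b.var(ddof=1)
    diff = a.mean() - b.mean()
    se = np.sqrt(va / na + vb / nb)
    df = se**4 / ((va / na)**2 / (na - 1) + (vb / nb)**2 / (nb - 1))  # Welch-Satterthwaite
    tcrit = stats.t.ppf((1 + conf) / 2, df)
    sp = np.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return {"diff": diff,
            "ci": (diff - tcrit * se, diff + tcrit * se),
            "cohen_d": diff / sp}

# Hypothetical outcome scores for two rehabilitation protocols
rng = np.random.default_rng(4)
print(mean_diff_ci_and_d(rng.normal(24, 5, 40), rng.normal(21, 5, 40)))
```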


2019, Vol 28 (4), pp. 468-485
Author(s): Paul HP Hanel, David MA Mehler

Transparent communication of research is key to foster understanding within and beyond the scientific community. An increased focus on reporting effect sizes, in addition to p value–based significance statements or Bayes Factors, may improve scientific communication with the general public. Across three studies (N = 652), we compared subjective informativeness ratings for five effect sizes, the Bayes Factor, and commonly used significance statements. Results showed that Cohen's U3 was rated as most informative. For example, 440 participants (69%) found U3 more informative than Cohen's d, while 95 (15%) found d more informative than U3, with 99 participants (16%) finding both effect sizes equally informative. This effect was not moderated by level of education. We therefore suggest that, in general, Cohen's U3 be used when scientific findings are communicated. However, the choice of effect size may vary depending on what a researcher wants to highlight (e.g. differences or similarities).
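Under the usual normal, equal-variance assumptions, Cohen's U3 is simply the standard normal CDF evaluated at d, as the short sketch below shows:

```python
from scipy.stats import norm

def cohens_u3(d):
    """Cohen's U3: the proportion of one group falling below the mean of the
    other, assuming normal distributions with equal variances."""
    return norm.cdf(d)

# A 'medium' d of 0.5 corresponds to roughly 69% of one group lying below
# the other group's mean.
print(round(cohens_u3(0.5), 2))   # 0.69
```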


2017
Author(s): C Patrick Doncaster, Thomas H G Ezard

Statistical significance provides evidence for or against an explanation of a population of interest, not a description of data sampled from the population. This simple distinction gets ignored in hundreds of thousands of research publications yearly, which confuse statistical with biological significance by referring to hypothesis-testing analyses as demonstrating significant results. Here we identify three key steps to objective reporting of evidence-based analyses. Firstly, by interpreting P-values correctly as explanation not description, authors set their inferences in the context of the design of the study and its purpose to test for effects of biologically relevant size; nowhere in this process is it informative to use the word ‘significant’. Secondly, empirical effect sizes demand interpretation with respect to a size of relevance to the test hypothesis. Thirdly, even without an a priori expectation of biological relevance, authors can and should interpret significance tests with respect to effects of reliably detectable size.
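One way to put an "effect of reliably detectable size" on the table before interpreting a test is to compute the smallest standardized difference the design would detect with, say, 80% power. The sketch below assumes a two-sided, two-sample t-test design; it is a generic power calculation, not the authors' procedure.

```python
from scipy import stats
from scipy.optimize import brentq

def detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized difference a two-sided, two-sample t-test would
    detect with the stated power, given the group size."""
    df = 2 * n_per_group - 2
    tcrit = stats.t.ppf(1 - alpha / 2, df)

    def power_at(d):
        nc = d * (n_per_group / 2) ** 0.5          # noncentrality parameter
        return (1 - stats.nct.cdf(tcrit, df, nc)) + stats.nct.cdf(-tcrit, df, nc)

    return brentq(lambda d: power_at(d) - power, 1e-6, 10)

print(round(detectable_d(30), 2))   # ~0.74 SD with 30 observations per group
```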


1998, Vol 21 (2), pp. 169-194
Author(s): Siu L. Chow

The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.
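The abstract's point that statistical significance only licenses excluding chance as an explanation of the data, and by itself says nothing about the size or practical importance of an effect, can be seen in a quick simulation where a negligible true difference is nevertheless "significant" at a very large sample size (the values below are illustrative only):

```python
import numpy as np
from scipy import stats

# With a very large sample, even a negligible true difference (d = 0.02) is
# virtually certain to be 'statistically significant': the test excludes
# chance as an explanation but does not index the effect's importance.
rng = np.random.default_rng(5)
a = rng.normal(0.00, 1.0, 200_000)
b = rng.normal(0.02, 1.0, 200_000)
t, p = stats.ttest_ind(a, b)
print(f"p = {p:.2g}, observed d = {abs(a.mean() - b.mean()):.3f}")
```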

