Clarifying Agreement Calculations and Analysis for End-User Elicitation Studies

2022 · Vol 29 (1) · pp. 1-70
Author(s): Radu-Daniel Vatavu, Jacob O. Wobbrock

We clarify fundamental aspects of end-user elicitation, enabling such studies to be run and analyzed with confidence, correctness, and scientific rigor. To this end, our contributions are multifold. We introduce a formal model of end-user elicitation in HCI and identify three types of agreement analysis: expert, codebook, and computer. We show that agreement is a mathematical tolerance relation generating a tolerance space over the set of elicited proposals. We review current measures of agreement and show that all can be computed from an agreement graph. In response to recent criticisms, we show that chance agreement represents an issue solely for inter-rater reliability studies and not for end-user elicitation, where it is opposed by chance disagreement. We conduct extensive simulations of 16 statistical tests for agreement rates and report Type I errors and power. Based on our findings, we provide recommendations for practitioners and introduce a five-level hierarchy for elicitation studies.
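For illustration, an agreement rate can be read directly off the agreement graph: with exact-match agreement, the rate for one referent is the number of agreeing pairs divided by the number of all pairs of proposals. A minimal Python sketch, assuming exact string equality as the tolerance relation (real studies may use a looser similarity criterion):

```python
from itertools import combinations

def agreement_rate(proposals):
    """Agreement rate for one referent: the fraction of all unordered
    pairs of proposals that agree, i.e., the edge density of the
    agreement graph over the elicited proposals."""
    n = len(proposals)
    if n < 2:
        return 0.0
    agreeing = sum(1 for a, b in combinations(proposals, 2) if a == b)
    return agreeing / (n * (n - 1) / 2)

# Six participants propose signs for one referent:
print(agreement_rate(["pinch", "pinch", "pinch", "swipe", "swipe", "tap"]))
# 4 agreeing pairs out of 15 pairs -> ~0.267
```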

2015 · Vol 2015 · pp. 1-5
Author(s): Wararit Panichkitkosolkul

An asymptotic test and an approximate test for the reciprocal of a normal mean with a known coefficient of variation were proposed in this paper. The asymptotic test was based on the expectation and variance of the estimator of the reciprocal of a normal mean. The approximate test used the approximate expectation and variance of the estimator by Taylor series expansion. A Monte Carlo simulation study was conducted to compare the performance of the two statistical tests. Simulation results showed that the two proposed tests performed well in terms of empirical type I errors and power. Nevertheless, the approximate test was easier to compute than the asymptotic test.
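The paper's exact statistics are not reproduced here, but a Monte Carlo check of empirical type I error for a test of H0: 1/mu = theta0 with known coefficient of variation tau might look like the following sketch. The delta-method z-statistic and all names are illustrative assumptions, not the author's formulas:

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_size(n, mu, tau, reps=100_000):
    """Empirical type I error of a delta-method z-test of
    H0: 1/mu = theta0, with known coefficient of variation tau."""
    theta0 = 1.0 / mu                       # H0 holds in the simulation
    xbar = rng.normal(mu, tau * mu, size=(reps, n)).mean(axis=1)
    se = tau * theta0 / np.sqrt(n)          # delta-method standard error
    z = (1.0 / xbar - theta0) / se
    return np.mean(np.abs(z) > 1.959964)    # two-sided 5% z critical value

print(empirical_size(n=50, mu=2.0, tau=0.1))  # should land near 0.05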


2021 · Vol 17 (12) · pp. e1009036
Author(s): Jack Kuipers, Ariane L. Moore, Katharina Jahn, Peter Schraml, Feng Wang, ...

Tumour progression is an evolutionary process in which different clones evolve over time, leading to intra-tumour heterogeneity. Interactions between clones can affect tumour evolution and hence disease progression and treatment outcome. Intra-tumoural pairs of mutations that are overrepresented in a co-occurring or clonally exclusive fashion over a cohort of patient samples may be suggestive of a synergistic effect between the different clones carrying these mutations. We therefore developed a novel statistical testing framework, called GeneAccord, to identify such gene pairs that are altered in distinct subclones of the same tumour. We analysed our framework for calibration and power. By comparing its performance to baseline methods, we demonstrate that to control type I errors, it is essential to account for the evolutionary dependencies among clones. In applying GeneAccord to the single-cell sequencing of a cohort of 123 acute myeloid leukaemia patients, we find 1 clonally co-occurring and 8 clonally exclusive gene pairs. The clonally exclusive pairs mostly involve genes of the key signalling pathways.
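GeneAccord itself is a likelihood-based framework that models the evolutionary dependencies among clones; as a rough illustration of the underlying question only, a naive permutation test for clonal exclusivity of one gene pair could look like the sketch below. The data structure, names, and permutation scheme are assumptions, not the GeneAccord implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

def same_clone_count(patients, gene_a, gene_b):
    """Number of patients with at least one clone carrying both genes."""
    return sum(
        any(gene_a in clone and gene_b in clone for clone in clones)
        for clones in patients
    )

def exclusivity_pvalue(patients, gene_a, gene_b, n_perm=10_000):
    """One-sided permutation p-value: do gene_a and gene_b share a clone
    *less* often than expected when each patient's mutations are randomly
    reassigned to its clones (clone sizes preserved)?"""
    observed = same_clone_count(patients, gene_a, gene_b)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for clones in patients:
            genes = [g for clone in clones for g in clone]
            rng.shuffle(genes)
            new_clones, i = [], 0
            for clone in clones:
                new_clones.append(set(genes[i:i + len(clone)]))
                i += len(clone)
            shuffled.append(new_clones)
        hits += same_clone_count(shuffled, gene_a, gene_b) <= observed
    return (hits + 1) / (n_perm + 1)

# Toy cohort: each patient is a list of clones (sets of mutated genes).
cohort = [
    [{"FLT3"}, {"NPM1", "DNMT3A"}],
    [{"FLT3", "IDH1"}, {"NPM1"}],
    [{"NPM1"}, {"FLT3", "NRAS"}],
]
print(exclusivity_pvalue(cohort, "FLT3", "NPM1"))
```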


2015 · Vol 23 (2) · pp. 306-312
Author(s): Annie Franco, Neil Malhotra, Gabor Simonovits

The accuracy of published findings is compromised when researchers fail to report and adjust for multiple testing. Preregistration of studies and the requirement of preanalysis plans for publication are two proposed solutions to combat this problem. Some have raised concerns that such changes in research practice may hinder inductive learning. However, without knowing the extent of underreporting, it is difficult to assess the costs and benefits of institutional reforms. This paper examines published survey experiments conducted as part of the Time-sharing Experiments in the Social Sciences program, where the questionnaires are made publicly available, allowing us to compare planned design features against what is reported in published research. We find that: (1) 30% of papers report fewer experimental conditions in the published paper than in the questionnaire; (2) roughly 60% of papers report fewer outcome variables than are listed in the questionnaire; and (3) about 80% of papers fail to report all experimental conditions and outcomes. These findings suggest that published statistical tests understate the probability of type I errors.
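The mechanism is simple to quantify: if a paper actually ran k independent tests at alpha = 0.05 but reports only some of them, the probability of at least one false positive across the family is 1 - (1 - alpha)^k, not alpha:

```python
# Familywise false-positive probability for k independent tests at
# alpha = 0.05 each: 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 3, 5, 10):
    print(k, round(1 - (1 - alpha) ** k, 3))
# 1 -> 0.05, 3 -> 0.143, 5 -> 0.226, 10 -> 0.401
```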


1978 · Vol 46 (1) · pp. 211-218
Author(s): Louis M. Hsu

The problem of controlling the risk of occurrence of at least one Type I Error in a family of n statistical tests has been discussed extensively in the psychological literature. However, the more general problem of controlling the probability of occurrence of more than some maximum (not necessarily zero) tolerable number x_m of Type I Errors in such a family appears to have received little attention. The present paper presents a simple Poisson approximation to the significance level P(E_I) which should be used per test to achieve this goal in a family of n independent tests. The cases of equal and unequal significance levels for the n tests are discussed. Relative merits and limitations of the Poisson and Bonferroni methods of controlling the number of Type I Errors are examined, and applications of the Poisson method to tests of orthogonal contrasts in analysis of variance, multiple tests of hypotheses in single studies, and multiple tests of hypotheses in literature reviews are discussed.
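The idea can be sketched numerically: approximate the number of Type I errors among n independent tests, each run at level p, by a Poisson(n*p) count, then pick the largest p for which the probability of more than x_m errors stays below the tolerated risk. A sketch of the approach under that approximation, not Hsu's exact prescriptions:

```python
from math import exp, factorial

def poisson_tail(lam, xm):
    """P(X > xm) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**k / factorial(k) for k in range(xm + 1))

def per_test_level(n, xm, gamma, tol=1e-10):
    """Largest per-test significance level p such that, approximating the
    count of Type I errors among n tests by Poisson(n * p), the probability
    of more than xm errors is at most gamma (bisection on lambda = n * p)."""
    lo, hi = 0.0, xm + 20.0            # the tail is increasing in lambda
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if poisson_tail(mid, xm) <= gamma:
            lo = mid
        else:
            hi = mid
    return lo / n

# Tolerate at most xm = 1 Type I error in n = 20 tests, with at most a
# 5% chance of exceeding that: run each test at roughly p = 0.0178.
print(per_test_level(n=20, xm=1, gamma=0.05))
```

Setting xm = 0 recovers a level close to the Bonferroni value alpha/n, which is the sense in which the Poisson method generalizes it.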


2021
Author(s): Quentin André

When researchers choose to identify and exclude outliers from their data, should they do so across all the data, or within experimental conditions? A survey of recent papers published in the Journal of Experimental Psychology: General shows that both methods are widely used, and common data visualization techniques suggest that outliers should be excluded at the condition level. However, I highlight in the present paper that removing outliers by condition runs against the logic of hypothesis testing, and that this practice leads to unacceptable increases in false-positive rates. I demonstrate that this conclusion holds true across a variety of statistical tests, exclusion criteria and cutoffs, sample sizes, and data types, and show in simulated experiments and in a re-analysis of existing data that by-condition exclusions can result in false-positive rates as high as 43%. Finally, I demonstrate that by-condition exclusions are a specific case of a more general issue: any outlier exclusion procedure that is not blind to the hypothesis that researchers want to test may result in inflated Type I errors. I conclude by offering best practices and recommendations for excluding outliers.
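The core simulation is easy to reproduce in outline: generate null data for two conditions, exclude outliers either within conditions or across all data, and count how often a t-test rejects. A hedged sketch under one set of assumptions (normal data, a 2 SD cutoff, a t-test); the paper's simulations cover many more tests, criteria, and data types, and its largest inflations arise with skewed data:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def trim(x, cutoff=2.0):
    """Keep observations within `cutoff` SDs of the mean."""
    z = (x - x.mean()) / x.std(ddof=1)
    return x[np.abs(z) < cutoff]

def false_positive_rate(by_condition, n=50, reps=5_000, cutoff=2.0):
    hits = 0
    for _ in range(reps):
        a, b = rng.normal(size=n), rng.normal(size=n)   # H0 is true
        if by_condition:
            a_kept, b_kept = trim(a, cutoff), trim(b, cutoff)
        else:                       # hypothesis-blind: trim across all data
            pooled = np.concatenate([a, b])
            z = (pooled - pooled.mean()) / pooled.std(ddof=1)
            keep = np.abs(z) < cutoff
            a_kept, b_kept = a[keep[:n]], b[keep[n:]]
        hits += ttest_ind(a_kept, b_kept).pvalue < 0.05
    return hits / reps

print("all data:     ", false_positive_rate(by_condition=False))  # near .05
print("by condition: ", false_positive_rate(by_condition=True))   # above .05
```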


Methodology · 2015 · Vol 11 (3) · pp. 110-115
Author(s): Rand R. Wilcox, Jinxia Ma

Abstract. The paper compares methods that allow both within-group and between-group heteroscedasticity when performing all pairwise comparisons of the least squares lines associated with J independent groups. The methods are based on a simple extension of results derived by Johansen (1980) and Welch (1938), in conjunction with the HC3 and HC4 estimators. The probability of one or more Type I errors is controlled using the improvement on the Bonferroni method derived by Hochberg (1988). Results are illustrated using data from the Well Elderly 2 study, which motivated this paper.
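Hochberg's (1988) step-up procedure, used here to control the familywise error rate, is short enough to state as code. A generic sketch of the procedure itself, independent of the HC3/HC4 machinery:

```python
import numpy as np

def hochberg(pvalues, alpha=0.05):
    """Hochberg's (1988) step-up procedure: with p-values sorted in
    ascending order, find the largest i with p_(i) <= alpha / (m - i + 1)
    and reject that hypothesis together with all smaller p-values."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for i in range(m, 0, -1):            # step up from the largest p-value
        if p[order[i - 1]] <= alpha / (m - i + 1):
            reject[order[:i]] = True
            break
    return reject

# Four pairwise comparisons; Bonferroni (alpha/4 = 0.0125) would reject
# only the first, while Hochberg rejects the first three:
print(hochberg([0.004, 0.015, 0.024, 0.20]))  # [ True  True  True False]
```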


2020 · Vol 39 (3) · pp. 185-208
Author(s): Qiao Xu, Rachana Kalelkar

This paper examines whether inaccurate going-concern opinions negatively affect the audit office's reputation. Assuming that clients perceive the incidence of going-concern opinion errors as a systematic audit quality concern within the entire audit office, we expect these inaccuracies to impact the audit office's market share and dismissal rate. We find that going-concern opinion inaccuracy is negatively associated with audit office market share and positively associated with audit office dismissal rate. Furthermore, we find that the decline in market share and the increase in dismissal rate are primarily associated with Type I errors. Additional analyses reveal that the negative consequence of going-concern opinion inaccuracy is lower for Big 4 audit offices. Finally, we find that the decrease in audit office market share is explained by distressed clients' reactions to Type I errors and audit offices' lack of ability to attract new clients.


2002 · Vol 55 (1) · pp. 27-39
Author(s): H.J. Keselman, Robert Cribbie, Burt Holland

2018 · Vol 7 (10) · pp. 409
Author(s): Youqiang Dong, Ximin Cui, Li Zhang, Haibin Ai

The progressive TIN (triangular irregular network) densification (PTD) filter algorithm is widely used for filtering point clouds. In the PTD algorithm, the iterative densification parameters become smaller over the entire process of filtering. This causes the performance of the PTD algorithm, especially its type I errors, to be poor for point clouds with high density and standard variance. Hence, an improved PTD filtering algorithm for point clouds with high density and variance is proposed in this paper. The improved method divides the iterative densification process into two stages. In the first stage, the iterative densification process of the PTD algorithm is used, and the two densification parameters become smaller. When the density of points belonging to the TIN rises above a certain value (which we call the standard variance intervention density), the iterative densification process moves into the second stage. In the second stage, a new multi-scale iterative densification strategy is used, and the angle threshold becomes larger. The experimental results show that the improved PTD algorithm reduces the type I errors and total errors of dense image matching (DIM) point clouds by 7.53% and 4.09%, respectively, compared with the original PTD algorithm. Although the type II errors increase slightly with our improved method, the wrongly added object points have little effect on the accuracy of the generated DSM. In short, our improved PTD method refines the classical PTD method and offers a better solution for filtering point clouds with high density and standard variance.
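For context, the per-point acceptance test that PTD's densification parameters (the angle and distance thresholds) control can be sketched as follows. An illustrative reconstruction of the classic criterion, not the authors' improved two-stage code:

```python
import numpy as np

def accept_point(p, tri, max_angle_deg, max_dist):
    """Classic PTD acceptance test: add point p to the ground TIN if its
    distance to the plane of the enclosing triangle, and the largest angle
    between that plane and the lines from p to the three vertices, are
    below the current densification thresholds. `tri` is a (3, 3) array
    holding the triangle's vertices."""
    normal = np.cross(tri[1] - tri[0], tri[2] - tri[0])
    normal /= np.linalg.norm(normal)
    dist = abs(np.dot(p - tri[0], normal))       # distance to the plane
    angles = [np.degrees(np.arcsin(dist / np.linalg.norm(p - v))) for v in tri]
    return dist <= max_dist and max(angles) <= max_angle_deg

tri = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
print(accept_point(np.array([3.0, 3.0, 0.2]), tri,
                   max_angle_deg=6.0, max_dist=0.5))
# True: the point lies 0.2 m above the facet, subtending at most ~2.7 degrees
```

Tightening or relaxing these two thresholds over the iterations is exactly the schedule that the paper's two-stage strategy modifies.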

