Clarifying Agreement Calculations and Analysis for End-User Elicitation Studies
We clarify fundamental aspects of end-user elicitation, enabling such studies to be run and analyzed with confidence, correctness, and scientific rigor. To this end, our contributions are multifold. We introduce a formal model of end-user elicitation in HCI and identify three types of agreement analysis: expert, codebook, and computer. We show that agreement is a mathematical tolerance relation generating a tolerance space over the set of elicited proposals. We review current measures of agreement and show that all can be computed from an agreement graph. In response to recent criticisms, we show that chance agreement represents an issue solely for inter-rater reliability studies and not for end-user elicitation, where it is opposed by chance disagreement. We conduct extensive simulations of 16 statistical tests for agreement rates, and report Type I errors and power. Based on our findings, we provide recommendations for practitioners and introduce a five-level hierarchy for elicitation studies.
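To make the idea of an agreement graph concrete, the following is a minimal sketch rather than the paper's own implementation. It assumes that proposals are the vertices, that an edge joins two proposals whenever a user-supplied predicate judges them to agree, and that an agreement rate can be read off as the fraction of proposal pairs that are connected. The names `agreement_graph`, `agreement_rate`, and `agree` are hypothetical, as are the sample proposals.

```python
from itertools import combinations


def agreement_graph(proposals, agree):
    """Build the edge set of a hypothetical agreement graph.

    Vertices are indices into `proposals`; an edge (i, j) exists when
    the caller-supplied predicate `agree` holds for the two proposals.
    The predicate is assumed to be symmetric; it need not be transitive.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(proposals)), 2)
        if agree(proposals[i], proposals[j])
    }


def agreement_rate(proposals, agree):
    """Fraction of proposal pairs that agree (assumed definition)."""
    n = len(proposals)
    if n < 2:
        return 0.0  # no pairs to compare
    edges = agreement_graph(proposals, agree)
    return len(edges) / (n * (n - 1) / 2)


# Codebook-style agreement: proposals agree when their labels match.
labels = ["swipe", "swipe", "swipe", "tap", "tap", "pinch"]
label_rate = agreement_rate(labels, lambda a, b: a == b)

# Tolerance-style agreement: numeric proposals agree when they are close.
# 10 ~ 14 and 14 ~ 18, but 10 and 18 do not agree, so the relation is
# not transitive and the graph is not a union of cliques.
values = [10, 14, 18]
value_edges = agreement_graph(values, lambda a, b: abs(a - b) <= 5)
value_rate = agreement_rate(values, lambda a, b: abs(a - b) <= 5)
```

The second example illustrates why a tolerance relation, rather than an equivalence relation, is the natural model here: agreement is reflexive and symmetric, but two proposals that each agree with a third need not agree with each other.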