We clarify fundamental aspects of end-user elicitation, enabling such studies to be run and analyzed with confidence, correctness, and scientific rigor. To this end, our contributions are multifold. We introduce a formal model of end-user elicitation in HCI and identify three types of agreement analysis: expert, codebook, and computer. We show that agreement is a mathematical tolerance relation generating a tolerance space over the set of elicited proposals. We review current measures of agreement and show that all can be computed from an agreement graph. In response to recent criticisms, we show that chance agreement represents an issue solely for inter-rater reliability studies and not for end-user elicitation, where it is opposed by chance disagreement. We conduct extensive simulations of 16 statistical tests for agreement rates, and report Type I errors and power. Based on our findings, we provide recommendations for practitioners and introduce a five-level hierarchy for elicitation studies.