Content matters: Measures of contextual diversity must consider semantic content
Measures of contextual diversity seek to replace word frequency by counting the number of contexts in which a word occurs rather than the raw number of occurrences (Adelman, Brown, & Quesada, 2006). It has repeatedly been shown that contextual diversity measures outperform word frequency on word recognition datasets (Adelman & Brown, 2008; Brysbaert & New, 2009). Recently, Hollis (2020) questioned the importance of contextual diversity by demonstrating that, when other variables of contextual occurrence are controlled for, diversity accounts for a relatively small amount of unique variance over word frequency. However, the analysis of Hollis (2020) did not take into account the semantic content of the contexts in which words occur. Johns, Dye, and Jones (2020) and Johns (2021) have recently shown that defining linguistic contexts at larger, more ecologically valid levels leads to contextual diversity measures that provide very large improvements over word frequency, especially when implemented with principles from the Semantic Distinctiveness Model of Jones, Johns, and Recchia (2012). Across a series of simulations, we demonstrate that the advantages of contextual diversity measures depend on using semantic representations of words to determine the uniqueness of contextual occurrences, where unique contextual occurrences contribute more to a word’s lexical strength than redundant contextual occurrences.
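The distinction between word frequency and a count-based contextual diversity measure can be illustrated with a minimal sketch. It assumes that each context is a single document, tokenized on whitespace; the function names and the toy corpus are hypothetical. It does not implement the semantic weighting of the Semantic Distinctiveness Model.

```python
from collections import Counter

def word_frequency(documents):
    """Count every occurrence of each word across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts

def contextual_diversity(documents):
    """Count the number of documents (contexts) in which each word occurs."""
    counts = Counter()
    for doc in documents:
        # set() collapses repeated occurrences within one context
        counts.update(set(doc.lower().split()))
    return counts

# Hypothetical toy corpus: each string is treated as one context.
toy_corpus = [
    "the dog chased the dog",
    "the cat slept",
    "a dog barked",
]

wf = word_frequency(toy_corpus)
cd = contextual_diversity(toy_corpus)
print(wf["dog"], cd["dog"])
```

In this toy corpus "dog" occurs three times but in only two contexts, so the two measures diverge whenever a word is repeated within a context.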