Long-distance syntactic dependencies drive the complexity of legal language
Although contracts and other legal documents have long been known to cause processing difficulty for laypeople, the source and nature of this difficulty have remained unclear. To address this question, we conducted a corpus analysis (~10 million words) to investigate to what extent difficult-to-process features that are reportedly common in contracts--such as center embedding, low-frequency jargon, passive voice and non-standard capitalization--are in fact more prevalent in contracts than in standard-English texts. We found that all of these features were strikingly more prevalent in contracts than in standard-English texts. We also conducted an experimental study ($n=108$ subjects) to determine to what extent such features cause processing difficulties for laypeople of different reading levels. We found that contractual excerpts containing these features were recalled and comprehended at a lower rate than excerpts without these features, even by experienced readers, and that center-embedded clauses led to greater decreases in recall than other features. These findings confirm long-standing anecdotal accounts of the presence of difficult-to-process features in contracts, and show that these features inhibit comprehension and recall of legal content for readers of all levels. Our findings also suggest that such difficulties may largely result from working-memory costs imposed by complex syntactic features--such as center-embedded clauses--as opposed to a mere lack of understanding of specialized legal concepts, and that removing these features would be both tractable and beneficial for society at large.