Reliability of Essay Rating and Score Adjustment
A model-based approach to rater reliability for essays read by multiple readers is presented. The approach is motivated by generalizability theory. Variation in rater severity (between-rater variation) and rater inconsistency (within-rater variation) is considered in the presence of between-examinee variation. An additive variance component model is posited, and the method of moments for its estimation is described. The models involve no distributional assumptions other than variance homogeneity and independence of certain random variables. Minimum mean squared error estimators of examinees’ true scores and readers’ severities are derived. Model diagnostic procedures are an integral component of the approach. The methods are illustrated on data from standardized educational tests.
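
For orientation, one common way to write an additive variance component model of this kind is sketched below. The notation is assumed for illustration and is not taken from the source: y_{ij} is the score that rater j gives to the essay of examinee i, \alpha_i is the examinee's true score, \beta_j is the rater's severity, and \varepsilon_{ij} is the rater's inconsistency. The last line is a simplified shrinkage estimator based only on the mean of K ratings by K distinct raters; it ignores which raters read the essay, so it may well differ from the estimators derived in the paper.

```latex
% Illustrative notation only (assumed, not from the source).
% Additive model: true score + rater severity + rater inconsistency
y_{ij} = \alpha_i + \beta_j + \varepsilon_{ij},
\qquad
E(\alpha_i) = \mu,\quad E(\beta_j) = 0,\quad E(\varepsilon_{ij}) = 0,

% Variance components (homogeneous, terms mutually independent)
\operatorname{var}(\alpha_i) = \sigma_a^2,\quad
\operatorname{var}(\beta_j) = \sigma_b^2,\quad
\operatorname{var}(\varepsilon_{ij}) = \sigma_e^2 .

% Variance of a single rating
\operatorname{var}(y_{ij}) = \sigma_a^2 + \sigma_b^2 + \sigma_e^2 .

% Variance of the mean of K ratings by K distinct raters
\operatorname{var}(\bar{y}_i) = \sigma_a^2 + \frac{\sigma_b^2 + \sigma_e^2}{K} .

% Best linear predictor of the true score given \bar{y}_i alone
\hat{\alpha}_i = \mu + \rho_K \,(\bar{y}_i - \mu),
\qquad
\rho_K = \frac{\sigma_a^2}{\sigma_a^2 + (\sigma_b^2 + \sigma_e^2)/K} .
```

Under these assumptions, \rho_K plays the role of a reliability coefficient: it approaches 1 as K grows or as the severity and inconsistency variances shrink relative to the between-examinee variance.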