Item Position and Item Difficulty Change in an IRT-Based Common Item Equating Design

2008 ◽  
Vol 22 (1) ◽  
pp. 38-60 ◽  
Author(s):  
Jason L. Meyers ◽  
G. Edward Miller ◽  
Walter D. Way


2019 ◽  
Vol 40 (2) ◽  
pp. 71-81
Author(s):  
Ismael S. Al-Bursan ◽  
Emil O. W. Kirkegaard ◽  
John Fuerst ◽  
Salaheldin Farah Attallah Bakhiet ◽  
Mohammad F. Al Qudah ◽  
...  

Abstract. Sex differences in mathematical ability were examined in a nationwide sample of 32,346 Jordanian 4th graders (ages 9–10 years) on a 40-item mathematics test. Overall, boys performed slightly worse (d = −0.12) but showed slightly more variation in scores (SD = 1.02 for boys vs. SD = 0.98 for girls). However, when results were disaggregated by school type, single-sex versus coed (i.e., coeducational), boys outperformed girls in coed schools (d = 0.27) but performed worse in single-sex schools (d = −0.37). Two-parameter item response theory analysis showed that item difficulty was similar across sexes in the full sample, but item loadings departed substantially from measurement invariance between boys and girls at single-sex schools. For boys and girls at coed schools, both the item difficulty and item loading correlations were highly similar, indicating that measurement invariance largely held in this case. Partially consistent with findings from other countries, item difficulty correlated with male advantage (r = .57), such that the relative male advantage increased with item difficulty. Complicating interpretation, this association did not replicate within coed schools. Item content, Bloom’s cognitive taxonomy category, and item position showed no relation to sex differences.
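To make the reported item-level analysis concrete, the following is a minimal sketch of how one might compute a per-item difficulty and a per-item male advantage and then correlate the two, as in the r = .57 finding above. It uses classical proportion-correct difficulty as a simpler stand-in for the 2PL difficulty parameter, and all data, sample sizes, and easiness values are simulated placeholders rather than the study's data.

```python
# Minimal sketch (not the authors' code): per-item difficulty vs. per-item
# male advantage, illustrating the computation behind the reported r = .57.
# The simulated data carry no built-in sex effect, so the printed r will be
# near zero; only the workflow is illustrated.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_boys, n_girls, n_items = 500, 500, 40   # hypothetical sample sizes

# Hypothetical 0/1 response matrices (rows = examinees, columns = items).
p_true = np.linspace(0.2, 0.9, n_items)   # assumed item easiness values
boys = rng.binomial(1, p_true, size=(n_boys, n_items))
girls = rng.binomial(1, p_true, size=(n_girls, n_items))

# Classical item statistics: proportion correct per item (higher = easier).
p_boys = boys.mean(axis=0)
p_girls = girls.mean(axis=0)

# Per-item male advantage as a difference in proportions correct.
male_advantage = p_boys - p_girls

# Item difficulty coded so that larger values mean harder items.
difficulty = 1 - (p_boys + p_girls) / 2

r, p_value = pearsonr(difficulty, male_advantage)
print(f"correlation between item difficulty and male advantage: r = {r:.2f}")
```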


2016 ◽  
Vol 41 (2) ◽  
pp. 115-129 ◽  
Author(s):  
Sebastian Weirich ◽  
Martin Hecht ◽  
Christiane Penk ◽  
Alexander Roppelt ◽  
Katrin Böhme

This article examines the interdependency of two context effects that regularly occur in large-scale assessments: item position effects and effects of test-taking effort on the probability of correctly answering an item. A microlongitudinal design was used to measure test-taking effort over the course of a 60-minute large-scale assessment. Two components of test-taking effort were investigated: initial effort and change in effort. Both components significantly affected the probability of solving an item. In addition, participants’ current test-taking effort diminished considerably over the course of the test. Furthermore, a substantial linear position effect was found, indicating that item difficulty increased during the test; this position effect varied considerably across persons. Concerning the interplay of position effects and test-taking effort, only the change in effort moderated the position effect, and persons differed with respect to this moderation effect. The consequences of these results for the reliability and validity of large-scale assessments are discussed.
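As a rough illustration of the linear item position effect described above, the sketch below adds a position term to a Rasch-type response probability, so that a positive position weight makes items effectively harder the later they appear. The parameter names and values are assumptions chosen for illustration, not the authors' model specification.

```python
# Minimal sketch (assumed parameterization, not the article's model):
# a Rasch-type response probability with a linear item position effect.
import numpy as np

def p_correct(theta, b, gamma, position):
    """Probability of a correct response for ability `theta`, item
    difficulty `b`, position-effect weight `gamma`, and 0-based item
    `position` within the booklet. A person-specific `gamma` would
    capture the reported variation of the position effect across persons."""
    logit = theta - b - gamma * position
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical values: the same item administered early vs. late
# for an average examinee.
theta, b, gamma = 0.0, 0.0, 0.02
print(p_correct(theta, b, gamma, position=0))    # about 0.50 at the start
print(p_correct(theta, b, gamma, position=50))   # noticeably lower when placed late
```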


2020 ◽  
Vol 36 (4) ◽  
pp. 554-562
Author(s):  
Alica Thissen ◽  
Frank M. Spinath ◽  
Nicolas Becker

Abstract. The cube construction task represents a novel format in the assessment of spatial ability through mental cube rotation tasks. Instead of selecting the correct answer from several response options, respondents construct their own response in a computerized test environment, placing a higher demand on spatial ability. In the present study with a sample of 146 German high-school students, we tested an approach to manipulating item difficulties in order to create items with a greater difficulty range. Furthermore, we compared the cube task in a distractor-free and a distractor-based version while the item stems were held identical. The distractor-free format was, on average, significantly more difficult than the distractor-based format (M = 0.27 vs. M = 0.46) and showed a broader range of item difficulties (.02 ≤ pi ≤ .95 vs. .37 ≤ pi ≤ .63). The analyses of the test results also showed that the distractor-free format had a significantly higher correlation with a broad intelligence test (r = .57 vs. r = .17). Reasons for the higher convergent validity of the distractor-free format (prevention of response-elimination strategies and the broader range of item difficulties) and further research possibilities are discussed.
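The comparison above rests on two descriptive statistics per format: the per-item difficulty index pi (proportion correct) and the correlation of test totals with an external intelligence measure. The sketch below shows one way to compute both from simulated response matrices; the item count, easiness spread, and ability weights are assumptions for illustration, not the study's data or analysis code.

```python
# Minimal sketch under stated assumptions (simulated data, not the study's):
# classical item difficulty p_i per item, its range, and convergent validity
# as the correlation of test totals with an external ability score, compared
# across a "distractor-free" and a "distractor-based" format.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_persons, n_items = 146, 18          # sample size from the study; item count assumed

g = rng.normal(size=n_persons)        # hypothetical external ability score

def simulate_format(easiness, validity_weight):
    """Simulate 0/1 responses whose difficulty spread and relation to `g`
    are controlled by the per-item `easiness` and the `validity_weight`."""
    logits = validity_weight * g[:, None] + easiness[None, :]
    return rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

distractor_free = simulate_format(np.linspace(-3, 3, n_items), 1.2)
distractor_based = simulate_format(np.linspace(-0.5, 0.5, n_items), 0.4)

for name, resp in [("distractor-free", distractor_free),
                   ("distractor-based", distractor_based)]:
    p_i = resp.mean(axis=0)                      # item difficulty indices
    r, _ = pearsonr(resp.sum(axis=1), g)         # convergent validity proxy
    print(f"{name}: p_i range {p_i.min():.2f} to {p_i.max():.2f}, r with g = {r:.2f}")
```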


2012 ◽  
Author(s):  
Victoria Blanshteyn ◽  
Charles A. Scherbaum
