On the bias and stability of the results of comparative judgment

2021 ◽  
Author(s):  
Elise Anne Victoire Crompvoets ◽  
Anton A. Béguin ◽  
Klaas Sijtsma

Comparative judgment is a method that allows measurement of a competence by comparison of items with other items. In educational measurement, where comparative judgment is becoming an increasingly popular assessment method, items are mostly students’ responses to an assignment or an examination. For assessments using comparative judgment, the Scale Separation Reliability (SSR) is used to estimate the reliability of the measurement. Previous research has shown that the SSR may overestimate reliability when the pairs to be compared are selected with certain adaptive algorithms, when raters use different underlying models/truths, or when the true variance of the item parameters is below one. This research investigated bias and stability of the components of the SSR in relation to the number of comparisons per item to increase understanding of the SSR. We showed that many comparisons are required to obtain an accurate estimate of the item variance, but that the SSR can be useful even when the variance of the items is overestimated. Lastly, we recommend adjusting the general guideline for the required number of comparisons per item to 41 comparisons per item. This recommendation partly depends on the number of items and the true variance in our simulation study and needs further investigation.
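As a rough illustration of the quantity discussed in this abstract, the SSR is commonly computed as the share of the observed variance of the item estimates that is not attributable to estimation error. The sketch below assumes that definition, (observed variance − mean squared standard error) / observed variance, and uses the sample variance as the observed variance. The function name and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def scale_separation_reliability(estimates, standard_errors):
    """Return (observed variance - mean squared SE) / observed variance."""
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    # Assumption: the observed variance is the sample variance (n - 1 denominator).
    observed_var = variance(estimates)
    if observed_var == 0:
        raise ValueError("observed variance is zero; SSR is undefined")
    # Error variance: mean of the squared standard errors of the estimates.
    error_var = mean(se ** 2 for se in standard_errors)
    return (observed_var - error_var) / observed_var


# Hypothetical logit estimates for three items, each with standard error 0.5.
ssr = scale_separation_reliability([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
```

Under this definition, an overestimated item variance in the denominator pushes the SSR upward, which is consistent with the concern about overestimation raised in the abstract.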

2005 ◽  
Vol 26 (4) ◽  
pp. 362-368 ◽  
Author(s):  
Jose Rossello-Urgell ◽  
Alicia Rodriguez-Pla

Objective: To date, it has not been adequately proven whether the published formulas used to obtain incidence from the prevalence of nosocomial infections provide a good estimate of real incidence. With the hypothesis that within the hospital setting prevalence may be lower than incidence, the aim of this study was to analyze the behavior of point prevalence as it relates to cumulative incidence and duration of infection. Design: Hospital simulation study. Methods: By randomly selecting a sample of infected patients within a specific range of cumulative incidences and infection durations, we constructed a simulated hospital population, allowing us to estimate daily point prevalences and their maximum and minimum values. The association between the different components of stay and cumulative incidence was evaluated to obtain a more accurate estimate of incidence. Results: Prevalence can be lower than, equal to, or higher than the corresponding incidence. For all incidence levels, prevalence increased with duration. Between 14 and 20 days of infection duration, prevalence was consistently lower than incidence. Prevalence duration of infection was approximately half the time of the total duration. Conclusions: The existing formulas relating incidence and prevalence can frequently be inadequate. Until a validated system for converting prevalence into incidence is available, we do not believe their use is appropriate.
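The two quantities compared in this abstract can be made concrete with a small sketch. It is not the authors' simulation design: the patient records, field names, and day numbering below are hypothetical. It only shows how a daily point prevalence and a cumulative incidence could be computed from the same simulated population.

```python
def point_prevalence(patients, day):
    """Share of patients present on `day` who have an active infection that day."""
    present = [p for p in patients if p["admit"] <= day <= p["discharge"]]
    if not present:
        return 0.0
    infected = [
        p for p in present
        if p["infection"] is not None
        and p["infection"][0] <= day <= p["infection"][1]
    ]
    return len(infected) / len(present)


def cumulative_incidence(patients):
    """Share of all admitted patients who acquire an infection during their stay."""
    return sum(p["infection"] is not None for p in patients) / len(patients)


# Hypothetical population: days are inclusive; infection is (first_day, last_day) or None.
patients = [
    {"admit": 0, "discharge": 9, "infection": (3, 6)},
    {"admit": 0, "discharge": 4, "infection": None},
    {"admit": 2, "discharge": 12, "infection": (10, 12)},
    {"admit": 5, "discharge": 12, "infection": None},
]

daily = {day: point_prevalence(patients, day) for day in range(13)}
incidence = cumulative_incidence(patients)
```

In this toy population the daily prevalence ranges from below to equal to the cumulative incidence, depending on the day sampled, which is the kind of variation the abstract reports.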


2021 ◽  
pp. 001316442110172
Author(s):  
James D. Weese ◽  
Ronna C. Turner ◽  
Allison Ames ◽  
Brandon Crawford ◽  
Xinya Liang

A simulation study was conducted to investigate the heuristics of the SIBTEST procedure and how it compares with ETS classification guidelines used with the Mantel–Haenszel procedure. Prior heuristics have been used for nearly 25 years, but they are based on a simulation study that was restricted due to computer limitations and that modeled item parameters from estimates of ACT and ASVAB tests from 1987 and 1984, respectively. Further, suggested heuristics for data fitting a two-parameter logistic model (2PL) have essentially gone unused since their original presentation. This simulation study incorporates a wide range of data conditions to recommend heuristics for both 2PL and three-parameter logistic (3PL) data that correspond with ETS’s Mantel–Haenszel heuristics. Levels of agreement between the new SIBTEST heuristics and Mantel–Haenszel heuristics were similar for 2PL data and higher than prior SIBTEST heuristics for 3PL data. The new recommendations provide higher true-positive rates for 2PL data. Conversely, they displayed decreased true-positive rates for 3PL data. False-positive rates, overall, remained below the level of significance for the new heuristics. Unequal group sizes resulted in slightly larger false-positive rates than balanced designs for both prior and new SIBTEST heuristics, with rates less than alpha levels for equal ability distributions and unbalanced designs versus false-positive rates slightly higher than alpha with unequal ability distributions and unbalanced designs.
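For readers unfamiliar with the Mantel–Haenszel side of this comparison, the effect size that the ETS guidelines classify is usually expressed on the delta scale as MH D-DIF = −2.35 ln(α_MH), where α_MH is the Mantel–Haenszel common odds ratio across matched score strata. The sketch below computes both quantities. The function name and the counts are hypothetical, and the full ETS A/B/C classification also involves significance tests that are not reproduced here.

```python
import math


def mantel_haenszel_d_dif(strata):
    """Return (common odds ratio, MH D-DIF) for one item.

    Each stratum is (ref_correct, ref_incorrect, focal_correct, focal_incorrect),
    i.e. a 2x2 table for examinees matched on total score.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio is undefined or degenerate for these counts")
    alpha = numerator / denominator
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts for two score strata.
alpha_mh, d_dif = mantel_haenszel_d_dif([(30, 10, 20, 20), (15, 5, 10, 10)])
```

A negative D-DIF, as in this example, indicates an item that favors the reference group.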


Author(s):  
Gary H. Farrow ◽  
Andrew E. Potts ◽  
Daniel G. Washington

The Chain Finite Element Analysis of Residual Strength Joint Industry Project (Chain FEARS JIP) aimed to develop guidance for the determination of rational discard criteria for mooring chains subject to severe pitting corrosion which would otherwise require immediate removal and replacement. Critical to the ability to evaluate the residual fatigue life of a degraded chain is an accurate estimate of the fatigue life of the chain in its as-new condition, thereby providing a benchmark for any loss in fatigue life associated with severe corrosion or wear. A non-linear multi-axial Finite Element Analysis (FEA) fatigue assessment method was developed and correlated against available fatigue test data as part of the JIP, achieving this critical requirement. The development of this correlated methodology necessitated a review of:
• The available mooring chain fatigue test data, to identify the factors influencing chain fatigue life and failure location.
• FEA fatigue methodologies currently employed in the industry.
• Current Class Rules relating to fatigue estimation.
• The influence of material, manufacturing and operational factors on chain fatigue life.
It was established that while the linear FEA fatigue method currently employed in the industry does not correlate with the fatigue test data, the non-linear multi-axial FEA fatigue method developed in the JIP afforded good correlation with test data. It was also demonstrated that the magnitude of mean chain tension and inconsistency in proof loading, as a consequence of the inconsistency in Class Minimum Break Load (MBL) specification, and with respect to chain size and the varying material ductility of steel grades, affect fatigue life. The identified inconsistency in the proofing indicates a likely inconsistency in conservatism embodied in the Class Rules fatigue formulation. Consequently, it is possible that chains of certain size and grade may have significantly less fatigue life than anticipated by Class.
Further work is recommended to establish a more rational proof load specification and to develop an alternative Class Rules fatigue formulation accounting for the identified factors influencing fatigue.


2021 ◽  
Author(s):  
Elise Anne Victoire Crompvoets ◽  
Anton A. Béguin ◽  
Klaas Sijtsma

The method of pairwise comparison has been used in a wide range of contexts. In educational measurement, many pairwise comparisons are required for reliable measurement when the comparisons to be performed are selected using the commonly used semi-random selection algorithm (SSA). We proposed a Bayesian selection algorithm (BSA) to obtain smaller standard errors of parameter estimates and higher reliability compared with the SSA, and we evaluated the performance of these algorithms in a simulation study. We conclude that 1) the BSA should be preferred to the SSA, 2) the number of comparisons required for reliable measurement depends on the object variance, and 3) the Scale Separation Reliability (SSR) may systematically overestimate reliability even when the SSA is used.


2017 ◽  
Vol 42 (6) ◽  
pp. 428-445 ◽  
Author(s):  
San Verhavert ◽  
Sven De Maeyer ◽  
Vincent Donche ◽  
Liesje Coertjens

Comparative judgment (CJ) is an alternative method for assessing competences based on Thurstone’s law of comparative judgment. Assessors are asked to compare pairs of students’ work (representations) and judge which one is better on a certain competence. These judgments are analyzed using the Bradley–Terry–Luce model, resulting in logit estimates for the representations. In this context, the Scale Separation Reliability (SSR), coming from Rasch modeling, is typically used as a reliability measure. But, to the knowledge of the authors, it has never been systematically investigated whether the meaning of the SSR can be transferred from Rasch to CJ. As the meaning of the reliability is an important question for both assessment theory and practice, the current study looks into this. A meta-analysis is performed on 26 CJ assessments. For every assessment, split-halves are performed based on assessor. The rank orders of the whole assessment and the halves are correlated and compared with SSR values using Bland–Altman plots. The correlation between the halves of an assessment was compared with the SSR of the whole assessment, showing that the SSR is a good measure for split-half reliability. Comparing the SSR of one of the halves with the correlation between the two respective halves showed that the SSR can also be interpreted as an interrater correlation. Regarding SSR as expressing a correlation with the truth, the results are mixed.
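The logit estimates mentioned in this abstract have a direct interpretation under the Bradley–Terry–Luce model: the probability that representation a is judged better than representation b is the logistic function of the difference between their logit values. A minimal sketch, with a hypothetical function name and example values:

```python
import math


def btl_win_probability(theta_a, theta_b):
    """Probability that a is preferred over b, given logit values theta_a and theta_b."""
    return 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))


# Two representations one logit apart (hypothetical values).
p_better = btl_win_probability(1.0, 0.0)
p_worse = btl_win_probability(0.0, 1.0)
```

Only the difference between the two logit values matters, so adding a constant to every estimate leaves all predicted judgments unchanged.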


2017 ◽  
Vol 35 (3) ◽  
pp. 220-227 ◽  
Author(s):  
Ali Jahanfar ◽  
Mohsen Amirmojahedi ◽  
Bahram Gharabaghi ◽  
Brajesh Dubey ◽  
Edward McBean ◽  
...  

Rapid population growth of major urban centres in many developing countries has created massive landfills with extraordinary heights and steep side-slopes, which are frequently surrounded by illegal low-income residential settlements developed too close to landfills. These extraordinary landfills are facing high risks of catastrophic failure with potentially large numbers of fatalities. This study presents a novel method for risk assessment of landfill slope failure, using probabilistic analysis of potential failure scenarios and associated fatalities. The conceptual framework of the method includes selecting appropriate statistical distributions for the municipal solid waste (MSW) material shear strength and rheological properties for potential failure scenario analysis. The MSW material properties for a given scenario are then used to analyse the probability of slope failure and the resulting run-out length to calculate the potential risk of fatalities. In comparison with existing methods, which are solely based on the probability of slope failure, this method provides a more accurate estimate of the risk of fatalities associated with a given landfill slope failure. The application of the new risk assessment method is demonstrated with a case study for a landfill located within a heavily populated area of New Delhi, India.
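The first step of such a probabilistic analysis, sampling shear strength parameters from statistical distributions and counting how often the slope fails, can be sketched with a generic Monte Carlo loop. This is not the authors' method: the sketch assumes a dry infinite-slope stability model, normally distributed cohesion and friction angle, and hypothetical parameter values, and it omits the run-out length and fatality estimates that the study adds on top.

```python
import math
import random


def factor_of_safety(cohesion_kpa, friction_deg, unit_weight_kn_m3, depth_m, slope_deg):
    """Dry infinite-slope factor of safety: c / (gamma z sin b cos b) + tan(phi) / tan(b)."""
    beta = math.radians(slope_deg)
    phi = math.radians(friction_deg)
    driving_stress = unit_weight_kn_m3 * depth_m * math.sin(beta) * math.cos(beta)
    return cohesion_kpa / driving_stress + math.tan(phi) / math.tan(beta)


def failure_probability(n_samples, cohesion, friction, unit_weight_kn_m3,
                        depth_m, slope_deg, seed=0):
    """Fraction of sampled scenarios with factor of safety below 1.

    `cohesion` and `friction` are (mean, standard deviation) pairs; the normal
    distribution is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        c = max(rng.gauss(*cohesion), 0.0)      # clamp: strength cannot be negative
        phi = max(rng.gauss(*friction), 0.0)
        if factor_of_safety(c, phi, unit_weight_kn_m3, depth_m, slope_deg) < 1.0:
            failures += 1
    return failures / n_samples


# Hypothetical MSW properties: cohesion 5 +/- 3 kPa, friction angle 28 +/- 5 degrees.
p_fail = failure_probability(1000, (5.0, 3.0), (28.0, 5.0), 10.0, 10.0, 30.0)
```

A full risk estimate in the sense of the abstract would multiply such a failure probability by scenario-specific consequences derived from the run-out length.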


Author(s):  
John J. Donovan ◽  
Donald A. Snyder ◽  
Mark L. Rivers

We present a simple expression for the quantitative treatment of interference corrections in x-ray analysis. WDS electron probe analysis of standard reference materials illustrates the success of the technique. For the analytical line of wavelength λ of any element A which lies near or on any characteristic line of another element B, the observed x-ray counts at We use to denote x-ray counts excited by element i in matrix j (u=unknown; s=analytical standard; ŝ=interference standard) at the wavelength of the analytical line of A, λA (Fig. 1). Quantitative analysis of A requires an accurate estimate of These counts can be estimated from the ZAF calculated concentration of B in the unknown (C_B^u), the measured counts at λA in an interference standard of known concentration of B (and containing no A), and ZAF correction parameters for the matrices of both the unknown and the interference standard at It can be shown that:


1967 ◽  
Vol 10 (2) ◽  
pp. 367-372 ◽  
Author(s):  
James D. Miller ◽  
Arthur F. Niemoeller

Results of intelligibility tests on a single patient with a severe discrimination loss for speech are reported. The patient was tested with four different hearing aids and with no aid, and the effects of opportunity for lipreading, background noise, and reverberation were evaluated. The tests appear to allow an accurate estimate of the amount of help to be expected in various situations and show that an aid with good fidelity is clearly superior to the others tested. The destructive effects of background noise and reverberation are demonstrated separately and in combination.

