measures of agreement Latest Research Papers

Clarifying Agreement Calculations and Analysis for End-User Elicitation Studies

ACM Transactions on Computer-Human Interaction ◽

10.1145/3476101 ◽

2022 ◽

Vol 29 (1) ◽

pp. 1-70

Author(s):

Radu-Daniel Vatavu ◽

Jacob O. Wobbrock

Keyword(s):

Formal Model ◽

Statistical Tests ◽

Type I ◽

End User ◽

Type I Errors ◽

Scientific Rigor ◽

Tolerance Relation ◽

Measures Of Agreement ◽

Elicitation Studies ◽

Review Current

We clarify fundamental aspects of end-user elicitation, enabling such studies to be run and analyzed with confidence, correctness, and scientific rigor. To this end, our contributions are multifold. We introduce a formal model of end-user elicitation in HCI and identify three types of agreement analysis: expert, codebook, and computer. We show that agreement is a mathematical tolerance relation generating a tolerance space over the set of elicited proposals. We review current measures of agreement and show that all can be computed from an agreement graph. In response to recent criticisms, we show that chance agreement represents an issue solely for inter-rater reliability studies and not for end-user elicitation, where it is opposed by chance disagreement. We conduct extensive simulations of 16 statistical tests for agreement rates, and report Type I errors and power. Based on our findings, we provide recommendations for practitioners and introduce a five-level hierarchy for elicitation studies.
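As a rough illustration of computing a measure of agreement from an agreement graph, the sketch below treats proposals as nodes and joins two proposals with an edge when they were judged to agree. It then reports the share of all proposal pairs that are connected. The function name `agreement_rate` and its parameters are hypothetical, and this pairwise ratio is only one plausible reading of an agreement rate, not necessarily the exact formulation used in the paper.

```python
def agreement_rate(num_proposals, agreeing_pairs):
    """Fraction of proposal pairs joined by an edge in the agreement graph.

    num_proposals  -- number of elicited proposals (graph nodes), assumed >= 2
    agreeing_pairs -- iterable of (i, j) index pairs judged to be in agreement
    """
    if num_proposals < 2:
        raise ValueError("need at least two proposals")
    # Store each edge as an unordered pair so (a, b) and (b, a) count once;
    # self-loops are dropped because a proposal trivially agrees with itself.
    edges = {frozenset(pair) for pair in agreeing_pairs if pair[0] != pair[1]}
    total_pairs = num_proposals * (num_proposals - 1) // 2
    return len(edges) / total_pairs


# Four proposals; 0 agrees with 1, and 1 agrees with 2, but 0 and 2 were not
# judged to agree. A tolerance relation does not have to be transitive.
rate = agreement_rate(4, [(0, 1), (1, 2), (1, 0)])
print(round(rate, 3))
```

Working on edges rather than on disjoint groups is what lets the sketch handle non-transitive agreement, which is the property that distinguishes a tolerance relation from an equivalence relation.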

Quantification of stroke volume in a simulated healthy volunteer model of traumatic haemorrhage; a comparison of two non-invasive monitoring devices using error grid analysis alongside traditional measures of agreement

PLoS ONE ◽

10.1371/journal.pone.0261546 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0261546

Author(s):

Sam D. Hutchings ◽

Jim Watchorn ◽

Rory McDonald ◽

Su Jeffreys ◽

Mark Bates ◽

...

Keyword(s):

Stroke Volume ◽

Lower Limb ◽

Measurement Errors ◽

Dynamic Change ◽

Traumatic Injury ◽

Haemodynamic Monitoring ◽

Percentage Error ◽

Circulating Blood Volume ◽

Measures Of Agreement ◽

Error Grid

Introduction: Haemorrhage is a leading cause of death following traumatic injury, and the early detection of hypovolaemia is critical to effective management. However, accurate assessment of circulating blood volume is challenging when using traditional vital signs such as blood pressure. We conducted a study to compare the stroke volume (SV) recorded using two devices, trans-thoracic electrical bioimpedance (TEB) and supra-sternal Doppler (SSD), against a reference standard using trans-thoracic echocardiography (TTE). Methods: A lower body negative pressure (LBNP) model was used to simulate hypovolaemia, and in half of the study sessions lower limb tourniquets were applied, as these are common in military practice and can potentially affect some haemodynamic monitoring systems. In order to provide a clinically relevant comparison, we constructed an error grid alongside more traditional measures of agreement. Results: Twenty-one healthy volunteers aged 18–40 were enrolled and underwent two sessions of LBNP, with and without lower limb tourniquets. With respect to absolute SV values, Bland-Altman analysis showed significant bias in both non-tourniquet and tourniquet strands for TEB (-42.5 / -49.6 ml), rendering further analysis impossible. For SSD, bias was minimal but percentage error was unacceptably high (35% / 48%). Degree of agreement for dynamic change in SV, assessed using four-quadrant plots, showed a seemingly acceptable concordance rate for both TEB (86% / 93%) and SSD (90% / 91%). However, when results were plotted on an error grid, constructed based on expert clinical opinion, a significant minority of measurement errors were identified that had potential to lead to moderate or severe patient harm. Conclusion: Thoracic bioimpedance and suprasternal Doppler both demonstrated measurement errors that had the potential to lead to clinical harm, and caution should be applied in interpreting the results in the detection of early hypovolaemia following traumatic injury.
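The traditional measures named in this abstract (Bland-Altman bias and percentage error) can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the function name `bland_altman` is hypothetical, the data are invented, and percentage error is taken as 1.96 times the standard deviation of the differences divided by the mean of the paired measurements, which is a common convention but may not match the exact definition used in the study.

```python
from statistics import mean, stdev


def bland_altman(test, reference):
    """Bias, 95% limits of agreement, and percentage error for paired data."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    # Difference of each pair: test device minus reference method.
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    # Assumption: percentage error is scaled by the mean of the pair averages.
    pair_means = [(t + r) / 2 for t, r in zip(test, reference)]
    return {
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "percentage_error": 100 * 1.96 * sd / mean(pair_means),
    }


# Invented stroke-volume readings in ml, for illustration only.
result = bland_altman([60, 62, 58, 64], [70, 70, 70, 70])
print(result["bias"], result["limits_of_agreement"], result["percentage_error"])
```

A large negative bias with a small percentage error, as in this toy data, describes a device that is consistently wrong by a similar amount; a small bias with a large percentage error describes one that is right on average but imprecise.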

First and second morning spot urine protein measurements for the assessment of proteinuria: a diagnostic accuracy study in kidney transplant recipients

BMC Nephrology ◽

10.1186/s12882-021-02406-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Maja Mrevlje ◽

Manca Oblak ◽

Gregor Mlinšek ◽

Jelka Lindič ◽

Jadranka Buturović-Ponikvar ◽

...

Keyword(s):

Kidney Transplant ◽

Cross Sectional Study ◽

Low Grade ◽

Transplant Recipients ◽

Kidney Transplant Recipients ◽

Urine Protein ◽

Cross Sectional ◽

Spot Urine ◽

Protein Excretion ◽

Measures Of Agreement

Background: Quantification of proteinuria in kidney transplant recipients is important for diagnostic and prognostic purposes. Apart from correlation tests, there have been few evaluations of spot urine protein measurements in kidney transplantation. Methods: In this cross-sectional study involving 151 transplanted patients, we investigated measures of agreement (bias and accuracy) between the estimated protein excretion rate (ePER), determined from the protein-to-creatinine ratio in the first and second morning urine, and 24-h proteinuria and studied their performance at different levels of proteinuria. Measures of agreement were reanalyzed in relation to allograft histology in 76 patients with kidney biopsies performed for cause before enrolment in the study. Results: For ePER in the first morning urine, percent bias ranged from 1 to 28% and accuracy (within 30% of 24-h collection) ranged from 56 to 73%. For the second morning urine, percent bias ranged from 2 to 11%, and accuracy ranged from 71 to 78%. The accuracy of ePER (within 30%) in first and second morning urine progressively increased from 56 and 71% for low-grade proteinuria (150–299 mg/day) to 60 and 74% for moderate proteinuria (300–999 mg/day), and to 73 and 78% for high-grade proteinuria (≥1000 mg/day). Measures of agreement were similar across histologic phenotypes of allograft injury. Conclusions: The ability of ePER to accurately predict 24-h proteinuria in kidney transplant recipients is modest. However, accuracy improves with an increase in proteinuria. Given the similar accuracy of ePER measurements in first and second morning urine, second morning urine can be used to monitor protein excretion.
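The two measures of agreement reported here, percent bias and accuracy within 30% of the reference, can be illustrated as follows. This is a minimal sketch with hypothetical function names and invented values. Percent bias is assumed to be the mean signed percentage difference from the reference, which may differ from the definition the authors used; accuracy follows the abstract's wording, the share of estimates falling within 30% of the 24-h value.

```python
from statistics import mean


def percent_bias(estimates, references):
    """Mean signed difference from the reference, as a percentage (assumed)."""
    return mean([100 * (e - r) / r for e, r in zip(estimates, references)])


def accuracy_within(estimates, references, tolerance=0.30):
    """Percentage of estimates lying within +/- tolerance of the reference."""
    hits = [abs(e - r) <= tolerance * r for e, r in zip(estimates, references)]
    return 100 * sum(hits) / len(hits)


# Invented values in mg/day: spot-urine estimates against 24-h collections.
estimated = [110, 90, 150, 125]
collected = [100, 100, 100, 100]
print(percent_bias(estimated, collected))     # signed average deviation
print(accuracy_within(estimated, collected))  # share within 30% of reference
```

The two numbers answer different questions: bias can be close to zero while accuracy is poor if over- and under-estimates cancel out, which is why the abstract reports both.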

First and Second Morning Spot Urine Protein Measurements for the Assessment of Proteinuria: A Diagnostic Accuracy Study in Kidney Transplant Recipients

10.21203/rs.3.rs-350291/v1 ◽

2021 ◽

Author(s):

Maja Mrevlje ◽

Manca Oblak ◽

Gregor Mlinšek ◽

Jadranka Buturović-Ponikvar ◽

Jelka Lindič ◽

...

Keyword(s):

Kidney Transplant ◽

Cross Sectional Study ◽

Low Grade ◽

Transplant Recipients ◽

Kidney Transplant Recipients ◽

Urine Protein ◽

Cross Sectional ◽

Spot Urine ◽

Protein Excretion ◽

Measures Of Agreement

Background. Quantification of proteinuria in kidney transplant recipients is important for diagnostic and prognostic purposes. Apart from correlation tests, there have been few evaluations of spot urine protein measurements in kidney transplantation. Methods. In this cross-sectional study involving 151 transplanted patients, we investigated measures of agreement (bias and accuracy) between the estimated protein excretion rate (ePER), determined from the protein-to-creatinine ratio in the first and second morning urine, and 24-hour proteinuria and studied their performance at different levels of proteinuria. Measures of agreement were reanalyzed in relation to allograft histology in 76 patients with kidney biopsies performed for cause before enrolment in the study. Results. For ePER in the first morning urine, percent bias ranged from 1% to 28% and accuracy (within 30% of 24-hour collection) ranged from 56% to 73%. For the second morning urine, percent bias ranged from 2% to 11%, and accuracy ranged from 71% to 78%. The accuracy of ePER (within 30%) in first and second morning urine progressively increased from 56% and 71% for low-grade proteinuria (150-299 mg/day) to 60% and 74% for moderate proteinuria (300-999 mg/day), and to 73% and 78% for high-grade proteinuria (≥1000 mg/day). Measures of agreement were similar across histologic phenotypes of allograft injury. Conclusions. The ability of ePER to accurately predict 24-hour proteinuria in kidney transplant recipients is modest. However, accuracy improves with an increase in proteinuria. Given the similar accuracy of ePER measurements in first and second morning urine, second morning urine can be used to monitor protein excretion.

Abilities of Canine Shelter Behavioral Evaluations and Owner Surrender Profiles to Predict Resource Guarding in Adoptive Homes

Animals ◽

10.3390/ani10091702 ◽

2020 ◽

Vol 10 (9) ◽

pp. 1702

Author(s):

Betty McGuire ◽

Destiny Orantes ◽

Stephanie Xue ◽

Stephen Parry

Keyword(s):

United States ◽

New York ◽

The United States ◽

Sources Of Information ◽

Predictive Values ◽

Measures Of Agreement ◽

Source Of Information

Some shelters in the United States consider dogs identified as food aggressive during behavioral evaluations to be unadoptable. We surveyed adopters of dogs from a New York shelter to examine predictive abilities of shelter behavioral evaluations and owner surrender profiles. Twenty of 139 dogs (14.4%) were assessed as resource guarding in the shelter. We found statistically significant associations between shelter assessment as resource guarding and guarding reported in the adoptive home for three situations: taking away toys, bones or other valued objects; taking away food; and retrieving items or food taken by the dog. Similarly, owner descriptions of resource guarding on surrender profiles significantly predicted guarding in adoptive homes. However, positive predictive values for all analyses were low, and more than half of dogs assessed as resource guarding either in the shelter or by surrendering owners did not show guarding post adoption. All three sources of information regarding resource guarding status (surrender profile, shelter behavioral evaluation, and adopter report) were available for 44 dogs; measures of agreement were in the fair range. Thus, reports of resource guarding by surrendering owners and detection of guarding during shelter behavioral evaluations should be interpreted with caution because neither source of information consistently signaled guarding would occur in adoptive homes.

Detection of Opinion Communities with the Help of Chance-Corrected Measures of Agreement

SN Computer Science ◽

10.1007/s42979-020-00129-8 ◽

2020 ◽

Vol 1 (3) ◽

Author(s):

Anton Oleinik

Keyword(s):

Measures Of Agreement

Affinity-based measures of biomarker performance evaluation

Statistical Methods in Medical Research ◽

10.1177/0962280219846157 ◽

2019 ◽

Vol 29 (3) ◽

pp. 837-853 ◽

Cited By ~ 1

Author(s):

Miguel de Carvalho ◽

Bradley J Barney ◽

Garritt L Page

Keyword(s):

Prostate Cancer ◽

Cancer Diagnosis ◽

Pearson Correlation ◽

Discrimination Performance ◽

Bayes Estimators ◽

Summary Index ◽

Measures Of Agreement ◽

Accuracy Measures ◽

Summary Measures ◽

Using Data

We propose new summary measures of biomarker accuracy which can be used as companions to existing diagnostic accuracy measures. Conceptually, our summary measures are tantamount to the so-called Hellinger affinity and we show that they can be regarded as measures of agreement constructed from similar geometrical principles as Pearson correlation. We develop a covariate-specific version of our summary index, which practitioners can use to assess the discrimination performance of a biomarker, conditionally on the value of a predictor. We devise nonparametric Bayes estimators for the proposed indexes, derive theoretical properties of the corresponding priors, and assess the performance of our methods through a simulation study. The proposed methods are illustrated using data from a prostate cancer diagnosis study.
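For readers unfamiliar with the Hellinger affinity, the sketch below computes it for two discrete distributions as the sum of the square roots of the products of matching probabilities. It equals 1 when the distributions are identical and 0 when they share no support. The function name `hellinger_affinity` and the binned example are assumptions made for illustration; this is not the nonparametric Bayes estimator proposed in the paper, which works with continuous biomarker densities.

```python
import math


def hellinger_affinity(p, q):
    """Hellinger affinity of two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    for dist in (p, q):
        if any(x < 0 for x in dist):
            raise ValueError("probabilities must be non-negative")
        if not math.isclose(sum(dist), 1.0, abs_tol=1e-9):
            raise ValueError("probabilities must sum to 1")
    # Sum over bins of sqrt(p_i * q_i); higher values mean more overlap.
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Invented binned biomarker distributions for two groups of subjects.
diseased = [0.0, 0.5, 0.5]
healthy = [0.5, 0.5, 0.0]
print(hellinger_affinity(diseased, healthy))
```

In a diagnostic setting, a low affinity between the diseased and non-diseased distributions suggests the biomarker separates the groups well, while a value near 1 suggests heavy overlap.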

Measures of Agreement Versus Measures of Prediction Accuracy

ETS Research Report Series ◽

10.1002/ets2.12258 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 1-23

Author(s):

Shelby J. Haberman

Keyword(s):

Prediction Accuracy ◽

Measures Of Agreement

Measures of Agreement: Reliability, Classification Accuracy, and Classification Consistency

Handbook of Diagnostic Classification Models - Methodology of Educational Measurement and Assessment ◽

10.1007/978-3-030-05584-4_17 ◽

2019 ◽

pp. 359-377

Author(s):

Sandip Sinharay ◽

Matthew S. Johnson

Keyword(s):

Classification Accuracy ◽

Classification Consistency ◽

Measures Of Agreement

Measures of Agreement to Assess Attribute-Level Classification Accuracy and Consistency for Cognitive Diagnostic Assessments

Journal of Educational Measurement ◽

10.1111/jedm.12196 ◽

2018 ◽

Vol 55 (4) ◽

pp. 635-664 ◽

Cited By ~ 5

Author(s):

Matthew S. Johnson ◽

Sandip Sinharay

Keyword(s):

Classification Accuracy ◽

Attribute Level ◽

Measures Of Agreement
