A video anchored rating scale leads to high inter-rater reliability of inexperienced and expert raters in the absence of rater training

2020 ◽  
Vol 219 (2) ◽  
pp. 221-226 ◽  
Author(s):  
Ronit Patnaik ◽  
Nicholas E. Anton ◽  
Dimitrios Stefanidis

2004 ◽  
Vol 19 (2) ◽  
pp. 190-192 ◽  
Author(s):  
Elsdon Storey ◽  
Kate Tuck ◽  
Robert Hester ◽  
Andrew Hughes ◽  
Andrew Churchyard

2019 ◽  
Vol 76 (4) ◽  
pp. 1088-1093 ◽  
Author(s):  
Nada Gawad ◽  
Amanda Fowler ◽  
Richard Mimeault ◽  
Isabelle Raiche

2019 ◽  
Vol 91 (1) ◽  
pp. 75-81 ◽  
Author(s):  
Leonhard A Bakker ◽  
Carin D Schröder ◽  
Harold H G Tan ◽  
Simone M A G Vugts ◽  
Ruben P A van Eijk ◽  
...  

Objective: The Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) is widely applied to assess disease severity and progression in patients with motor neuron disease (MND). The objective of the study is to assess the inter-rater and intra-rater reproducibility, i.e., the inter-rater and intra-rater reliability and agreement, of a self-administration version of the ALSFRS-R for use in apps, online platforms, clinical care and trials. Methods: The self-administration version of the ALSFRS-R was developed based on both patient and expert feedback. To assess the inter-rater reproducibility, 59 patients with MND filled out the ALSFRS-R online and were subsequently assessed on the ALSFRS-R by three raters. To assess the intra-rater reproducibility, patients were invited on two occasions to complete the ALSFRS-R online. Reliability was assessed with intraclass correlation coefficients, agreement was assessed with Bland-Altman plots and paired samples t-tests, and internal consistency was examined with Cronbach’s coefficient alpha. Results: The self-administration version of the ALSFRS-R demonstrated excellent inter-rater and intra-rater reliability. The assessment of inter-rater agreement demonstrated small systematic differences between patients and raters and acceptable limits of agreement. The assessment of intra-rater agreement demonstrated no systematic changes between time points; limits of agreement were 4.3 points for the total score and ranged from 1.6 to 2.4 points for the domain scores. Coefficient alpha values were acceptable. Discussion: The self-administration version of the ALSFRS-R demonstrates high reproducibility and can be used in apps and online portals both for individual comparisons, facilitating the management of clinical care, and for group comparisons in clinical trials.
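
By way of illustration only, the sketch below shows how the three agreement and consistency statistics named in the methods (Bland-Altman limits of agreement, a paired samples t-test and Cronbach’s coefficient alpha) can be computed; the scores and the random seed are invented placeholders, not the study's ALSFRS-R data.

```python
# Illustrative sketch of the analyses named in the abstract. All data below
# are simulated placeholders, not the study's ALSFRS-R scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical total scores: self-administered vs. rater-administered ALSFRS-R
self_admin = rng.integers(20, 48, size=59).astype(float)
rater_admin = self_admin + rng.normal(0, 1.5, size=59)

# Bland-Altman: mean bias and 95% limits of agreement
diff = self_admin - rater_admin
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"bias = {bias:.2f}, limits of agreement = ±{loa:.2f}")

# Paired samples t-test for a systematic difference between methods
t, p = stats.ttest_rel(self_admin, rater_admin)
print(f"paired t = {t:.2f}, p = {p:.3f}")

# Cronbach's alpha for internal consistency (rows = patients, columns = items)
def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# 12 ALSFRS-R items; random placeholders, so alpha will be low here
items = rng.integers(0, 5, size=(59, 12)).astype(float)
print(f"alpha = {cronbach_alpha(items):.2f}")
```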


2019 ◽  
Vol 5 (1) ◽  
pp. e000541 ◽  
Author(s):  
John Ressman ◽  
Wilhelmus Johannes Andreas Grooten ◽  
Eva Rasmussen Barr

The single leg squat (SLS) is a common tool used in clinical examination to set and evaluate rehabilitation goals, and also to assess lower extremity function in active people. Objectives: To conduct a review and meta-analysis on the inter-rater and intrarater reliability of the SLS, including the lateral step-down (LSD) and forward step-down (FSD) tests. Design: Review with meta-analysis. Data sources: CINAHL, Cochrane Library, Embase, Medline (OVID) and Web of Science were searched up until December 2018. Eligibility criteria: Studies were eligible for inclusion if they were methodological studies which assessed the inter-rater and/or intrarater reliability of the SLS, FSD and LSD through observation of movement quality. Results: Thirty-one studies were included. The reliability varied widely between studies (inter-rater: kappa/intraclass correlation coefficients (ICC) = 0.00–0.95; intrarater: kappa/ICC = 0.13–1.00), but most of the studies reached ‘moderate’ measures of agreement. The pooled results of ICC/kappa showed ‘moderate’ agreement for inter-rater reliability, 0.58 (95% CI 0.50 to 0.65), and ‘substantial’ agreement for intrarater reliability, 0.68 (95% CI 0.60 to 0.74). Subgroup analyses showed a higher pooled agreement for inter-rater reliability of ≤3-point rating scales, while no difference was found for different numbers of segmental assessments. Conclusion: Our findings indicate that the SLS test, including the FSD and LSD tests, can be suitable for clinical use regardless of the number of observed segments, and particularly with a ≤3-point rating scale. Since most of the included studies were affected by some form of methodological bias, our findings must be interpreted with caution. PROSPERO registration number: CRD42018077822.
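
The abstract reports pooled ICC/kappa values with 95% CIs but does not spell out the pooling model. A common approach, shown as a hedged sketch below with invented per-study coefficients and sample sizes, is to Fisher z-transform each coefficient, combine with inverse-variance weights and back-transform the pooled estimate; the review's own meta-analytic model may differ.

```python
# Hedged sketch: pooling reliability coefficients (ICC/kappa) across studies
# via Fisher z-transformation with inverse-variance weights. The coefficients
# and sample sizes below are invented for illustration only.
import numpy as np

coefs = np.array([0.45, 0.58, 0.62, 0.70, 0.55])   # per-study ICC/kappa
n = np.array([30, 25, 42, 60, 38])                  # per-study sample sizes

z = np.arctanh(coefs)            # Fisher z-transform
var = 1.0 / (n - 3)              # approximate variance of z
w = 1.0 / var                    # inverse-variance weights

z_pooled = np.sum(w * z) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))

pooled = np.tanh(z_pooled)       # back-transform to the original scale
ci = np.tanh([z_pooled - 1.96 * se_pooled,
              z_pooled + 1.96 * se_pooled])
print(f"pooled = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Fixed-effect pooling is shown for brevity; a random-effects model would add a between-study variance term to the weights.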


Author(s):  
Earl S. Stein ◽  
Randy L. Sollenberger

This paper describes a study that evaluated the reliability of a recently developed rating form designed to assess air traffic controller performance. Six supervisors from different radar approach control facilities nationwide viewed 20 video tapes of controllers working traffic from a previously recorded simulation study. The observer/raters used a new evaluation form that consisted of 24 different rating scales measuring specific areas of controller performance. An important part of this study was observer training, which consisted of practice rating sessions followed by group discussions. In these discussions, the observers established mutual evaluation criteria for each performance area. Inter-rater reliability was assessed using intraclass correlations, and intra-rater reliability was assessed using Pearson product-moment correlations on repeated video tapes. In general, the reliability of the form was quite good; however, a few rating scales were much less reliable than the others. Reasons for the differences in rating scale reliability are discussed.
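
The two statistics named in the paper, intraclass correlations for inter-rater reliability and Pearson product-moment correlations on repeated tapes for intra-rater reliability, can be illustrated as follows. The ratings are simulated, and ICC(2,1) is assumed here only because the abstract does not state which ICC form the authors used.

```python
# Hedged sketch of the two reliability statistics named in the paper,
# using simulated ratings (20 tapes, 6 raters). ICC(2,1) is an assumption;
# the paper does not specify the ICC form.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# ratings[i, j]: score given by rater j to videotape i
true_quality = rng.normal(5, 1.5, size=(20, 1))
ratings = true_quality + rng.normal(0, 0.8, size=(20, 6))

def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # subjects
    ms_cols = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # raters
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                                 + k * (ms_cols - ms_err) / n)

print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")

# Intra-rater reliability: Pearson r between first and repeated ratings
first_pass = ratings[:, 0]
second_pass = first_pass + rng.normal(0, 0.5, size=20)   # repeated tapes
r, p = stats.pearsonr(first_pass, second_pass)
print(f"Pearson r = {r:.2f}")
```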


2018 ◽  
Vol 104 (8) ◽  
pp. 806-808 ◽  
Author(s):  
Jun Ting Chau ◽  
Karen Peebles ◽  
Yvonne Belessis ◽  
Adam Jaffe ◽  
Michael Doumit

Background: Oropharyngeal suction and oropharyngeal swab are two methods of obtaining airway samples with similar diagnostic accuracy in children with cystic fibrosis (CF). The primary aim was to compare distress between suctioning and swabbing; a secondary aim was to establish the reliability of the Groningen Distress Rating Scale (GDRS). Methods: Oropharyngeal suction or swab was performed, in randomised order, over two visits. Two physiotherapists and the child’s parent rated distress using the GDRS. Heart rate (HR) was also measured. Results: Twenty-four children with CF, with a mean age of 3 years, participated. Both physiotherapist and parent ratings showed significantly higher distress levels during suction than swab. Inter-rater reliability for the GDRS was very good between physiotherapists, and good between physiotherapist and parents. Conclusion: The study found that oropharyngeal swab is less distressing than oropharyngeal suction for obtaining samples, and that the GDRS was reliable and valid.
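
The abstract does not name the statistic behind the ‘very good’ and ‘good’ reliability labels. A weighted Cohen’s kappa is a common choice for an ordinal distress scale such as the GDRS, so the sketch below assumes it; the scores and scale range are invented placeholders.

```python
# Hedged sketch: weighted Cohen's kappa for inter-rater agreement on an
# ordinal distress scale. The statistic and the scores are assumptions,
# not taken from the study.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical GDRS scores given by two raters to the same children
physio_a = np.array([1, 2, 2, 3, 4, 2, 1, 3, 5, 2, 2, 3])
physio_b = np.array([1, 2, 3, 3, 4, 2, 1, 2, 5, 2, 2, 3])

kappa = cohen_kappa_score(physio_a, physio_b, weights="linear")
print(f"weighted kappa (physiotherapist vs physiotherapist) = {kappa:.2f}")
```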


2011 ◽  
Vol 44 (1) ◽  
pp. 68-70 ◽  
Author(s):  
Stefanie Wagner ◽  
Isabella Helmreich ◽  
Klaus Lieb ◽  
André Tadić

Author(s):  
Maria E. Loades ◽  
Peter Armstrong

Evaluating and enhancing supervisee competence is a key function of supervision and can be aided by the use of direct assessments of clinical competence, e.g. the Cognitive Therapy Scale – Revised (CTS-R). We aimed to review the literature regarding inter-rater reliability and training on the CTS and CTS-R, and to present exploratory data on training raters to use this measure. We employed a systematic review. An exploratory study evaluated the outcomes of a CTS-R supervisor training workshop (n = 34), including self-reported familiarity with and confidence in using the tool, and inter-rater consistency on three CTS-R subscales, pre- and post-training. CTS and CTS-R inter-rater reliability was variable, with evidence that rater training enhances reliability, although the form, duration and frequency of such training are unclear. The exploratory study found that supervisors rated themselves as more familiar with and confident in using the CTS-R at the end of training compared with at the beginning. However, inter-rater reliability was poor at both the beginning and the end of the training. Rating competence requires supervisors to make qualitative judgements, a process that is inherently variable. Training raters has been shown to improve rater reliability, although this was not demonstrated in the exploratory study. Practice implications and future research priorities are identified.
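
The abstract likewise leaves the measure of inter-rater consistency on the CTS-R subscales unspecified. One simple descriptive check, sketched below with invented scores, is the spread of the ratings that workshop participants give to the same recorded session before and after training (standard deviation, plus agreement with the modal score exactly and within one point).

```python
# Hedged sketch: descriptive spread of subscale ratings given by a group of
# supervisors to the same recorded session, pre- and post-training. The
# scores are invented, and this is not necessarily the consistency measure
# used in the exploratory study.
import numpy as np

rng = np.random.default_rng(2)
pre_training = rng.integers(1, 6, size=34)    # hypothetical CTS-R subscale scores
post_training = rng.integers(2, 6, size=34)

for label, scores in [("pre", pre_training), ("post", post_training)]:
    mode = np.bincount(scores).argmax()
    exact = np.mean(scores == mode)
    within_one = np.mean(np.abs(scores - mode) <= 1)
    print(f"{label}: SD = {scores.std(ddof=1):.2f}, "
          f"exact agreement with modal score = {exact:.0%}, "
          f"within one point = {within_one:.0%}")
```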

