Intra-rater reliability and repeatability of pulse oximetry values obtained before, between sets, and after resistive exercise

Real Difference ◽

Rater Reliability ◽

Resistive Exercise ◽

Cellular Oxygen

BACKGROUND: Pulse oximetry measures heart rate (HR) and percent oxygen saturation (SpO2). For aerobic exercise, whereby cellular oxygen demand and delivery are elevated and maintained for extended periods, HR and SpO2 values are consistent when measured by pulse oximetry. Yet because resistive exercise is intermittent, its HR and SpO2 values may exhibit lower reliability and repeatability. OBJECTIVE: Assess intra-rater reliability and repeatability of pulse oximetry HR and SpO2 values from two identical resistive exercise protocols. METHODS: Subjects (n = 32) performed two calf press workouts on a flywheel-based ergometer as HR and SpO2 were measured before, between sets, and after exercise. Workouts entailed a 4-set, 15-repetition protocol with sets separated by 120-second rests. Intra-rater reliability was assessed with intraclass correlation coefficients (ICC). Repeatability was measured by the smallest real difference in absolute and relative terms. RESULTS: ICC and standard error of estimate results for HR ranged from 0.60 to 0.79 and from 9.1 to 13.0, respectively. SpO2 ICC and standard error of estimate results ranged from 0.16 to 0.71 and from 1.44 to 4.33, respectively. Between sets, smallest real difference values tended to be smaller for HR. CONCLUSIONS: Results demonstrate acceptable intra-rater reliability and repeatability for HR, but not for SpO2, which we attribute to the exercise mode and protocol examined.
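The abstract does not state how the smallest real difference was computed. A common convention defines it as 1.96 × √2 × SEM, where SEM is the standard error of measurement, and expresses the relative form as a percentage of the grand mean. The sketch below assumes that convention; the function names and the input values in the usage note are hypothetical and are not taken from the study.

```python
import math


def smallest_real_difference(sem, z=1.96):
    """Absolute smallest real difference, assuming SRD = z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


def relative_srd(sem, grand_mean, z=1.96):
    """Smallest real difference as a percentage of the grand mean."""
    return 100.0 * smallest_real_difference(sem, z) / grand_mean
```

With an illustrative SEM of 5.0 beats per minute and a grand mean of 80, `smallest_real_difference(5.0)` is about 13.9 and `relative_srd(5.0, 80.0)` is about 17.3%.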

Is the location of the signal intensity weighted centroid a reliable measurement of fluid displacement within the disc?

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2016-0178 ◽

2018 ◽

Vol 63 (4) ◽

pp. 453-460 ◽

Cited By ~ 7

Author(s):

Vahid Abdollah ◽

Eric C. Parent ◽

Michele C. Battié

Keyword(s):

Signal Intensity ◽

Water Distribution ◽

Intraclass Correlation ◽

Region Of Interest ◽

Rater Reliability ◽

Fluid Displacement ◽

The Mean ◽

Standard Error Of Measurement

Degenerated discs have shorter T2-relaxation time and lower MR signal. The location of the signal-intensity-weighted-centroid reflects the water distribution within a region-of-interest (ROI). This study compared the reliability of the location of the signal-intensity-weighted-centroid to mean signal intensity and area measurements. L4-L5 and L5-S1 discs were measured on 43 mid-sagittal T2-weighted 3T MRI images in adults with back pain. One rater analysed images twice and another once, blinded to measurements. Discs were semi-automatically segmented into a whole disc, nucleus, anterior and posterior annulus. The coordinates of the signal-intensity-weighted-centroid for all regions demonstrated excellent intraclass-correlation-coefficients for intra- (0.99–1.00) and inter-rater reliability (0.97–1.00). The standard error of measurement for the Y-coordinates of the signal-intensity-weighted-centroid for all ROIs was 0 at both levels, and it ranged from 0 to 2.7 mm for X-coordinates. The mean signal intensity and area for the whole disc and nucleus presented excellent intra-rater reliability with intraclass-correlation-coefficients from 0.93 to 1.00, and 0.92 to 1.00 for inter-rater reliability. The mean signal intensity and area had lower reliability for annulus ROIs, with intra-rater intraclass-correlation-coefficient from 0.5 to 0.76 and inter-rater from 0.33 to 0.58. The location of the signal-intensity-weighted-centroid is a reliable biomarker for investigating the effects of disc interventions.
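To make the centroid measure concrete, the following is a minimal sketch of an intensity-weighted centroid over a region of interest. It is not the authors' implementation: the function name, the nested-list image representation, and the boolean mask are assumptions, and the result is in pixel indices (converting to millimetres would require the image's pixel spacing, which the abstract does not give).

```python
def weighted_centroid(image, mask):
    """Return (x, y) of the signal-intensity-weighted centroid inside a mask.

    image: rows of pixel intensities; mask: rows of booleans marking the ROI.
    Each coordinate is the intensity-weighted mean of the pixel indices.
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if mask[y][x]:
                total += value
                sum_x += value * x
                sum_y += value * y
    if total == 0:
        raise ValueError("ROI has zero total signal; centroid is undefined")
    return sum_x / total, sum_y / total
```

Brighter pixels pull the centroid toward themselves, which is why the coordinate shifts when signal, and by the abstract's reasoning water, is redistributed within the ROI.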

Test–Retest Reliability of a New Device Versus a Long-Arm Goniometer to Evaluate Knee Proprioception

Journal of Sport Rehabilitation ◽

10.1123/jsr.2021-0146 ◽

2021 ◽

pp. 1-6

Author(s):

Fei Tian ◽

Yaqi Zhao ◽

Jixin Li ◽

Wenjin Wang ◽

Danni Wu ◽

...

Keyword(s):

Standard Error ◽

Intraclass Correlation ◽

Joint Position Sense ◽

Repeated Measurements ◽

Good Reliability ◽

Retest Reliability ◽

New Device ◽

Test Retest Reliability

Context: Many methods used to evaluate knee proprioception have shortcomings that limit their use in clinical settings. Based on an inexpensive 3D camera, a new portable device was recently used to evaluate the joint position sense (JPS) of the knee joint. However, the test–retest reliability of the new method remains unclear. This study aimed to evaluate the test–retest reliability of the new device and a long-arm goniometer for assessing knee JPS, and to compare the variability of the 2 methods. Design: Prospective observational study of the test–retest reliability of knee JPS measurements. Methods: Twenty-one healthy adults were tested in 2 sessions with a 1-week interval. Three target knee flexion angles (30°, 45°, and 60°) were reproduced in each session. Target and reproduced angles were measured with both methods. Intraclass correlation coefficients, standard error of the measurement, and Bland–Altman plots were used to quantify test–retest reliability. Paired t tests were used to compare knee JPS (absolute error of the target-reproduced angle) between the methods. Results: The new device (good to excellent intraclass correlation coefficients .74–.80; standard error of the measurement 0.52°–0.61°) demonstrated better test–retest reliability than the goniometer (poor to fair intraclass correlation coefficients .23–.43; standard error of the measurement 0.89°–2.07°) and better test–retest agreement (respective mean differences for the 30°, 45°, and 60° knee angles: 0.11°, 0.13°, and 0.41° for the new system; 0.84°, 1.52°, and 1.18° for the goniometer). The measurements (absolute errors of the target-reproduced angles) with the goniometer were significantly greater than those with the new device (P < .05); the SDs of repeated measurements with the goniometer (1.50°–2.41°) were greater than with the new device (1.08°–1.38°). Conclusions: Given that the new device has good reliability and sufficient precision, it is the better alternative for evaluating knee JPS. Goniometers should be used with caution to assess knee JPS.
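The mean differences reported above are the kind of quantity a Bland–Altman analysis summarises. As a rough sketch, assuming the usual definitions (bias is the mean of the paired session differences, and the 95% limits of agreement are bias ± 1.96 × the sample SD of those differences), the numbers behind such a plot could be computed as follows. The function name and the example angles are hypothetical.

```python
import statistics


def bland_altman(session1, session2, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(session1) != len(session2):
        raise ValueError("sessions must contain the same number of values")
    # Per-subject difference between the two test sessions.
    diffs = [a - b for a, b in zip(session1, session2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    return bias, bias - z * sd, bias + z * sd
```

For example, `bland_altman([30, 31, 29, 30], [31, 30, 30, 31])` gives a bias of -0.5° with limits of agreement of about -2.46° and 1.46°.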

Reliability and validity of the iSense optical scanner for measuring volume of transtibial residual limb models

Prosthetics and Orthotics International ◽

10.1177/0309364618806038 ◽

2018 ◽

Vol 43 (2) ◽

pp. 213-220 ◽

Cited By ~ 1

Author(s):

Lucy Armitage ◽

Li Khim Kwah ◽

Lauren Kark

Keyword(s):

Intraclass Correlation ◽

Reliability And Validity ◽

Criterion Validity ◽

Residual Limb ◽

Limb Volume ◽

Rater Reliability ◽

Measuring Volume ◽

Optical Scanner

Background: Residual limb volume is often measured as part of routine care for people with amputations. These measurements assist in the timing of prosthetic fitting or replacement. In order to make well-informed decisions, clinicians need access to measurement tools that are valid and reliable. Objectives: To assess the reliability and criterion validity of the iSense optical scanner in measuring volume of transtibial residual limb models. Study Design: Three assessors performed two measurements each on 13 residual limb models with an iSense optical scanner (3D Systems, USA). Intra-rater and inter-rater reliability were calculated using intraclass correlation coefficients. Bland–Altman plots were inspected for agreement. Criterion validity was assessed using a steel rod of known dimensions. Ten repeated measurements were performed by one assessor. A t-test was used to determine differences between measured and true rod volume. Results: Intra-rater reliability was excellent (range of intraclass correlation coefficients: 0.991–0.997, all with narrow 95% confidence intervals). While the intraclass correlation coefficients suggest excellent inter-rater reliability between all three assessors (range of intraclass correlation coefficients: 0.952–0.986), the 95% confidence intervals were wide between assessor 3 and the other two assessors. Poor agreement with assessor 3 was also seen in the Bland–Altman plots. Criterion validity was very poor with a significant difference between the mean iSense measurement and the true rod volume (difference: 221.18 mL; p < 0.001). Conclusions: Although intra-rater reliability was excellent for the iSense scanner, we did not find similar results for inter-rater reliability and validity. These results suggest that further testing of the iSense scanner is required prior to use in clinical practice. Clinical relevance: The iSense offers a low-cost scanning option for residual limb volume measurement. Intra-rater reliability was excellent, but inter-rater reliability and validity were such that clinical adoption is not indicated at present.

Inter-Rater Reliability: Intraclass Correlation Coefficients

Educational and Psychological Measurement ◽

10.1177/001316448104100127 ◽

1981 ◽

Vol 41 (1) ◽

pp. 223-226 ◽

Cited By ~ 6

Author(s):

Dong Won Cho

Keyword(s):

Intraclass Correlation ◽

Intraclass Correlation Coefficients ◽

Rater Reliability

Reliability of Autism-Tics, AD/HD, and other Comorbidities (A–TAC) Inventory in a Test-Retest Design

Psychological Reports ◽

10.2466/03.15.pr0.114k10w1 ◽

2014 ◽

Vol 114 (1) ◽

pp. 93-103 ◽

Cited By ~ 15

Author(s):

Tomas Larson ◽

Eva Norén Selinus ◽

Clara Hellner Gumpert ◽

Thomas Nilsson ◽

Nóra Kerekes ◽

...

Keyword(s):

Intraclass Correlation ◽

Population Based ◽

Autism Spectrum ◽

Good Test ◽

Rater Reliability ◽

Retest Reliability ◽

Intraclass Correlations ◽

Test Retest Reliability

The Autism-Tics, AD/HD, and other Comorbidities (A–TAC) inventory is used in epidemiological research to assess neurodevelopmental problems and coexisting conditions. Although the A–TAC has been applied in various populations, data on retest reliability are limited. The objective of the present study was to present additional reliability data. The A–TAC was administered by lay assessors and was completed on two occasions by parents of 400 individual twins, with an average interval of 70 days between test sessions. Intra- and inter-rater reliability were analysed with intraclass correlations and Cohen's κ. A–TAC showed excellent test-retest intraclass correlations for both autism spectrum disorder and attention deficit hyperactivity disorder (each at .84). Most modules in the A–TAC had intra- and inter-rater reliability intraclass correlation coefficients of ≥ .60. Cohen's κ indicated acceptable reliability. The current study provides statistical evidence that the A–TAC yields good test-retest reliability in a population-based cohort of children.
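Cohen's κ, used above alongside intraclass correlations, corrects the observed agreement between two sets of categorical ratings for the agreement expected by chance. The sketch below shows the unweighted form, κ = (p_o − p_e) / (1 − p_e); the abstract does not say whether a weighted variant was used, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the two ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the product of each category's marginal proportions.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)
```

For instance, `cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])` has 75% observed agreement and 50% chance agreement, giving κ = 0.5.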

Reliability of assessment of medical students’ non-technical skills using a behavioural marker system: does clinical experience matter?

BMJ Simulation and Technology Enhanced Learning ◽

10.1136/bmjstel-2020-000705 ◽

2020 ◽

pp. bmjstel-2020-000705

Author(s):

Benjamin Clarke ◽

Samantha E Smith ◽

Emma Claire Phillips ◽

Ailsa Hamilton ◽

Joanne Kerins ◽

...

Keyword(s):

Medical Students ◽

Clinical Experience ◽

Intraclass Correlation ◽

Technical Skills ◽

Rater Reliability ◽

Single Measure ◽

Marker System ◽

Reliability Coefficients

Introduction: Non-technical skills are recognised to play an integral part in safe and effective patient care. Medi-StuNTS (Medical Students’ Non-Technical Skills) is a behavioural marker system developed to enable assessment of medical students’ non-technical skills. This study aimed to assess whether newly trained raters with high levels of clinical experience could achieve reliability coefficients of >0.7 and to compare differences in inter-rater reliability of raters with varying clinical experience. Methods: Forty-four raters attended a workshop on Medi-StuNTS before independently rating three videos of medical students participating in immersive simulation scenarios. Data were grouped by raters’ levels of clinical experience. Inter-rater reliability was assessed by calculating intraclass correlation coefficients (ICC). Results: Eleven raters with more than 10 years of clinical experience achieved single-measure ICC of 0.37 and average-measures ICC of 0.87. Fourteen raters with more than or equal to 5 years and less than 10 years of clinical experience achieved single-measure ICC of 0.09 and average-measures ICC of 0.59. Nineteen raters with less than 5 years of clinical experience achieved single-measure ICC of 0.09 and average-measures ICC of 0.65. Conclusions: Using 11 newly trained raters with high levels of clinical experience produced highly reliable ratings that surpassed the prespecified inter-rater reliability standard; however, a single rater from this group would not achieve sufficiently reliable ratings. This is consistent with previous studies using other medical behavioural marker systems. This study demonstrated a decrease in inter-rater reliability of raters with lower levels of clinical experience, suggesting caution when using this population as raters for assessment of non-technical skills.
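The gap between single-measure and average-measures ICC follows from the Spearman–Brown relation, ICC_k = k × ICC_1 / (1 + (k − 1) × ICC_1), which links the two in the standard ICC forms. The abstract does not state which ICC form was used, so the sketch below is an illustration under that assumption; the function name is hypothetical.

```python
def average_measures_icc(single_icc, k):
    """Spearman-Brown step-up: reliability of the mean of k raters."""
    return k * single_icc / (1.0 + (k - 1) * single_icc)


# Reported single-measure ICCs and group sizes from the abstract.
for single, raters in [(0.37, 11), (0.09, 14), (0.09, 19)]:
    print(raters, round(average_measures_icc(single, raters), 2))
```

For the 11 most experienced raters this reproduces the reported 0.87, and for the 19 least experienced raters the reported 0.65. For the group of 14 it gives about 0.58 rather than 0.59, a difference that is plausibly due to the single-measure value being rounded to 0.09.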

Vein Measurement by Peripherally Inserted Central Catheter Nurses Using Ultrasound: A Reliability Study

Journal of the Association for Vascular Access ◽

10.1016/j.java.2013.08.001 ◽

2013 ◽

Vol 18 (4) ◽

pp. 234-238 ◽

Cited By ~ 8

Author(s):

Rebecca Sharp ◽

Andrea Gordon ◽

Antonina Mikocka-Walus ◽

Jessie Childs ◽

Carol Grech ◽

...

Keyword(s):

Intraclass Correlation ◽

Cephalic Vein ◽

Basilic Vein ◽

Rater Reliability ◽

Vein Thrombosis ◽

Measurement Protocol ◽

Brachial Vein ◽

Deep Vein

Background: Peripherally inserted central catheters (PICCs) are increasingly inserted by trained registered nurses, necessitating the development of specialized skills such as the use of ultrasound. The selection of an adequately sized vein is an important factor in reducing adverse events such as deep vein thrombosis. However, PICC nurses may receive minimal training in the use of ultrasound for vein measurement. Objective: We aimed to demonstrate the reliability of a vein measurement protocol using ultrasound by a PICC nurse trained in sonography. Methods: The diameters of the basilic, brachial, and cephalic veins in the left arms of healthy participants (n = 12) were measured using ultrasound by a PICC nurse and a sonographer. The PICC nurse performed the measurement twice and the sonographer once; the PICC nurse's results were compared for intra-rater reliability and compared with the sonographer's for inter-rater reliability. The results were analyzed using intraclass correlation coefficients (ICCs). Results: Inter-rater reliability between the PICC nurse and the sonographer was adequate: the ICC for the brachial vein was 0.60 (95% confidence interval [CI], 0.06–0.87), basilic vein ICC was 0.87 (95% CI, 0.58–0.96) and cephalic vein ICC was 0.77 (95% CI, 0.39–0.93). Intra-rater reliability of the PICC nurse was higher; the ICC for the brachial vein was 0.80 (95% CI, 0.44–0.94), basilic vein ICC was 0.92 (95% CI, 0.67–0.98), and cephalic vein ICC was 0.78 (95% CI, 0.40–0.93). Conclusions: Using a suitable protocol, a PICC nurse was able to measure vein diameter reliably when compared with a sonographer and consistently replicate these results.
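As an illustration of how an ICC like those above can be obtained from a subjects × raters table, the sketch below computes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), from ANOVA mean squares. The abstract does not say which ICC form the authors used, so this choice is an assumption; the function name and the example data are hypothetical, and confidence intervals are not computed.

```python
def icc_2_1(table):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with >= 2 rows and columns")
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because this form measures absolute agreement, a constant offset between raters lowers it: `icc_2_1([[1, 2], [2, 3], [3, 4]])` is about 0.67 even though the two raters rank the subjects identically, whereas identical columns give 1.0.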

Intra and Inter-rater Reliability between Ultrasound Imaging and Caliper Measures to determine Spring Ligament Dimensions in Cadavers

Scientific Reports ◽

10.1038/s41598-019-51384-6 ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Fernando Santiago-Nuño ◽

Patricia Palomo-López ◽

Ricardo Becerro-de-Bengoa-Vallejo ◽

César Calvo-Lobo ◽

Marta Elena Losa-Iglesias ◽

...

Keyword(s):

Ultrasound Imaging ◽

Intraclass Correlation ◽

Absolute Accuracy ◽

Strong Correlations ◽

Perfect Agreement ◽

Rater Reliability ◽

Spring Ligament ◽

Good Repeatability

The purpose was to evaluate intra- and inter-rater reliability, repeatability and absolute accuracy between ultrasound imaging (US) and caliper measures to determine Spring ligament (SL) dimensions in cadavers. SLs were identified from 62 human feet from formaldehyde-embalmed cadavers. Intra- and inter-observer reliability, repeatability and absolute accuracy of SL width, thickness and length between US and caliper measurements were determined within and between sessions by intraclass correlation coefficients, Pearson's correlation coefficients, Student t tests, standard errors of measurement, minimum detectable changes, values of normality, 95% limits of agreement, and Bland–Altman plots. Excellent inter-session and inter-rater reliability, adequate absolute accuracy, almost perfect agreement and strong correlations were shown for caliper, US and their comparison for all SL dimensions. US measurements presented higher absolute accuracy than caliper measures for SL length and thickness dimensions, while caliper displayed greater absolute accuracy for SL width dimensions. Good repeatability (P > 0.05) was shown for all SL dimensions by US, caliper and their comparison, except for SL width dimension measured with US (P = 0.019). Both US and caliper could be recommended for evaluating all SL dimensions due to their excellent reliability and absolute accuracy in cadavers, although width dimensions should be considered with caution due to US repeatability differences.

Appraisal of a scoring instrument for training and testing neonatal intubation skills

Archives of Disease in Childhood - Fetal and Neonatal Edition ◽

10.1136/archdischild-2018-315221 ◽

2018 ◽

Vol 104 (5) ◽

pp. F521-F527 ◽

Cited By ~ 1

Author(s):

Romy N Bouwmeester ◽

Mathijs Binkhorst ◽

Nicole K Yamada ◽

Rosa Geurtzen ◽

Arno F J van Heijst ◽

...

Keyword(s):

Construct Validity ◽

Intraclass Correlation ◽

Rater Reliability ◽

Patient Simulator ◽

Training Centre ◽

Tube Position ◽

The USA ◽

Neonatal Patient

Objective: To determine the validity, reliability, feasibility and applicability of a neonatal intubation scoring instrument. Design: Prospective observational study. Setting: Simulation-based research and training centre (Center for Advanced Pediatric and Perinatal Education), California, USA. Subjects: Forty clinicians qualified for neonatal intubation. Interventions: Videotaped elective intubations on a neonatal patient simulator were scored by two independent raters. One rater scored the intubations twice. We scored the preparation of equipment and premedication, intubation performance, tube position/fixation, communication, number of attempts, duration and successfulness of the procedure. Main outcome measures: Intraclass correlation coefficients (ICC) were calculated for intrarater and inter-rater reliability. Kappa coefficients for individual items and mean kappa coefficients for all items combined were calculated. Construct validity was assessed with one-way analysis of variance using the hypothesis that experienced clinicians score higher than less experienced clinicians. The approximate time to score one intubation and the instrument’s applicability in another setting were evaluated. Results: ICCs for intrarater and inter-rater reliability were 0.99 (95% CI 0.98 to 0.99) and 0.89 (95% CI 0.35 to 0.96), and mean kappa coefficients were 0.93 (95% CI 0.85 to 1.01) and 0.71 (95% CI 0.56 to 0.92), respectively. There were no differences between the more and less experienced clinicians regarding preparation, performance, communication and total scores. The experienced group scored higher only on tube position/fixation (p=0.02). Scoring one intubation took approximately 15 min. Our instrument, developed in The Netherlands, could be readily applied in the USA. Conclusions: Our scoring instrument for simulated neonatal intubations appears to be reliable, feasible and applicable in another centre. Construct validity could not be established.

Intra- and inter-rater reliability of the Behaviour Mapping Schedule: A direct observational tool for classifying children’s play behaviour

Australasian Journal of Early Childhood ◽

10.1177/1836939120982764 ◽

2021 ◽

pp. 183693912098276

Author(s):

Kylie A Dankiw ◽

Katherine L Baldock ◽

Saravana Kumar ◽

Margarita D Tsiros

Keyword(s):

Intraclass Correlation ◽

South Australia ◽

Rater Reliability ◽

Children's Play ◽

Observational Tool ◽

Play Behaviour ◽

Training Resources

Identifying and describing children’s play behaviours is an important component of evaluating child development. The Behaviour Mapping Schedule is a direct observational tool which aims to describe and quantify children’s play behaviours but is yet to undergo reliability testing. This study aimed to determine the intra- and inter-rater reliability of the Behaviour Mapping Schedule. Twelve children aged 3–5 years were each video recorded for one 20-minute playtime period at a purposively selected Community Children’s Centre in Adelaide, South Australia. The video recordings were coded independently by two raters against 23 behaviour codes. Intraclass correlation coefficients (ICCs) were calculated. Intra-rater ICCs for nearly 70% of the behaviour codes were considered ‘excellent’; likewise, for inter-rater ICCs on more than 50% of the behaviour codes. Overall, the Behaviour Mapping Schedule is a reliable tool for observing children’s play behaviour; however, additional training resources may be useful to further strengthen inter-rater reliability.