Formative Evaluation of Consumer-Grade Activity Monitors Worn by Older Adults: Test-Retest Reliability and Criterion Validity of Step Counts (Preprint)
BACKGROUND To assess whether commercial-grade activity monitors are appropriate for measuring step counts in older adults, it is essential to evaluate their measurement properties in this population. OBJECTIVE This study aimed to evaluate the test-retest reliability and criterion validity of step counts from 6 commercial-grade activity monitors (Fitbit Charge, Fitbit One, Garmin vívofit 2, Jawbone UP2, Misfit Shine, and New-Lifestyles NL-1000) in older adults with self-reported intact and limited mobility. METHODS For test-retest reliability, participants completed two 100-step overground walks at a usual pace while wearing all monitors. We tested the effects of the activity monitor and mobility status on the absolute difference in step count error (%) and computed the standard error of measurement (SEM) between repeat trials. To assess criterion validity, participants completed two 400-meter overground walks at a usual pace while wearing all monitors. The first walk was continuous; the second walk incorporated interruptions to mimic the conditions of daily walking. Criterion step counts were obtained from a researcher's tally count. We estimated the effects of the activity monitor, mobility status, and walk interruptions on step count error (%). We also generated Bland-Altman plots and conducted equivalence tests. RESULTS A total of 36 individuals participated (n=20 intact mobility and n=16 limited mobility; 19/36, 53% female) with a mean age of 71.4 (SD 4.7) years and BMI of 29.4 (SD 5.9) kg/m<sup>2</sup>. Considering test-retest reliability, there was an effect of the activity monitor (<i>P</i><.001). The Fitbit One (1.0%, 95% CI 0.6% to 1.3%), the New-Lifestyles NL-1000 (2.6%, 95% CI 1.3% to 3.9%), and the Garmin vívofit 2 (6.0%, 95% CI 3.2% to 8.8%) had the smallest mean absolute differences in step count errors. The SEM values ranged from 1.0% (Fitbit One) to 23.5% (Jawbone UP2). Regarding criterion validity, all monitors undercounted steps.
Step count error was affected by the activity monitor (<i>P</i><.001) and walk interruptions (<i>P</i>=.02). Three monitors had small mean step count errors: Misfit Shine (−1.3%, 95% CI −19.5% to 16.8%), Fitbit One (−2.1%, 95% CI −6.1% to 2.0%), and New-Lifestyles NL-1000 (−4.3%, 95% CI −18.9% to 10.3%). Mean step count error was larger during interrupted walking than continuous walking (−5.5% vs −3.6%; <i>P</i>=.02). Bland-Altman plots illustrated nonsystematic bias and small limits of agreement for Fitbit One and Jawbone UP2. Mean step count error lay within an equivalence bound of ±5% for Fitbit One (<i>P</i><.001) and Misfit Shine (<i>P</i>=.001). CONCLUSIONS Test-retest reliability and criterion validity of step counting varied across 6 consumer-grade activity monitors worn by older adults with self-reported intact and limited mobility. Walk interruptions increased the step count error for all monitors, whereas mobility status did not affect the step count error. The hip-worn Fitbit One was the only monitor with high test-retest reliability and criterion validity.
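
The abstract reports step count error (%) against a criterion tally and summarizes agreement with Bland-Altman limits. As an illustration only, the minimal Python sketch below shows one common way such quantities are computed. It assumes that the error is the signed difference between monitor and criterion counts as a percentage of the criterion count, and that the limits of agreement are the mean error ±1.96 sample SDs. Neither formula is stated in the abstract, so both are assumptions; the function names and the step counts are hypothetical and are not study data.

```python
from statistics import mean, stdev


def step_count_error_pct(monitor_steps, criterion_steps):
    """Signed percent error; a negative value means the monitor undercounted.

    Assumed definition: (monitor - criterion) / criterion * 100.
    """
    return 100.0 * (monitor_steps - criterion_steps) / criterion_steps


def bland_altman_limits(errors):
    """Return (bias, lower, upper) for a list of errors.

    Assumed convention: bias is the mean error and the limits of
    agreement are bias +/- 1.96 * sample standard deviation.
    """
    bias = mean(errors)
    spread = 1.96 * stdev(errors)
    return bias, bias - spread, bias + spread


# Hypothetical (monitor count, researcher tally) pairs, NOT data from the study.
pairs = [(95, 100), (98, 100), (100, 100), (90, 100)]
errors = [step_count_error_pct(m, c) for m, c in pairs]
bias, lower, upper = bland_altman_limits(errors)
```

A negative `bias` here corresponds to undercounting, the direction reported for all monitors in the abstract. The formal equivalence tests mentioned in the methods are a separate procedure and are not covered by this sketch.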