Comparing two artificial intelligence software packages for normative brain volumetry in memory clinic imaging
Abstract

Purpose: To compare two artificial intelligence software packages performing normative brain volumetry and to explore whether they could differently impact dementia diagnostics in a clinical context.

Methods: Sixty patients (20 Alzheimer's disease, 20 frontotemporal dementia, 20 mild cognitive impairment) and 20 controls were included retrospectively. One MRI scan per subject was processed by software packages from two proprietary manufacturers, producing two quantitative reports per subject. Two neuroradiologists assigned forced-choice diagnoses using only the normative volumetry data in these reports. They classified each volumetric profile as "normal" or "abnormal" and, if "abnormal," specified the most likely dementia subtype. Differences in the packages' clinical impact were assessed by comparing (1) agreement between diagnoses based on each package's output; (2) diagnostic accuracy, sensitivity, and specificity; and (3) diagnostic confidence. Quantitative outputs were also compared to provide context for any diagnostic differences.

Results: Diagnostic agreement between packages was moderate, both for distinguishing normal from abnormal volumetry (K = .41–.43) and for specific diagnoses (K = .36–.38). However, each package yielded high inter-observer agreement when distinguishing normal from abnormal profiles (K = .73–.82). Accuracy, sensitivity, and specificity did not differ between packages. Diagnostic confidence differed between packages for one rater. Whole brain intracranial volume output differed between software packages (10.73%, p < .001), and the normative regional data interpreted for diagnosis correlated weakly to moderately (rs = .12–.80).

Conclusion: Different artificial intelligence software packages for quantitative normative assessment of brain MRI can produce distinct effects at the level of clinical interpretation. Clinics should not assume that different packages are interchangeable, and internal evaluation of packages before adoption is therefore recommended.