Reliability of a computer-aided system in the evaluation of indeterminate ultrasound images of thyroid nodules
Introduction
Computer-aided diagnostic (CAD) programs for malignancy risk stratification from ultrasound (US) imaging of thyroid nodules are being validated both experimentally and in real-world practice. However, their reliability in analyzing difficult or unclear images has not been tested.

Methods
US images with indeterminate characteristics were evaluated by five observers with different levels of experience in US examination and by a commercial CAD program. Nodules on which the observers widely agreed were considered concordant; those with little agreement were considered non-concordant, that is, difficult to assess. The diagnostic performance of the readers and the CAD program was calculated and compared in both groups of nodule images.

Results
In the group of concordant thyroid nodules (n = 37), the clinicians and the CAD system achieved similar accuracy (77.0% vs 74.2%, respectively; P = 0.7), and no differences were found in sensitivity (SEN) (95.0% vs 87.5%, respectively; P = 0.2), specificity (SPE) (45.5% vs 49.4%, respectively; P = 0.7), positive predictive value (PPV) (75.2% vs 77.7%, respectively; P = 0.8), or negative predictive value (NPV) (85.6% vs 77.7%, respectively; P = 0.3). In the non-concordant nodules (n = 43), the accuracy of the CAD system decreased by 4.2%, a significantly smaller decrease than that seen in the clinicians (19.9%; P = 0.02).

Conclusions
Clinical observers performed similarly to the CAD system in the US-based risk assessment of thyroid nodules. However, the AI system for thyroid nodules, AmCAD-UT®, was more reliable in the analysis of unclear or misleading images.
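
The performance measures named in the Results (accuracy, SEN, SPE, PPV, NPV) are conventionally derived from a 2x2 table of reader calls against the reference diagnosis. The following is a minimal sketch of those standard definitions, not the authors' analysis code: the abstract does not describe how the risk threshold or the reference standard was defined, the names `Confusion` and `diagnostic_metrics` are illustrative, and the counts in the usage example are hypothetical rather than study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of calls versus the reference diagnosis (hypothetical layout)."""
    tp: int  # malignant nodules called high risk
    fp: int  # benign nodules called high risk
    tn: int  # benign nodules called low risk
    fn: int  # malignant nodules called low risk


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a measure is undefined.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: Confusion) -> dict[str, float]:
    """Standard definitions of the five measures, as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical counts for illustration only; these are NOT the study's data.
example = Confusion(tp=18, fp=6, tn=9, fn=2)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.1%}")
```

Returning fractions and formatting them only at print time keeps the function usable for later comparisons between reader groups; the statistical tests behind the reported P values are outside the scope of this sketch.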