Retrospective study highlights AI's potential in early breast cancer detection, identifying cancers missed by human readers

Company: Lunit
Product: Lunit INSIGHT MMG

Accuracy of an Artificial Intelligence System for Interval Breast Cancer Detection at Screening Mammography

Radiology, 2024

Abstract

Background

Artificial intelligence (AI) systems can be used to identify interval breast cancers, although the localizations are not always accurate.

Purpose

To evaluate AI localizations of interval cancers (ICs) on screening mammograms by IC category and histopathologic characteristics.

Materials and Methods

A screening mammography data set (median patient age, 57 years [IQR, 52–64 years]) that had been assessed by two human readers from January 2011 to December 2018 was retrospectively analyzed using a commercial AI system. The AI outputs were lesion locations (heatmaps) and the highest per-lesion risk score (range, 0–100) assigned to each case. AI heatmaps were considered false positive (FP) if they occurred on normal screening mammograms or on IC screening mammograms (ie, in patients subsequently diagnosed with IC) but outside the cancer boundary. A panel of consultant radiology experts classified ICs as normal or benign (true negative [TN]), uncertain (minimal signs of malignancy [MS]), or suspicious (false negative [FN]). Several specificity and sensitivity thresholds were applied. Mann-Whitney U tests, Kruskal-Wallis tests, and χ² tests were used to compare groups.
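
As an illustration of how a specificity threshold on a 0–100 risk score can be chosen, the following is a minimal sketch and not the study's actual procedure. It assumes that a case is flagged when its score is at or above an integer cutoff, and the function names and toy scores are hypothetical.

```python
def specificity(normal_scores, threshold):
    """Fraction of normal cases NOT flagged (score strictly below the cutoff)."""
    return sum(1 for s in normal_scores if s < threshold) / len(normal_scores)


def sensitivity(cancer_scores, threshold):
    """Fraction of cancer cases flagged (score at or above the cutoff)."""
    return sum(1 for s in cancer_scores if s >= threshold) / len(cancer_scores)


def threshold_for_specificity(normal_scores, target):
    """Lowest integer cutoff whose specificity reaches the target.

    Assumption: scores lie in 0-100, so a cutoff of 101 flags nothing
    and always yields a specificity of 1.0.
    """
    for cutoff in range(0, 102):
        if specificity(normal_scores, cutoff) >= target:
            return cutoff
    raise ValueError("target specificity must be at most 1.0")


# Toy scores for illustration only; these are not data from the study.
toy_normal_scores = [5, 10, 12, 20, 25, 30, 35, 40, 60, 90]
toy_cancer_scores = [30, 50, 62, 76, 89, 95]

cutoff = threshold_for_specificity(toy_normal_scores, 0.90)
print(cutoff, sensitivity(toy_cancer_scores, cutoff))
```

Raising the target specificity pushes the cutoff up, which flags fewer normal mammograms but also fewer cancers. The Results report the same trade-off across the thresholds that were applied.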

Results

A total of 2052 screening mammograms (514 ICs and 1548 normal mammograms) were included. The median AI risk score was 50 (IQR, 32–82) for TN ICs, 76 (IQR, 41–90) for ICs with MS, and 89 (IQR, 81–95) for FN ICs (P = .005). Higher median AI scores were observed for invasive tumors (62 [IQR, 39–88]) than for noninvasive tumors (33 [IQR, 20–55]; P < .01) and for high-grade (grade 2–3) tumors (62 [IQR, 40–87]) than for low-grade (grade 0–1) tumors (45 [IQR, 26–81]; P = .02). At the 96% specificity threshold, the AI algorithm flagged 121 of 514 (23.5%) ICs and correctly localized the IC in 93 of 121 (76.9%) cases, with 48 FP heatmaps on the mammograms for ICs (rate, 0.093 per case) and 74 FP heatmaps on normal mammograms (rate, 0.048 per case). The AI algorithm correctly localized a lower proportion of TN ICs (54 of 427; 12.6%) than ICs with MS (35 of 76; 46%) and FN ICs (four of eight; 50% [95% CI: 13, 88]; P < .001). The AI algorithm localized a higher proportion of node-positive than node-negative cancers (P = .03). However, no evidence of a difference by cancer type (P = .09), grade (P = .27), or hormone receptor status (P = .12) was found. At 89.8% specificity and 79% sensitivity thresholds, AI detection increased to 181 (35.2%) and 256 (49.8%) of the 514 ICs, respectively, with FP heatmaps on 158 (10.2%) and 307 (19.8%) of the 1548 normal mammograms.
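
The proportions and per-case rates reported above can be reproduced from the stated counts. The short sketch below only redoes that arithmetic; the helper name `pct` is hypothetical, and the denominators are the 514 ICs and 1548 normal mammograms given in the text.

```python
def pct(numerator, denominator, digits=1):
    """Percentage rounded to the given number of decimal places."""
    return round(100 * numerator / denominator, digits)


n_ic = 514        # interval cancer screening mammograms
n_normal = 1548   # normal screening mammograms

# 96% specificity threshold
flagged_pct = pct(121, n_ic)                # ICs flagged by the AI
localized_pct = pct(93, 121)                # flagged ICs correctly localized
fp_rate_ic = round(48 / n_ic, 3)            # FP heatmaps per IC case
fp_rate_normal = round(74 / n_normal, 3)    # FP heatmaps per normal case

# 89.8% specificity and 79% sensitivity thresholds
detected_pcts = (pct(181, n_ic), pct(256, n_ic))
fp_normal_pcts = (pct(158, n_normal), pct(307, n_normal))

print(flagged_pct, localized_pct, fp_rate_ic, fp_rate_normal)
print(detected_pcts, fp_normal_pcts)
```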

Conclusion

Use of a standalone AI system improved early cancer detection by correctly identifying some cancers missed by two human readers, with no differences based on histopathologic features except for node-positive cancers.

