AI in mammography: high accuracy yet outperformed by human double-reading

Companies: ScreenPoint Medical, iCAD
Products: Transpara, ProFound AI


AI-enhanced Mammography With Digital Breast Tomosynthesis for Breast Cancer Detection: Clinical Value and Comparison With Human Performance

Radiology: Imaging Cancer, 2024

Abstract

Two artificial intelligence systems for mammography with digital breast tomosynthesis demonstrated high performance in detecting malignancies, although performance was lower when compared against human double-reading.

Purpose

To compare two deep learning–based commercially available artificial intelligence (AI) systems for mammography with digital breast tomosynthesis (DBT) and benchmark them against the performance of radiologists.

Materials and Methods

This retrospective study included consecutive asymptomatic patients who underwent mammography with DBT (2019–2020). Two AI systems (Transpara 1.7.0 and ProFound AI 3.0) were used to evaluate the DBT examinations. The systems were compared using receiver operating characteristic (ROC) analysis to calculate the area under the ROC curve (AUC) for detecting malignancy overall and within subgroups based on mammographic breast density. Breast Imaging Reporting and Data System results obtained from standard-of-care human double-reading were compared against AI results with use of the DeLong test.
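As a rough illustration of the AUC used in the ROC analysis above, the sketch below computes it directly from case labels and AI scores as the probability that a randomly chosen cancer case scores higher than a randomly chosen non-cancer case. The function name and the toy data are hypothetical and not taken from the study; the DeLong comparison of two correlated AUCs is not implemented here.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for histologically proven cancer, 0 otherwise (assumed coding).
    scores: AI output, higher meaning more suspicious. Ties count one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two cancers, two non-cancers.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_from_scores(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred examinations; a rank-based implementation would be preferable for larger data sets.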

Results

Of 419 female patients (median age, 60 years [IQR, 52–70 years]) included, 58 had histologically proven breast cancer. The AUC was 0.86 (95% CI: 0.85, 0.91), 0.93 (95% CI: 0.90, 0.95), and 0.98 (95% CI: 0.96, 0.99) for Transpara, ProFound AI, and human double-reading, respectively. For Transpara, a rule-out criterion of score 7 or lower yielded 100% sensitivity (95% CI: 94.2, 100.0) and 60.9% specificity (95% CI: 55.7, 66.0). The rule-in criterion of a score higher than 9 yielded 96.6% sensitivity (95% CI: 88.1, 99.6) and 78.1% specificity (95% CI: 73.8, 82.5). For ProFound AI, a rule-out criterion of a score lower than 51 yielded 100% sensitivity (95% CI: 93.8, 100) and 67.0% specificity (95% CI: 62.2, 72.1). The rule-in criterion of a score higher than 69 yielded 93.1% sensitivity (95% CI: 83.3, 98.1) and 82.0% specificity (95% CI: 77.9, 86.1).
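The rule-out and rule-in criteria amount to picking a score threshold and counting how many cancers and non-cancers fall on each side of it. The sketch below shows that calculation on hypothetical scores (the function names and toy data are illustrative, not from the study). It also includes the closed-form exact (Clopper-Pearson) lower confidence limit for the case where every cancer is detected; the abstract does not state which interval method was used, so treating it as Clopper-Pearson is an assumption.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Treat score > threshold as test-positive and compare with the reference.

    labels: 1 for proven cancer, 0 otherwise (assumed coding).
    Note the strict inequality; a criterion phrased as "lower than X"
    for rule-out would need >= instead.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        flagged = s > threshold
        if y == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def exact_lower_limit_all_detected(n, alpha=0.05):
    """Clopper-Pearson lower limit for a proportion of n/n (assumed method)."""
    return (alpha / 2) ** (1 / n)


# Hypothetical toy scores on a 1-10 scale: three cancers, five non-cancers.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [10, 9.5, 8, 8.5, 7, 5, 3, 1]

rule_out = sensitivity_specificity(toy_labels, toy_scores, threshold=7)
rule_in = sensitivity_specificity(toy_labels, toy_scores, threshold=9)

# With 58 cancers all detected, the exact lower limit is about 0.938.
lower_58 = exact_lower_limit_all_detected(58)
```

On the toy data, the lower threshold catches every cancer at the cost of one false positive, while the higher threshold removes the false positive but misses one cancer, which mirrors the sensitivity and specificity trade-off between the rule-out and rule-in criteria. Under the Clopper-Pearson assumption, the lower limit for 58 of 58 cancers detected is in line with the 93.8% reported for ProFound AI.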

Conclusion

Both AI systems showed high performance in breast cancer detection but lower performance compared with human double-reading.
