Off-label use of AI for breast cancer detection on synthetic mammography leads to compromised performance

Company: Lunit
Product: Lunit INSIGHT MMG

Performance of Digital Mammography-Based Artificial Intelligence Computer-Aided Diagnosis on Synthetic Mammography From Digital Breast Tomosynthesis

Korean Journal of Radiology, 2025

Abstract

Objective

To test the performance of an artificial intelligence-based computer-aided diagnosis (AI-CAD) system designed for full-field digital mammography (FFDM) when applied to synthetic mammography (SM).

Materials and methods

We analyzed 501 women (mean age, 57 ± 11 years) who underwent preoperative mammography and breast cancer surgery. This cohort comprised 1002 breasts: 517 with cancer and 485 without. All patients underwent digital breast tomosynthesis (DBT) and FFDM during the preoperative workup, and SM is routinely reconstructed from the DBT data. A commercial AI-CAD (Lunit Insight MMG, version 1.1.7.2) was retrospectively applied to SM and FFDM to calculate an abnormality score for each breast. Median abnormality scores for the 517 breasts with cancer were compared using the Wilcoxon signed-rank test, and calibration curves of the abnormality scores were evaluated. Discrimination performance was analyzed using the area under the receiver operating characteristic curve (AUC), together with sensitivity and specificity at a preset threshold of 10%. Sensitivity and specificity were further analyzed according to mammographic and pathological characteristics. Results for SM and FFDM were compared.
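A minimal Python sketch of the per-breast evaluation described above, assuming each breast carries one abnormality score in percent and a binary cancer label. The `Breast` record, the rule that a score at or above the threshold counts as positive, the five equal-width calibration bins, and the sample values are all illustrative assumptions, not details taken from the study or from the Lunit software.

```python
from dataclasses import dataclass


@dataclass
class Breast:
    """Hypothetical per-breast record (illustrative, not study data)."""
    score: float   # AI-CAD abnormality score, in percent (0-100)
    cancer: bool   # True if the breast had pathology-confirmed cancer


PRESET_THRESHOLD = 10.0  # operating threshold in percent, as in the study


def sensitivity_specificity(breasts, threshold=PRESET_THRESHOLD):
    """Sensitivity and specificity when a breast is called positive
    if its score is >= threshold (the >= rule is an assumption)."""
    tp = sum(1 for b in breasts if b.cancer and b.score >= threshold)
    fn = sum(1 for b in breasts if b.cancer and b.score < threshold)
    tn = sum(1 for b in breasts if not b.cancer and b.score < threshold)
    fp = sum(1 for b in breasts if not b.cancer and b.score >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_table(breasts, n_bins=5):
    """Group breasts into equal-width score bins over 0-100 and compare the
    mean score (read as a probability) with the observed cancer fraction.
    For a well-calibrated model the two values are close in every bin.
    Equal-width binning is an assumed choice for this sketch."""
    width = 100.0 / n_bins
    bins = [[] for _ in range(n_bins)]
    for b in breasts:
        # Clamp so that a score of exactly 100 falls into the last bin.
        idx = min(int(b.score // width), n_bins - 1)
        bins[idx].append(b)
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_pred = sum(b.score for b in members) / len(members) / 100.0
        observed = sum(b.cancer for b in members) / len(members)
        rows.append((i, mean_pred, observed, len(members)))
    return rows


if __name__ == "__main__":
    sample = [
        Breast(95.0, True), Breast(8.0, True), Breast(60.0, True),
        Breast(3.0, False), Breast(12.0, False), Breast(1.0, False),
    ]
    sens, spec = sensitivity_specificity(sample)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
    for i, pred, obs, n in calibration_table(sample):
        print(f"bin {i}: mean score {pred:.2f}, observed cancer rate {obs:.2f}, n={n}")
```

Running the same functions on SM scores and on FFDM scores for the same breasts gives the paired comparison the study describes; the paired median comparison itself would use a Wilcoxon signed-rank test from a statistics library.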

Results

In breasts with cancer, AI-CAD demonstrated a significantly lower median abnormality score for SM than for FFDM (71% vs. 96%, P < 0.001), and its calibration performance was poorer for SM. Compared with FFDM, SM exhibited lower sensitivity (76.2% vs. 82.8%, P < 0.001), higher specificity (95.5% vs. 91.8%, P < 0.001), and a comparable AUC (0.86 vs. 0.87, P = 0.127). SM showed lower sensitivity than FFDM in asymptomatic and dense breasts and in ductal carcinoma in situ, T1, N0, and hormone receptor-positive/human epidermal growth factor receptor 2-negative cancers, but higher specificity in non-cancerous dense breasts.
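To put the sensitivity and specificity figures in absolute terms, the reported percentages can be converted back into approximate breast counts. This sketch assumes sensitivity was computed over the 517 breasts with cancer and specificity over the 485 breasts without cancer; because the published percentages are rounded, the resulting counts are estimates, not figures reported by the study.

```python
# Approximate counts implied by the reported per-breast rates.
# Assumption: sensitivity is over the 517 cancer breasts and specificity over
# the 485 non-cancer breasts; rounding makes these estimates, not study counts.
CANCER_BREASTS = 517
NON_CANCER_BREASTS = 485

REPORTED = {
    "SM": {"sensitivity": 0.762, "specificity": 0.955},
    "FFDM": {"sensitivity": 0.828, "specificity": 0.918},
}


def approx_counts(sensitivity, specificity):
    """Return (cancers flagged, false positives) implied by the rates."""
    flagged = round(sensitivity * CANCER_BREASTS)
    false_positives = NON_CANCER_BREASTS - round(specificity * NON_CANCER_BREASTS)
    return flagged, false_positives


if __name__ == "__main__":
    for modality, rates in REPORTED.items():
        flagged, fps = approx_counts(rates["sensitivity"], rates["specificity"])
        print(f"{modality}: ~{flagged} cancers flagged, ~{fps} false positives")
```

Under these assumptions, SM flags roughly 34 fewer cancers than FFDM at the 10% threshold (about 394 vs. 428) while producing roughly 18 fewer false positives (about 22 vs. 40).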

Conclusion

AI-CAD showed lower abnormality scores and reduced calibration performance for SM than for FFDM. Furthermore, applying the 10% preset threshold to SM yielded discrimination performance that differed from that on FFDM. Given these limitations, off-label application of the current AI-CAD to SM should be avoided.
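A comparable AUC alongside shifted sensitivity and specificity follows from how the two measures use the scores: a rank-based (Mann-Whitney) AUC depends only on the ordering of scores, whereas a fixed threshold depends on their absolute values. The hypothetical sketch below compresses made-up FFDM-like scores by a constant factor to mimic lower SM scores; the `compress` function and all score values are illustrative assumptions, not a model of how SM actually changes the AI-CAD output.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: the fraction of (cancer, non-cancer) pairs in which the
    cancer breast has the higher score, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def rates_at(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity with 'score >= threshold' as positive."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec


def compress(score, factor=0.6):
    """Hypothetical monotone shrink standing in for lower SM scores."""
    return score * factor


# Made-up FFDM-like scores in percent (not study data).
ffdm_pos = [96.0, 80.0, 40.0, 15.0, 9.0]   # breasts with cancer
ffdm_neg = [30.0, 12.0, 6.0, 2.0, 1.0]     # breasts without cancer

# SM-like scores: same ordering, uniformly lower values.
sm_pos = [compress(s) for s in ffdm_pos]
sm_neg = [compress(s) for s in ffdm_neg]

if __name__ == "__main__":
    for name, pos, neg in (("FFDM-like", ffdm_pos, ffdm_neg),
                           ("SM-like", sm_pos, sm_neg)):
        sens, spec = rates_at(pos, neg, threshold=10.0)
        print(f"{name}: AUC={auc(pos, neg):.2f} "
              f"sensitivity={sens:.1f} specificity={spec:.1f}")
```

In this toy example the AUC is 0.88 for both score sets, while at the 10% cutoff sensitivity falls from 0.8 to 0.6 and specificity rises from 0.6 to 0.8, the same direction of change the study reports for SM.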
