A Danish retrospective study evaluated Lunit INSIGHT MMG (Lunit), an AI tool for breast cancer detection, as a replacement for human readers in mammography screening. The study analyzed 249,402 mammograms from 149,495 women, comparing three AI-integrated screening scenarios with standard double reading by radiologists. The objective was to assess how AI could reduce workload while maintaining diagnostic accuracy.
In Scenario 1, AI replaced the first radiologist. The second human reader reviewed the AI’s findings, with arbitration used in cases of disagreement. This setup maintained cancer detection accuracy, with no significant changes in sensitivity or specificity. However, arbitration rates increased slightly, by 0.99% (P < .001), indicating a marginal increase in workload for dispute resolution. Workload was reduced by 48.8%.
In Scenario 2, the first radiologist performed the initial review, and AI acted as the second reader. This approach reduced unnecessary recalls and improved the positive predictive value (PPV) by 0.03% (P < .001). However, sensitivity decreased by 1.58% (P < .001), meaning a few cancer cases might be missed compared to double reading by two radiologists. Arbitration rates increased by 1.22% (P < .001), and workload was reduced by 48.7%.
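The double-reading logic shared by Scenarios 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function name `double_read`, the boolean recall flags, and the rule that any disagreement goes to arbitration are assumptions drawn from the description above.

```python
from __future__ import annotations


def double_read(
    reader_a_recall: bool,
    reader_b_recall: bool,
    arbitrator_recall: bool,
) -> tuple[bool, bool]:
    """Return (final_recall, went_to_arbitration) for one screening exam.

    Hypothetical sketch: either reader may be a radiologist or the AI.
    AI as reader A resembles Scenario 1; AI as reader B resembles Scenario 2.
    """
    if reader_a_recall == reader_b_recall:
        # Both readers agree, so their shared decision is final.
        return reader_a_recall, False
    # The readers disagree, so the arbitrator's decision is final.
    return arbitrator_recall, True
```

Under this sketch, swapping a radiologist for the AI changes only who supplies one of the two recall flags, which is why the arbitration rate is the metric most directly affected by how often AI and the remaining radiologist disagree.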
In Scenario 3, AI fully managed low- and high-risk cases, leaving only moderate-risk cases for human review. This scenario achieved the best results, increasing sensitivity by 1.33% (P < .001) and PPV by 0.36% (P = .03) while reducing arbitration rates by 0.89% (P < .001). It also reduced the workload for human radiologists by 49.7%, making it the most efficient and accurate approach tested.
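The triage idea in Scenario 3 can be illustrated with a small Python sketch. Everything specific here is an assumption for illustration: the 0-100 score scale, the two threshold values, and the names `Route`, `triage`, and `human_read_share` are hypothetical and are not the study's actual settings. The sketch also assumes that low-risk exams are labeled normal and high-risk exams are flagged for recall; the text above says only that AI "fully managed" both groups.

```python
from __future__ import annotations

from enum import Enum


class Route(Enum):
    AI_NORMAL = "ai_normal"    # AI labels the exam normal; no human read
    HUMAN_READ = "human_read"  # moderate risk; radiologists read the exam
    AI_RECALL = "ai_recall"    # AI flags the exam for recall; no human read


# Hypothetical cut-offs on a 0-100 AI risk score, chosen only for illustration.
LOW_THRESHOLD = 10.0
HIGH_THRESHOLD = 95.0


def triage(
    ai_score: float,
    low: float = LOW_THRESHOLD,
    high: float = HIGH_THRESHOLD,
) -> Route:
    """Route one exam based on its AI risk score."""
    if ai_score < low:
        return Route.AI_NORMAL
    if ai_score >= high:
        return Route.AI_RECALL
    return Route.HUMAN_READ


def human_read_share(scores: list[float]) -> float:
    """Fraction of exams that still need a human read under this triage."""
    routes = [triage(score) for score in scores]
    return sum(route is Route.HUMAN_READ for route in routes) / len(routes)
```

In a setup like this, moving the two thresholds changes how many exams fall into the moderate-risk band, so the thresholds directly control the share of exams that radiologists still read.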
Overall, the study concluded that integrating AI into mammography screening workflows could effectively reduce workload while maintaining or improving cancer detection accuracy, depending on how AI is implemented. Scenario 3, where AI triages cases, showed the most promise for practical deployment.
AI-integrated Screening to Replace Double Reading of Mammograms: A Population-wide Accuracy and Feasibility Study
Radiology: Artificial Intelligence, 2024
Abstract
Mammography screening supported by deep learning-based artificial intelligence (AI) solutions can potentially reduce workload without compromising breast cancer detection accuracy, but the site of deployment in the workflow might be crucial. This retrospective study compared three simulated AI-integrated screening scenarios with standard double reading with arbitration in a sample of 249,402 mammograms from a representative screening population. A commercial AI system replaced the first reader (Scenario 1: Integrated AIfirst), the second reader (Scenario 2: Integrated AIsecond), or both readers for triaging of low- and high-risk cases (Scenario 3: Integrated AItriage). AI threshold values were partly chosen based on previous validation and fixing screen-read volume reduction at approximately 50% across scenarios. Detection accuracy measures were calculated. Compared with standard double reading, Integrated AIfirst showed no evidence of a difference in accuracy metrics except for a higher arbitration rate (+0.99%; P < .001). Integrated AIsecond had lower sensitivity (-1.58%; P < .001), negative predictive value (NPV) (-0.01%; P < .001), and recall rate (-0.06%; P = .04), but a higher positive predictive value (PPV) (+0.03%; P < .001) and arbitration rate (+1.22%; P < .001). Integrated AItriage achieved higher sensitivity (+1.33%; P < .001), PPV (+0.36%; P = .03), and NPV (+0.01%; P < .001) but lower arbitration rate (-0.88%; P < .001). Replacing one or both readers with AI seems feasible; however, the site of application in the workflow can have clinically relevant effects on accuracy and workload.
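For readers less familiar with the accuracy measures named in the abstract, the following Python sketch shows their standard textbook definitions computed from confusion-matrix counts. It is not the study's analysis code: the names `Counts` and `accuracy_measures` are hypothetical, and the study's exact definitions (for example, how cancers were ascertained during follow-up) are not given in this text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # recalled, cancer present
    fp: int  # recalled, no cancer
    fn: int  # not recalled, cancer present
    tn: int  # not recalled, no cancer


def accuracy_measures(c: Counts) -> dict[str, float]:
    """Standard screening accuracy measures as proportions in [0, 1].

    Assumes every denominator is non-zero; callers should check this
    for very small samples.
    """
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "recall_rate": (c.tp + c.fp) / total,
    }
```

Applying the same function to the counts from standard double reading and from an AI-integrated scenario, then subtracting the results, gives per-measure differences of the kind reported in the abstract.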