A Dutch retrospective study evaluated BoneView (Gleamer), an AI tool for automated bone fracture detection, on 1,508 radiographic image-sets from 1,227 patients with suspected fractures. The AI tool was assessed in four simulated clinical workflows: AI-standalone (AI-only diagnosis), AI-problem-solving (AI consulted when the radiologist is uncertain), AI-triage (AI provides the diagnosis; radiologist consulted only if the AI is uncertain), and AI-safety net (AI consulted for all negative radiologist diagnoses). Reference diagnoses were established by two senior musculoskeletal radiologists, by consensus in cases of disagreement.
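The four workflows can be read as simple rules for combining a radiologist finding with an AI finding. The following Python sketch is illustrative only and is not the study's code: the `Finding` labels, the function names, and the way "doubt" is encoded are assumptions made for this example, and the text above does not say how an unresolved doubt was handled.

```python
from enum import Enum


class Finding(Enum):
    # Hypothetical three-way label; the study's actual encoding is not given here.
    FRACTURE = "fracture"
    NO_FRACTURE = "no_fracture"
    DOUBT = "doubt"


def ai_standalone(radiologist: Finding, ai: Finding) -> Finding:
    """AI-only diagnosis: the radiologist finding is ignored."""
    return ai


def ai_problem_solving(radiologist: Finding, ai: Finding) -> Finding:
    """The AI is consulted only when the radiologist is in doubt."""
    return ai if radiologist is Finding.DOUBT else radiologist


def ai_triage(radiologist: Finding, ai: Finding) -> Finding:
    """The AI diagnoses; the radiologist is consulted only when the AI is in doubt."""
    return radiologist if ai is Finding.DOUBT else ai


def ai_safety_net(radiologist: Finding, ai: Finding) -> Finding:
    """The AI is consulted for every negative radiologist diagnosis.

    Assumption: the AI finding replaces a negative radiologist finding.
    """
    return ai if radiologist is Finding.NO_FRACTURE else radiologist


if __name__ == "__main__":
    # A fracture missed by the radiologist but flagged by the AI.
    missed = (Finding.NO_FRACTURE, Finding.FRACTURE)
    for rule in (ai_standalone, ai_problem_solving, ai_triage, ai_safety_net):
        print(rule.__name__, rule(*missed).value)
```

Under these assumptions, only the problem-solving rule leaves a confident but wrong negative radiologist diagnosis uncorrected, since the AI is never consulted in that case.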
Every implementation approach changed false-negative (FN) and false-positive (FP) rates significantly relative to radiologists without AI (p < 0.001). Compared to radiologists without AI (2.7%, 40/1508), FN rates were reduced most with the AI-safety net (0.07%, 1/1508), followed by AI-standalone (1.5%, 23/1508) and AI-triage (2.1%, 32/1508). In contrast, AI-problem-solving increased false negatives (3.2%, 48/1508). False positives rose with three of the four approaches, most with the AI-safety net (7.6%, 115/1508) and AI-standalone (6.8%, 103/1508) and slightly with AI-triage (1.7%, 26/1508), compared to radiologists without AI (1.2%, 18/1508); AI-problem-solving was the exception, lowering false positives to 0.6% (9/1508).
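The reported false-negative percentages follow directly from the counts over the 1,508 image-sets. The short Python sketch below reproduces that arithmetic; the counts are taken from the figures above, while the dictionary layout and function names are choices made for this example.

```python
TOTAL_IMAGE_SETS = 1508

# False-negative counts per approach, as reported in the summary.
FN_COUNTS = {
    "radiologist alone": 40,
    "AI-standalone": 23,
    "AI-problem-solving": 48,
    "AI-triage": 32,
    "AI-safety net": 1,
}


def rate_percent(count: int, total: int = TOTAL_IMAGE_SETS) -> float:
    """Return count/total expressed as a percentage."""
    return 100.0 * count / total


def fn_change(approach: str, baseline: str = "radiologist alone") -> int:
    """Difference in false-negative count versus the baseline (negative = fewer misses)."""
    return FN_COUNTS[approach] - FN_COUNTS[baseline]


if __name__ == "__main__":
    for approach, count in FN_COUNTS.items():
        print(f"{approach}: {rate_percent(count):.2f}% FN, "
              f"change vs. radiologist alone: {fn_change(approach):+d}")
```

Expressed as counts, the safety net leaves 1 of the 40 fractures missed by radiologists alone, whereas problem-solving adds 8 misses.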
The main conclusion of the study is that AI implementation strategy significantly influences diagnostic outcomes. Problem-solving and triage strategies performed worse than or similarly to radiologists alone and are therefore not recommended. AI-standalone may be helpful in settings without radiologist coverage, but the AI-safety net provided the greatest reduction in false negatives and serious clinical consequences. Despite an increase in false positives, the safety net method is likely to offer the best overall benefit when used in a structured clinical workflow.
AI for fracture diagnosis in clinical practice: Four approaches to systematic AI-implementation and their impact on AI-effectiveness
European Journal of Radiology, 2025
Abstract
Purpose
Artificial Intelligence (AI) has been shown to enhance fracture-detection-accuracy, but the most effective AI-implementation in clinical practice is less well understood. In the current study, four approaches to AI-implementation are evaluated for their impact on AI-effectiveness.
Materials and methods
Retrospective single-center study based on all consecutive, around-the-clock radiographic examinations for suspected fractures, and the accompanying clinical-practice radiologist diagnoses, between January and March 2023. These image-sets were independently analysed by a dedicated bone-fracture-detection AI. Findings were combined with radiologist clinical-practice diagnoses to simulate the four AI-implementation methods deemed most relevant to clinical workflows: AI-standalone (radiologist findings not consulted); AI-problem-solving (AI findings consulted when radiologist in doubt); AI-triage (radiologist findings consulted when AI in doubt); and AI-safety net (AI findings consulted when radiologist diagnosis negative). Reference-standard diagnoses were established by two senior musculoskeletal radiologists (by consensus in cases of disagreement). Radiologist and radiologist + AI diagnoses were compared for false negatives (FN), false positives (FP) and their clinical consequences. Three experience-level subgroups (radiologists-in-training, non-musculoskeletal radiologists, and dedicated musculoskeletal radiologists) were analysed separately.
Results
1508 image-sets were included (1227 unique patients; 40 radiologist-readers). Radiologist results were: 2.7 % FN (40/1508), 28 with clinical consequences; 1.2 % FP (18/1508), of which 2 received full-fracture treatments (11.1 %). All AI-implementation methods changed overall FN and FP with statistical significance (p < 0.001): AI-standalone 1.5 % FN (23/1508; 11 consequences), 6.8 % FP (103/1508); AI-problem-solving 3.2 % FN (48/1508; 31 consequences), 0.6 % FP (9/1508); AI-triage 2.1 % FN (32/1508; 18 consequences), 1.7 % FP (26/1508); AI-safety net 0.07 % FN (1/1508; 1 consequence), 7.6 % FP (115/1508). Subgroups showed similar trends, with one difference: AI-triage increased FN in every subgroup other than radiologists-in-training.
Conclusion
Implementation methods have a large impact on AI-effectiveness. These results suggest AI should not be considered for problem-solving or triage at this time; AI standalone performs better than either and may be a source of assistance where radiologists are unavailable. Best results were obtained implementing AI as safety net, which eliminates missed fractures with serious clinical consequences; even though false positives are increased, unnecessary treatments are limited.