Gleamer still outperforms ChatGPT 4 in fracture detection


This retrospective study evaluated how well ChatGPT 4 identifies distal radius fractures on wrist radiographs, comparing its performance with that of a board-certified radiologist, a hand surgery resident, a medical student, and the AI tool BoneView Trauma from Gleamer. ChatGPT 4 demonstrated commendable diagnostic accuracy, achieving a sensitivity of 88% and a specificity of 98%, with an overall diagnostic power (AUC) of 0.93. Although it outperformed the medical student, it fell short of the hand surgery resident and the specialized AI, BoneView Trauma, both of which showed superior sensitivity. At least for now, there still seems to be a place for specialized AI in medical imaging diagnostics.

Read full study

Diagnostic power of ChatGPT 4 in distal radius fracture detection through wrist radiographs

Archives of Orthopaedic and Trauma Surgery, 2024


Distal radius fractures rank among the most prevalent fractures in humans, necessitating accurate radiological imaging and interpretation for optimal diagnosis and treatment. In addition to human radiologists, artificial intelligence systems are increasingly employed for radiological assessments. Since 2023, ChatGPT 4 has offered image analysis capabilities, which can also be used for the analysis of wrist radiographs. This study evaluates the diagnostic power of ChatGPT 4 in identifying distal radius fractures, comparing it with a board-certified radiologist, a hand surgery resident, a medical student, and the well-established AI Gleamer BoneView™. Results demonstrate ChatGPT 4’s good diagnostic accuracy (sensitivity 0.88, specificity 0.98, diagnostic power (AUC) 0.93), significantly surpassing the medical student (sensitivity 0.98, specificity 0.72, diagnostic power (AUC) 0.85; p = 0.04). Nevertheless, the diagnostic power of ChatGPT 4 lags behind the hand surgery resident (sensitivity 0.99, specificity 0.98, diagnostic power (AUC) 0.985; p = 0.014) and Gleamer BoneView™ (sensitivity 1.00, specificity 0.98, diagnostic power (AUC) 0.99; p = 0.006). This study highlights the utility and potential applications of artificial intelligence in modern medicine, emphasizing ChatGPT 4 as a valuable tool for enhancing diagnostic capabilities in the field of medical imaging.
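For readers wondering how the sensitivity, specificity, and AUC figures relate: when a reader gives only a yes/no answer per image, the ROC curve has a single operating point, and the area under it reduces to the mean of sensitivity and specificity. The sketch below checks the figures quoted in the abstract against that formula. The function name single_point_auc is hypothetical, and whether the authors computed AUC this way is an assumption; the abstract does not say.

```python
def single_point_auc(sensitivity: float, specificity: float) -> float:
    """Area under a ROC curve that has one operating point.

    The curve runs (0, 0) -> (1 - specificity, sensitivity) -> (1, 1);
    its trapezoidal area simplifies to (sensitivity + specificity) / 2.
    """
    return (sensitivity + specificity) / 2


# (sensitivity, specificity, reported AUC) as quoted in the abstract.
reported = {
    "ChatGPT 4": (0.88, 0.98, 0.93),
    "Medical student": (0.98, 0.72, 0.85),
    "Hand surgery resident": (0.99, 0.98, 0.985),
    "Gleamer BoneView": (1.00, 0.98, 0.99),
}

for reader, (sens, spec, auc) in reported.items():
    computed = single_point_auc(sens, spec)
    print(f"{reader}: computed {computed:.3f}, reported {auc}")
```

If the computed and reported values agree for all four readers, the reported AUC can be read as a balanced accuracy at one threshold rather than as a measure across many thresholds.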