Radiologist outperforms AI, but better together for cervical spine fractures

In this single-center, retrospective study, the C-spine tool by Aidoc was used to screen for cervical spine fractures and determine which undetected fractures and fractures additionally found by the AI software were in need of stabilizing therapy (IST). Compared to radiologists, the AI had a lower sensitivity (88.2% vs 71.5%), at similar specificity (99.2% vs 98.6%), with a higher miss rate of IST fractures. However, it detected most fractures undetected by the radiologists, including almost all IST. These findings support the intended use of the AI as a concurrent reader.

Read full study.

Abstract

Objectives

To compare diagnostic accuracy of a deep learning artificial intelligence (AI) for cervical spine (C-spine) fracture detection on CT to attending radiologists and assess which undetected fractures were injuries in need of stabilising therapy (IST).

Methods

This single-centre, retrospective diagnostic accuracy study included consecutive patients (age ≥18 years; 2007–2014) screened for C-spine fractures with CT. To validate ground truth, one radiologist and three neurosurgeons independently examined scans positive for fracture. Negative scans were followed up until 2022 through patient files and two radiologists reviewed negative scans that were flagged positive by AI. The neurosurgeons determined which fractures were ISTs. Diagnostic accuracy of AI and attending radiologists (index tests) were compared using McNemar.

Results

Of the 2368 scans (median age, 48, interquartile range 30–65; 1441 men) analysed, 221 (9.3%) scans contained C-spine fractures with 133 IST. AI detected 158/221 scans with fractures (sensitivity 71.5%, 95% CI 65.5–77.4%) and 2118/2147 scans without fractures (specificity 98.6%, 95% CI 98.2–99.1). In comparison, attending radiologists detected 195/221 scans with fractures (sensitivity 88.2%, 95% CI 84.0–92.5%, p < 0.001) and 2130/2147 scans without fracture (specificity 99.2%, 95% CI 98.8–99.6, p = 0.07). Of the fractures undetected by AI 30/63 were ISTs versus 4/26 for radiologists. AI detected 22/26 fractures undetected by the radiologists, including 3/4 undetected ISTs.

Conclusion

Compared to attending radiologists, the artificial intelligence has a lower sensitivity and a higher miss rate of fractures in need of stabilising therapy; however, it detected most fractures undetected by the radiologists, including fractures in need of stabilising therapy.

Clinical relevance statement

The artificial intelligence algorithm missed more cervical spine fractures on CT than attending radiologists, but detected 84.6% of fractures undetected by radiologists, including fractures in need of stabilising therapy.

Return