A German retrospective study assessed InferRead DR Chest (Infervision), an AI tool for detecting pathological findings on chest X-rays. The study included chest radiographs from 477 patients, with a reference standard established by two radiologists reading independently, who together reported 226 findings in 167 patients. Run as a stand-alone reader, InferRead DR Chest achieved an average area under the curve (AUC) of 0.84, with an optimized sensitivity of 85% and specificity of 75.4%. In an exploratory analysis, about 40% of cases could be correctly ruled out when the software screened for only one specific disease at a sensitivity above 95%, which points to its potential in targeted screening scenarios. When all abnormalities were read at once, workload reduction was only marginal because specificity was insufficient. Sex, age, and comorbidity also significantly influenced agreement between the AI readings and the reference standard. The tool may therefore help reduce radiologists' workload, but for now it requires human supervision and careful implementation in clinical practice.
Abstract
This retrospective study evaluated a commercial deep learning (DL) software for chest radiographs and explored its performance in different scenarios. A total of 477 patients (284 male, 193 female, mean age 61.4 (44.7–78.1) years) were included. For the reference standard, two radiologists performed independent readings on seven diseases, reporting 226 findings in 167 patients. An autonomous DL reading was performed separately and evaluated against the gold standard regarding accuracy, sensitivity and specificity using ROC analysis. The overall average AUC was 0.84 (95% CI 0.76–0.92) with an optimized DL sensitivity of 85% and specificity of 75.4%. The best results were seen in pleural effusion, with an AUC of 0.92 (0.885–0.955) and sensitivity and specificity of 86.4% each. The data also showed a significant influence of sex, age, and comorbidity on the level of agreement between gold standard and DL reading. In the exploratory analysis, about 40% of cases could be ruled out correctly when screening for only one specific disease with a sensitivity above 95%. For the combined reading of all abnormalities at once, only marginal workload reduction could be achieved due to insufficient specificity. DL applications like this one bear the prospect of autonomous comprehensive reporting on chest radiographs but for now require human supervision. Radiologists need to consider possible bias in certain patient groups, e.g., elderly patients and women. By adjusting their threshold values, commercial DL applications could already be deployed for a variety of tasks, e.g., ruling out certain conditions in screening scenarios, offering high potential for workload reduction.
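The threshold adjustment described above can be illustrated with a minimal sketch. It is not the study's method or the vendor's software: the function names `rule_out_threshold` and `evaluate`, the per-case score format, and the toy data are all assumptions made up for illustration. The idea is to pick the highest score cutoff that still keeps sensitivity at or above a target (here 95%) for one disease, then count how many cases fall below it and could be ruled out.

```python
import math
from typing import Dict, Sequence


def rule_out_threshold(scores: Sequence[float], labels: Sequence[int],
                       min_sensitivity: float = 0.95) -> float:
    """Highest cutoff t such that flagging score >= t keeps sensitivity >= target.

    Hypothetical helper: labels are 1 (disease present per reference
    standard) or 0 (absent); scores are per-case model outputs.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive case")
    # Minimum number of positive cases that must be flagged to hit the target.
    needed = max(1, math.ceil(min_sensitivity * len(positives) - 1e-9))
    # Flagging everything at or above the needed-th highest positive score
    # captures at least `needed` positives.
    return positives[needed - 1]


def evaluate(scores: Sequence[float], labels: Sequence[int],
             threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, and share of cases ruled out at a cutoff.

    Assumes both classes are present in `labels`.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Cases scoring below the cutoff would not need a radiologist's
        # read for this one disease.
        "ruled_out_fraction": (tn + fn) / len(scores),
    }


# Toy data invented for illustration only; these are NOT values from the study.
SCORES = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90, 0.95]
LABELS = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    cutoff = rule_out_threshold(SCORES, LABELS, min_sensitivity=0.95)
    print(cutoff, evaluate(SCORES, LABELS, cutoff))
```

On this toy data the cutoff is 0.30, which gives a sensitivity of 1.0, a specificity of 0.6, and 30% of cases ruled out. A cutoff chosen and evaluated on the same cases is optimistic, so a real deployment would need to confirm it on separate data.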