Diagnostic accuracy of ChestEye in primary care

A prospective study in Spain evaluated ChestEye Quality (Oxipit), an AI tool that analyzes chest X-rays for 75 different pathologies, using X-rays from 278 participants (48.2% with radiologic abnormalities). The study assessed the AI's diagnostic performance in a primary care setting, focusing on its ability to distinguish images with abnormalities from those without. The reference standard was the report of a single radiologist. ChestEye had a standalone average accuracy of 0.95, sensitivity of 0.48, and specificity of 0.98. Despite its high specificity, the AI's lower sensitivity for conditions commonly encountered in primary care, such as mediastinal, vascular, and bone abnormalities, underscores the need for improvements to enhance diagnostic efficacy.


Abstract

Interpreting chest X-rays is a complex task, and artificial intelligence algorithms for this purpose are currently being developed. External validation of these algorithms is important before they are implemented. This study therefore aims to externally validate an AI algorithm’s diagnoses in real clinical practice, comparing them to a radiologist’s diagnoses. It also aims to identify diagnoses the algorithm may not have been trained for. This was a prospective observational study for the external validation of the AI algorithm in a region of Catalonia, comparing the AI algorithm’s diagnosis with that of the reference radiologist, which was considered the gold standard. The external validation was performed with a sample of 278 images and reports, 51.8% of which showed no radiological abnormalities according to the radiologist's report. In the analysis of the AI algorithm's validity, the average accuracy was 0.95 (95% CI 0.92; 0.98), the sensitivity was 0.48 (95% CI 0.30; 0.66) and the specificity was 0.98 (95% CI 0.97; 0.99). The conditions where the algorithm was most sensitive were external, upper abdominal and cardiac and/or valvular implants. The conditions where it was least sensitive were those in the mediastinum, vessels and bone. The algorithm has been validated in the primary care setting and has proven useful for identifying images with or without conditions. However, to be a valuable support tool for experts, it requires additional real-world training to enhance its diagnostic capabilities for some of the conditions analysed. Our study emphasizes the need for continuous improvement to ensure the algorithm’s effectiveness in primary care.
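For readers less familiar with the three validity measures reported above, the following is a minimal sketch of how sensitivity, specificity and accuracy are computed from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the class name `ConfusionCounts` is an assumption of this example. It also shows the general identity that accuracy is a prevalence-weighted mix of sensitivity and specificity, which is one way a sensitivity near 0.48 can coexist with an accuracy near 0.95 when the condition being scored is rare. This is an illustration of the arithmetic, not a reconstruction of the study's analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of AI results against the reference standard for one condition."""
    tp: int  # AI positive, reference positive
    fn: int  # AI negative, reference positive
    tn: int  # AI negative, reference negative
    fp: int  # AI positive, reference negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    @property
    def prevalence(self) -> float:
        # Share of images the reference standard labels as positive.
        return (self.tp + self.fn) / self.total

    @property
    def sensitivity(self) -> float:
        # Share of reference-positive images the AI also flags.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of reference-negative images the AI also clears.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all images where the AI agrees with the reference.
        return (self.tp + self.tn) / self.total


# HYPOTHETICAL counts for a single rare condition (NOT taken from the study).
counts = ConfusionCounts(tp=24, fn=26, tn=931, fp=19)

# Accuracy rewritten as a prevalence-weighted average of the other two measures.
weighted = (counts.prevalence * counts.sensitivity
            + (1 - counts.prevalence) * counts.specificity)

print(f"prevalence  = {counts.prevalence:.3f}")
print(f"sensitivity = {counts.sensitivity:.3f}")
print(f"specificity = {counts.specificity:.3f}")
print(f"accuracy    = {counts.accuracy:.3f}")
```

Because negatives dominate when prevalence is low, accuracy in such a setting mostly tracks specificity, which is why sensitivity is worth reading separately when judging how many abnormalities a tool would miss.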
