While local validation of AI algorithms provides valuable insight into performance, it captures only a single point in time, raising critical questions about how often validation should occur to ensure reliability. Ideally, a system would be in place to monitor performance continuously, but how do we achieve this? This research introduces MMC+, a scalable framework for continuous AI performance monitoring that detects data shifts indicating potential performance deviations. Using foundation models and incorporating uncertainty bounds, MMC+ monitors multiple data streams to identify when AI systems are at risk of falling outside acceptable performance standards. The study shows that the framework can act as an early warning system, allowing timely intervention before serious errors affect patient care.
As more AI applications are integrated into clinical practice, the complexity of, and demand for, continuous performance monitoring increase. Scalable solutions such as MMC+ are therefore critical to help healthcare departments effectively manage and monitor AI performance over time.
Read full study
Scalable Drift Monitoring in Medical Imaging AI
Abstract:
The integration of artificial intelligence (AI) into medical imaging has advanced clinical diagnostics but poses challenges in managing model drift and ensuring long-term reliability. To address these challenges, we develop MMC+, an enhanced framework for scalable drift monitoring, building upon the CheXstray framework that introduced real-time drift detection for medical imaging AI models using multi-modal data concordance. This work extends the original framework’s methodologies to provide a more scalable and adaptable solution for real-world healthcare settings, offering a reliable and cost-effective alternative to continuous performance monitoring that addresses the limitations of both continuous and periodic monitoring methods. MMC+ introduces critical improvements to the original framework, including more robust handling of diverse data streams, improved scalability through the integration of foundation models such as MedImageInsight for high-dimensional image embeddings without site-specific training, and the introduction of uncertainty bounds to better capture drift in dynamic clinical environments. Validated with real-world data from Massachusetts General Hospital during the COVID-19 pandemic, MMC+ effectively detects significant data shifts and correlates them with model performance changes. While not directly predicting performance degradation, MMC+ serves as an early warning system, indicating when AI systems may deviate from acceptable performance bounds and enabling timely interventions. By emphasizing the importance of monitoring diverse data streams and evaluating data shifts alongside model performance, this work contributes to the broader adoption and integration of AI solutions in clinical settings.
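To make the general idea of embedding-based drift monitoring with uncertainty bounds more concrete, the Python sketch below shows one plausible recipe: compare a window of incoming image embeddings against a reference set with a distance statistic (here, Maximum Mean Discrepancy), and flag drift when that statistic exceeds an upper bound bootstrapped from the reference period. This is a minimal illustration under stated assumptions, not the MMC+ implementation; the function names, the choice of MMD, and the synthetic embeddings are hypothetical, and MMC+ itself combines multiple data streams rather than a single metric.

import numpy as np

def mmd_rbf(x, y, gamma=None):
    """Squared Maximum Mean Discrepancy between two embedding sets,
    using an RBF kernel with a median-heuristic bandwidth if gamma is None."""
    xy = np.vstack([x, y])
    sq_dists = np.sum((xy[:, None, :] - xy[None, :, :]) ** 2, axis=-1)
    if gamma is None:
        gamma = 1.0 / np.median(sq_dists[sq_dists > 0])
    k = np.exp(-gamma * sq_dists)
    n = len(x)
    kxx, kyy, kxy = k[:n, :n], k[n:, n:], k[:n, n:]
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

def drift_bounds(reference, window_size=50, n_boot=200, quantile=0.99, seed=0):
    """Bootstrap an upper bound on MMD between random same-sized reference
    windows, approximating the metric's variability when no drift is present."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.permutation(len(reference))
        a = reference[idx[:window_size]]
        b = reference[idx[window_size:2 * window_size]]
        stats.append(mmd_rbf(a, b))
    return float(np.quantile(stats, quantile))

def check_drift(reference, current_window, upper_bound, seed=1):
    """Flag drift when the distance between a same-sized reference sample
    and the current window exceeds the bootstrapped upper bound."""
    rng = np.random.default_rng(seed)
    ref_sample = reference[rng.choice(len(reference), size=len(current_window), replace=False)]
    stat = mmd_rbf(ref_sample, current_window)
    return stat, stat > upper_bound

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    reference = rng.normal(0.0, 1.0, size=(500, 64))  # stand-in for archived image embeddings
    shifted = rng.normal(0.5, 1.0, size=(50, 64))     # stand-in for a shifted incoming batch
    upper = drift_bounds(reference, window_size=50)
    stat, drifted = check_drift(reference, shifted, upper)
    print(f"MMD = {stat:.4f}, bound = {upper:.4f}, drift flagged: {drifted}")

In practice, the embeddings would come from a foundation model such as MedImageInsight rather than a random generator, and similar statistics would be tracked per data stream (metadata, model outputs, image embeddings) before being combined into an overall drift signal.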