Contributor

Xinlei Wang, Steve Jiang

Subject Area

Biostatistics

Abstract

Over the past decade, artificial intelligence (AI), particularly through deep learning (DL) techniques, has made significant strides in fields like computer vision (CV) and natural language processing (NLP), leading to transformative advancements across numerous applications. This progress has sparked considerable enthusiasm within the medical field, where DL-related research has grown exponentially since 2015. Despite these promising developments, the real-world deployment of DL models in healthcare remains limited, especially in safety-critical domains such as radiotherapy (RT), where reliability, safety, and sustained performance are essential. This thesis addresses three core challenges associated with the clinical application of DL models: (1) post-deployment performance degradation, (2) the lack of reliable, case-specific quality assessment for DL predictions, and (3) the absence of robust, generalizable performance monitoring frameworks tailored to dynamic clinical environments.

First, we examine the long-term performance patterns of DL models deployed in clinical settings, focusing on their effectiveness in adapting to evolving clinical practices. Through retrospective simulation using prostate cancer RT data from 2006 to 2022, we demonstrate a notable decline in auto-segmentation performance over time, attributed to changes in clinical practices, personnel shifts, and the introduction of new techniques. These findings underscore the necessity of continuous evaluation of model performance beyond initial validation.

Second, we propose a novel auto-contour quality assessment (QA) framework for DL-based segmentation in RT, specifically designed for online adaptive radiotherapy (OART). Our approach integrates Bayesian ordinal classification with uncertainty quantification to provide case-specific, uncertainty-aware quality assessments, accommodating various scenarios with limited or no manual labels. With an additional calibration step, our method can achieve the accuracy of more than 90% that is clinically required for confident predictions. The proposed AI-assisted auto-contour QA model streamlines contouring, substantially reducing manual effort and improving workflow efficiency in OART. Its uncertainty estimates enable clinicians to make rapid, informed decisions, improving patient safety and reliability in time-sensitive clinical workflows.

Third, we introduce DyMon, a dynamic monitoring framework utilizing empirical prediction interval coverage rates (EPCR) derived from conformal prediction, combined with adaptive statistical testing methods, to continuously monitor deployed AI models. EPCR serves as a robust, model-independent indicator, identifying distribution shifts by detecting deviations from predetermined coverage levels. We assess DyMon using real-world auto-segmentation data, validating its performance through three adaptive statistical tests: Bayesian adaptive testing, Window-limited Generalized Likelihood Ratio Cumulative Sum (WinLCUSUM), and Maximized CUSUM (MaxCUSUM), each suited to different change patterns. Our results demonstrate that DyMon effectively identifies performance deterioration in a timely manner while maintaining a controlled Type I error rate.
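The monitoring idea in this paragraph can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a split-conformal threshold on scalar nonconformity scores, synthetic Gaussian scores in place of real auto-segmentation data, and a plain one-sided Bernoulli CUSUM substituted for the Bayesian, WinLCUSUM, and MaxCUSUM tests named above. All function names, parameters, and the threshold `h` are hypothetical.

```python
import math
import random


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def epcr(covered):
    """Empirical prediction interval coverage rate over a batch of 0/1 flags."""
    return sum(covered) / len(covered)


def bernoulli_cusum(misses, p0, p1, h):
    """One-sided CUSUM for a rise in miscoverage rate from p0 to p1.

    Returns the 0-based index of the first alarm, or None if no alarm.
    """
    up = math.log(p1 / p0)                # increment when the interval misses
    down = math.log((1 - p1) / (1 - p0))  # negative increment when covered
    s = 0.0
    for t, miss in enumerate(misses):
        s = max(0.0, s + (up if miss else down))
        if s > h:
            return t
    return None


def simulate(seed=0, alpha=0.1, n_cal=500, n_pre=300, n_post=300, shift=2.0):
    """Synthetic stream: nominal scores, then scores with inflated spread."""
    rng = random.Random(seed)
    # Hypothetical nonconformity scores, e.g. absolute errors of a model.
    cal = [abs(rng.gauss(0, 1)) for _ in range(n_cal)]
    q = conformal_threshold(cal, alpha)
    pre = [abs(rng.gauss(0, 1)) for _ in range(n_pre)]        # in-distribution
    post = [abs(rng.gauss(0, shift)) for _ in range(n_post)]  # after drift
    covered = [int(s <= q) for s in pre + post]
    misses = [1 - c for c in covered]
    # Illustrative design choice: alarm on a tripling of the miscoverage rate.
    alarm = bernoulli_cusum(misses, p0=alpha, p1=3 * alpha, h=5.0)
    return q, epcr(covered[:n_pre]), epcr(covered[n_pre:]), alarm


if __name__ == "__main__":
    q, pre_cov, post_cov, alarm = simulate()
    print(f"threshold={q:.3f} pre-shift EPCR={pre_cov:.3f} "
          f"post-shift EPCR={post_cov:.3f} first alarm index={alarm}")
```

In this sketch the coverage rate stays near the nominal 1 - alpha level while the score distribution is unchanged and falls once the spread is inflated, which is the deviation the CUSUM statistic accumulates. The alarm threshold `h` trades detection delay against false alarms and would need tuning to meet a target Type I error rate.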

Degree Date

Summer 8-5-2025

Document Type

Dissertation

Degree Name

Ph.D.

Department

Department of Statistics and Data Science

Advisor

Xinlei Wang

Second Advisor

Steve Jiang

Format

.pdf

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License