Contributor
Xinlei Wang, Steve Jiang
Subject Area
Biostatistics
Abstract
Over the past decade, artificial intelligence (AI), particularly through deep learning (DL) techniques, has made significant strides in fields like computer vision (CV) and natural language processing (NLP), leading to transformative advancements across numerous applications. This progress has sparked considerable enthusiasm within the medical field, where DL-related research has grown exponentially since 2015. However, despite these promising developments, the real-world deployment of DL models in healthcare remains limited, especially in safety-critical domains such as radiotherapy (RT), which demand reliability and sustained performance. This thesis addresses three core challenges associated with the clinical application of DL models: (1) post-deployment performance degradation, (2) the lack of reliable, case-specific quality assessment for DL predictions, and (3) the absence of robust, generalizable performance monitoring frameworks tailored to dynamic clinical environments.
First, we examine the long-term performance patterns of DL models deployed in clinical settings, focusing on their effectiveness in adapting to evolving clinical practices. Through retrospective simulation using prostate cancer RT data from 2006 to 2022, we demonstrate a notable decline in auto-segmentation performance over time, attributed to changes in clinical practices, personnel shifts, and the introduction of new techniques. These findings underscore the necessity of continuous evaluation of model performance beyond initial validation.
Second, we propose a novel auto-contour quality assessment (QA) framework for DL-based segmentation in RT, specifically designed for online adaptive radiotherapy (OART). Our approach integrates Bayesian ordinal classification with uncertainty quantification to provide case-specific, uncertainty-aware quality assessments, accommodating various scenarios with limited or no manual labels. By incorporating an additional calibration step, our method can achieve the accuracy exceeding 90% that is clinically required for confident predictions. The proposed AI-assisted auto-contour QA model effectively streamlines contouring processes, substantially reducing manual effort and improving clinical workflow efficiency in OART. Its uncertainty estimates enable clinicians to make rapid, informed decisions, improving patient safety and reliability in time-sensitive clinical workflows.
Third, we introduce DyMon, a dynamic monitoring framework utilizing empirical prediction interval coverage rates (EPCR) derived from conformal prediction, combined with adaptive statistical testing methods, to continuously monitor deployed AI models. EPCR serves as a robust, model-independent indicator, identifying distribution shifts by detecting deviations from predetermined coverage levels. We assess DyMon using real-world auto-segmentation data, validating its performance through three adaptive statistical tests: Bayesian adaptive testing, Window-limited Generalized Likelihood Ratio Cumulative Sum (WinLCUSUM), and Maximized CUSUM (MaxCUSUM), each suited to different change patterns. Our results demonstrate that DyMon effectively identifies performance deterioration in a timely manner while maintaining a controlled Type I error rate.
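The monitoring idea described above can be illustrated with a minimal sketch. It is not the DyMon implementation: it assumes a split-conformal interval built from absolute residuals, and it substitutes a plain one-sided Bernoulli CUSUM for the Bayesian adaptive, WinLCUSUM, and MaxCUSUM tests named in the abstract. All function names, the in-control coverage p0, the degraded coverage p1, and the alarm threshold are hypothetical choices made for illustration.

```python
import math

def conformal_quantile(cal_residuals, alpha):
    """Split-conformal quantile of absolute calibration residuals.

    Simplification (assumption): the rank is clamped to n, whereas strict
    conformal prediction would return an infinite interval in that case.
    """
    n = len(cal_residuals)
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return sorted(cal_residuals)[k - 1]

def coverage_indicators(y_true, y_pred, q):
    """1 if the truth falls inside [pred - q, pred + q], else 0."""
    return [1 if abs(y - p) <= q else 0 for y, p in zip(y_true, y_pred)]

def epcr(indicators):
    """Empirical prediction interval coverage rate over a batch of cases."""
    return sum(indicators) / len(indicators)

def bernoulli_cusum(indicators, p0, p1, threshold):
    """One-sided CUSUM for a coverage drop from p0 to p1 (p1 < p0).

    Returns the 0-based index of the first alarm, or None if no alarm.
    """
    llr_hit = math.log(p1 / p0)               # negative: a covered case lowers the statistic
    llr_miss = math.log((1 - p1) / (1 - p0))  # positive: a miss raises the statistic
    s = 0.0
    for t, covered in enumerate(indicators):
        s = max(0.0, s + (llr_hit if covered else llr_miss))
        if s >= threshold:
            return t
    return None

# Hypothetical stream: 50 covered cases, then 10 consecutive misses.
stream = [1] * 50 + [0] * 10
alarm = bernoulli_cusum(stream, p0=0.9, p1=0.7, threshold=4.0)
print(alarm)  # 53: the fourth consecutive miss pushes the statistic past 4.0
```

In this sketch the coverage indicators play the role of the model-independent signal: as long as incoming cases resemble the calibration data, roughly a fraction 1 - alpha of them should be covered, and a sustained run of misses accumulates evidence for a distribution shift. The threshold trades detection delay against false alarms and would need to be calibrated to a target Type I error rate in practice.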
Degree Date
Summer 8-5-2025
Document Type
Dissertation
Degree Name
Ph.D.
Department
Department of Statistics and Data Science
Advisor
Xinlei Wang
Second Advisor
Steve Jiang
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.
Recommended Citation
Wang, Biling, "Towards Reliable Clinical Applications of AI Models in Radiotherapy" (2025). Statistical Science Theses and Dissertations. 54.
https://scholar.smu.edu/hum_sci_statisticalscience_etds/54
