Alternating recurrent event data arise commonly in health research. Examples include hospital admissions and discharges of patients with diabetes, exacerbations and remissions of chronic bronchitis, and quitting and restarting smoking. Recent work has formulated and estimated joint models for recurrent event times that account for non-negligible event durations. However, prediction models for transitions between recurrent events are lacking. We consider the development and evaluation of methods for predicting future events within these models. Specifically, we propose a tool for dynamically predicting transitions between alternating recurrent events in real time. Under a flexible joint frailty model, we derive the predictive probability of a transition from one event type to the other within a pre-specified time period. To circumvent numerical integration in calculating the predictive probability, we obtain an approximate transition probability by a Taylor expansion. Simulation results demonstrate that our tool discriminates better, as measured by the area under the ROC curve (AUC) and sensitivity, than prediction approaches that rely on standard binary regression models. Simulations also show that predictions from the approximate transition probability are close to those from the exact predictive probability. We illustrate predictions in analyses of relapses of chronic bronchitis exacerbation from a pharmaceutical trial and hospital readmissions in patients with diabetes from Medicaid claims data.
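
As a generic illustration of how a Taylor expansion can replace numerical integration over a frailty, consider the following sketch. The symbols are assumptions introduced here, not the dissertation's notation: $u$ is a subject-level frailty, $\mathcal{H}$ the observed event history, $p(u)$ the transition probability conditional on $u$, and $\hat{u}$, $\hat{v}$ the mean and variance of $u$ given $\mathcal{H}$. The expansion actually used in the dissertation may differ in order or in the point of expansion.

```latex
\begin{align*}
P(\text{transition within the window} \mid \mathcal{H})
  &= \int p(u)\, f(u \mid \mathcal{H})\, du \\
  &\approx \int \Bigl[ p(\hat{u}) + p'(\hat{u})(u - \hat{u})
       + \tfrac{1}{2}\, p''(\hat{u})(u - \hat{u})^{2} \Bigr]
       f(u \mid \mathcal{H})\, du \\
  &= p(\hat{u}) + \tfrac{1}{2}\, p''(\hat{u})\, \hat{v}.
\end{align*}
```

The first-order term vanishes because $\hat{u}$ is the mean of $u$ given $\mathcal{H}$, so in this sketch only $p$ and its second derivative at $\hat{u}$ need to be evaluated.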

The final part of this dissertation (Chapter 6) compares the predictive performance of logistic regression and random forests for 30-day readmission using longitudinal claims data. Several studies have compared these and other prediction models using longitudinal electronic health records or claims data. Because most of them either applied logistic regression to longitudinal observations while ignoring the lack of independence within subjects, or used claims data consisting of independent observations, how the models compare on longitudinal data remains unclear. Moreover, those studies did not compare out-of-sample performance. We address these issues and compare the prediction performance of the models using longitudinal claims data. We conduct simulations by randomly choosing one record from each patient's multiple records in the training set, fitting the two models, applying the models to the training, test, and external sets, and obtaining the AUC and sensitivity for each. We observe that although random forests generally give better predictions on the training set, logistic regression performs better on the test and external sets. In an empirical study, we apply the prediction methods to Medicaid claims data covering inpatient admissions of patients with heart failure.
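
The sampling and evaluation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the dissertation's code: the record layout `(patient_id, features, label)`, the function names, and the 0.5 classification threshold are all assumptions made here, and the model-fitting step (logistic regression or random forest) is left out.

```python
import random
from collections import defaultdict


def sample_one_record_per_patient(records, rng):
    """Pick one record at random from each patient's records.

    `records` is a list of (patient_id, features, label) tuples;
    this layout is hypothetical and chosen only for illustration.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec[0]].append(rec)
    # Sort by patient id so results are reproducible for a given seed.
    return [rng.choice(by_patient[pid]) for pid in sorted(by_patient)]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(scores, labels, threshold=0.5):
    """Share of true positives with a score at or above the threshold."""
    pos_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not pos_scores:
        raise ValueError("sensitivity needs at least one positive case")
    return sum(1 for s in pos_scores if s >= threshold) / len(pos_scores)
```

In a simulation loop, one would call `sample_one_record_per_patient` on the training records with a seeded `random.Random`, fit each model to the sampled records, and then pass each model's predicted probabilities on the training, test, and external sets to `auc` and `sensitivity`.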

Degree Date

Summer 8-3-2022

Degree Name



Statistical Science


Daniel F. Heitjan

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License