The electronic health record (EHR) is a digital version of the patient chart. All clinically relevant patient information can be accessed from the EHR by professionals involved in the patient’s care. For researchers, the EHR is a rich, convenient source for data to address a vast range of medical research questions.

In observational studies with EHR data, it is common to define the treatment/exposure status as a binary indicator reflecting whether the patient was documented to receive a particular medication or procedure. The outcome can be any type of information on patient status documented in the EHR after the treatment has taken place.

The EHR, although not designed primarily for research, can serve as a platform for observational studies in clinical medicine. An advantage of the EHR is that it can document treatments unequivocally, provided the treatment – medication or procedure – appears in the record. For example, in a study in which treatment is the route of medication (intravenous = treated, oral = control), the EHR makes it clear which route was used. This does not, however, relieve the investigator of the responsibility of defining and measuring confounding variables, and properly adjusting for them in comparative analyses.

In Chapter 1, we demonstrate the use of longitudinal EHR data in an evaluation of the effects of treatment of 12,754 children with overweight/obesity in greater Dallas. Our objective in this study is to estimate the causal effect of clinician attention to elevated body mass index (BMI), measured at up to 10 timepoints per child, on subsequent weight change. To account for bias from confounding, we use the propensity score stratification method, applied longitudinally at each timepoint. We specify the propensity score model to include baseline covariates, current values of time-varying covariates, and treatment status at the most recent visit.

An alternative method of causal inference when treatments are applied longitudinally in an observational study relies on the marginal structural model (MSM). When estimating an MSM, one eliminates confounding bias by constructing a series of propensity score models for treatment at each time, then weighting the subjects based on these scores. The MSM has the interpretation of a causal model for the effect of the series of treatments on the outcome.
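The weighting step described above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation: it assumes the per-timepoint propensity scores have already been estimated by some model, and the function names `iptw_weight` and `weighted_mean_by_history` are hypothetical. The second function corresponds to a saturated MSM, in which each treatment history gets its own mean.

```python
from math import prod

def iptw_weight(treatments, propensities):
    """Unstabilized inverse-probability-of-treatment weight for one subject.

    treatments:   sequence of 0/1 indicators, one per timepoint.
    propensities: estimated P(treated at t | history), one per timepoint
                  (assumed to come from propensity models fitted elsewhere).
    """
    # Probability of the treatment the subject actually received at each time.
    p_received = (p if a == 1 else 1.0 - p for a, p in zip(treatments, propensities))
    # The weight is the inverse of the product over timepoints.
    return 1.0 / prod(p_received)

def weighted_mean_by_history(histories, outcomes, weights):
    """Weighted mean outcome for each observed treatment history (saturated MSM)."""
    totals = {}
    for history, y, w in zip(histories, outcomes, weights):
        key = tuple(history)
        num, den = totals.get(key, (0.0, 0.0))
        totals[key] = (num + w * y, den + w)
    return {key: num / den for key, (num, den) in totals.items()}
```

Under the usual assumptions (no unmeasured confounding, correctly specified propensity models, positivity), contrasts between the weighted means for different histories estimate the causal effects that the MSM describes.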

Although MSMs are in wide use, there has been relatively little evaluation of the properties of model estimates in small samples. One can conduct a simulation study to assess properties such as the suitability of asymptotic approximations to moderate samples, best methods for computing the standard errors, choice of the weighting method, and robustness to incorrect assumptions about the MSM or the underlying propensity score model. Several simulation methods have been proposed, each with its pros and cons. In Chapter 2, we introduce a new, simplified simulation method that addresses the limitations of the existing methods. We demonstrate the use of our method in a Monte Carlo study to assess the properties of an estimated MSM involving treatment at two timepoints.

An oft-cited concern with MSMs is the sensitivity of model estimates to large weights. This issue arises in particular when there are multiple timepoints. As the number of timepoints increases, an individual’s propensity score can become very small, while the estimation weights – defined as the inverse of the propensity score – become correspondingly large. Having a few subjects with large weights can result in an unstable estimate. In Chapter 3, we use the novel simulation method that we introduced in Chapter 2 to conduct a Monte Carlo assessment of the impact of large weights on the validity of MSM estimates. Finally, we estimate a series of MSMs for the child obesity example from Chapter 1 and interpret the results in light of our simulation findings.
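To make the growth of the weights concrete, the sketch below shows how a weight compounds across timepoints and how a single large weight reduces the effective sample size. Both functions are illustrative assumptions rather than methods taken from the dissertation: `cumulative_weight` supposes the same probability of received treatment at every visit, and `effective_sample_size` uses Kish's approximation, a common diagnostic that the text does not mention.

```python
def cumulative_weight(p_received, n_timepoints):
    """Weight for a subject whose received treatment had probability
    p_received at each of n_timepoints visits (a simplifying assumption)."""
    return 1.0 / (p_received ** n_timepoints)

def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq
```

For instance, a probability of 0.5 at each of 10 visits gives a weight of 1024, and a sample of four subjects with weights 1, 1, 1 and 97 has an effective sample size only slightly above 1, which is one way to see why a few large weights can destabilize an estimate.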

Degree Date

Spring 5-13-2023

Document Type


Degree Name



Statistical Science


Daniel F Heitjan

Second Advisor

Christy Boling Turer

Subject Area

Biostatistics, Statistics



Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License