Abstract

In this dissertation, we investigate sample size calculations for three different study designs: stratified cluster randomization trials (CRTs), paired experimental designs and paired cluster experimental designs.

Stratified CRTs are frequently employed in clinical and healthcare research. Compared with simple randomized CRTs, stratified CRTs reduce the imbalance of baseline prognostic factors among intervention groups. Clusters in CRTs are often naturally formed and therefore have random sizes. When cluster sizes vary, commonly used ad hoc approaches ignore that variability and may underestimate (or overestimate) the required number of clusters for each group per stratum, leading to underpowered (or overpowered) clinical trials. In Chapter 2, we propose a closed-form sample size formula for estimating the required total number of subjects and the number of clusters for each group per stratum. The formula is based on the Cochran-Mantel-Haenszel statistic for a stratified cluster randomization design with binary outcomes and accounts for both clustering and varying cluster size. We investigate the impact of various design parameters on the relative change in the number of clusters due to varying cluster size. Simulation studies are conducted to evaluate the finite-sample performance of the proposed sample size formula. A real application example of a pragmatic stratified CRT of a triad of chronic kidney disease (CKD), diabetes and hypertension is presented for illustration.
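To illustrate why ignoring cluster size variability can understate the required number of clusters, the sketch below uses a widely cited textbook approximation for an unstratified two-arm CRT with a binary outcome. It is not the Cochran-Mantel-Haenszel-based formula proposed in Chapter 2. The design effect 1 + ((1 + CV^2) * m - 1) * ICC, the function names, and all parameter values are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def design_effect(mean_size, icc, cv=0.0):
    """Approximate variance inflation for clustering.

    mean_size: mean cluster size, icc: intracluster correlation,
    cv: coefficient of variation of cluster size (0 means equal sizes).
    Assumed approximation, not the dissertation's formula.
    """
    return 1.0 + ((1.0 + cv ** 2) * mean_size - 1.0) * icc


def clusters_per_group(p1, p2, mean_size, icc, cv=0.0, alpha=0.05, power=0.8):
    """Clusters per arm for comparing two proportions (unstratified sketch)."""
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha / 2.0) + z(power)
    # Subjects per arm under individual randomization.
    n_subjects = z_sum ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    # Inflate by the design effect, then convert subjects to clusters.
    return math.ceil(n_subjects * design_effect(mean_size, icc, cv) / mean_size)


if __name__ == "__main__":
    # Hypothetical design parameters.
    equal = clusters_per_group(0.3, 0.5, mean_size=20, icc=0.05, cv=0.0)
    varying = clusters_per_group(0.3, 0.5, mean_size=20, icc=0.05, cv=0.8)
    print("clusters per arm, equal sizes:", equal)
    print("clusters per arm, varying sizes:", varying)
```

Under this approximation, setting the coefficient of variation to zero recovers the familiar 1 + (m - 1) * ICC inflation, so the gap between the two printed values shows how much an equal-size assumption can understate the requirement.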

In a paired experimental design, each study unit contributes a pair of observations. Investigators often encounter incomplete observations of paired outcomes in the data collected: some study units contribute complete pairs of observations, while others contribute only a pre-intervention or only a post-intervention observation. In Chapter 3, we derive a closed-form sample size formula based on the generalized estimating equation (GEE) approach by treating the incomplete observations as missing data in a linear model. The proposed method properly accounts for the mixed structure of the observed data, a combination of paired and unpaired outcomes. The sample size formula is flexible enough to accommodate different missing patterns, magnitudes of missingness, and correlation parameter values. In the presence of missing data, the proposed method yields a more accurate sample size estimate than the crude adjustment. Simulation studies are conducted to evaluate the finite-sample performance of the GEE sample size formula. A real application example is presented for illustration.
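For context, the sketch below shows one common form of a crude adjustment for a paired design with a continuous outcome: compute the number of complete pairs from the standard paired-difference formula, then divide by the expected proportion of complete pairs. The abstract does not define the crude adjustment, so this reading is an assumption, and the code is not the GEE-based formula derived in Chapter 3. Function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def raw_complete_pairs(delta, sigma, rho, alpha=0.05, power=0.8):
    """Unrounded number of complete pairs needed to detect a mean change delta.

    sigma: common standard deviation of pre and post outcomes,
    rho: within-pair correlation.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha / 2.0) + z(power)
    # Variance of the within-pair difference.
    var_diff = 2.0 * sigma ** 2 * (1.0 - rho)
    return z_sum ** 2 * var_diff / delta ** 2


def complete_pairs(delta, sigma, rho, alpha=0.05, power=0.8):
    """Complete pairs required when no observations are missing."""
    return math.ceil(raw_complete_pairs(delta, sigma, rho, alpha, power))


def crude_adjusted(delta, sigma, rho, missing_prop, alpha=0.05, power=0.8):
    """Assumed crude adjustment: inflate by 1 / (1 - missing proportion).

    Unpaired observations are effectively discarded, which is the
    information a mixed paired/unpaired analysis would instead use.
    """
    n = raw_complete_pairs(delta, sigma, rho, alpha, power)
    return math.ceil(n / (1.0 - missing_prop))


if __name__ == "__main__":
    # Hypothetical design parameters.
    print("complete pairs:", complete_pairs(0.5, 1.0, 0.5))
    print("crude adjusted:", crude_adjusted(0.5, 1.0, 0.5, missing_prop=0.2))
```

Because this adjustment ignores the observations contributed by incomplete pairs, it gives no credit for partially observed units, which is the kind of inaccuracy a formula built on the mixed data structure is meant to avoid.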

In Chapter 4, we extend the method of Chapter 3 and propose closed-form sample size formulas for paired cluster designs with continuous and binary outcomes, based on the GEE approach in generalized linear models. The sample size formulas are flexible enough to accommodate different correlation structures and missing patterns. In the simulation studies, we use bias-corrected sandwich variance estimators to address the inflated type I error that arises when the number of clusters is small. A real application example on physical fitness in Ecuadorian adolescents is presented for illustration.

Degree Date

Fall 12-21-2019

Document Type

Dissertation

Degree Name

Ph.D.

Department

Statistical Science

Subject Area

Biostatistics

Format

.pdf

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
