Abstract

My research is in statistical genetics and comprises three projects: (1) Differentiating the Cochran-Armitage (CA) trend test and Pearson’s chi-square test: location and dispersion; (2) Decomposing Pearson’s chi-square test: a linear regression and its departure from linearity; (3) Testing nonlinear gene-environment (GxE) interaction through varying coefficient and linear mixed models.

(1) In genetic case-control association studies, a standard practice is to perform the CA trend test with 1 degree of freedom (df) under the assumption of an additive model. However, when the true genetic model is recessive or near recessive, the CA trend test is outperformed by Pearson’s chi-square test with 2 df. In this project we analytically reveal the statistical basis of this phenomenon. First, we show that the CA trend test examines the location shift between the case and control groups, whereas Pearson’s chi-square test examines both the location and dispersion shifts between the two groups. Second, we show that under the additive model the location shift outweighs the dispersion shift, and vice versa under a near recessive model. Therefore, Pearson’s chi-square test is more robust than the CA trend test, and it outperforms the latter as the mode of inheritance moves toward the recessive end.
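As a rough illustration of the location/dispersion distinction (not the dissertation's own code), the Python sketch below computes both statistics for a 2 x 3 genotype table. It assumes genotypes are ordered (aa, Aa, AA) with additive scores (0, 1, 2); the function names and example counts are hypothetical. The p-value helpers use the closed-form chi-square survival functions for 1 and 2 df.

```python
import math

SCORES = (0, 1, 2)  # assumed additive coding of genotypes (aa, Aa, AA)


def score_moments(counts, scores=SCORES):
    """Mean (location) and variance (dispersion) of the genotype score in one group."""
    n = sum(counts)
    mean = sum(x * c for x, c in zip(scores, counts)) / n
    var = sum((x - mean) ** 2 * c for x, c in zip(scores, counts)) / n
    return mean, var


def ca_trend(cases, controls, scores=SCORES):
    """Cochran-Armitage trend statistic (1 df) for a 2 x 3 genotype table."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [a + b for a, b in zip(cases, controls)]
    # T contrasts score-weighted case and control counts; E[T] = 0 under H0
    T = sum(x * (S * a - R * b) for x, a, b in zip(scores, cases, controls))
    sx = sum(x * nj for x, nj in zip(scores, n))
    sxx = sum(x * x * nj for x, nj in zip(scores, n))
    var_t = R * S * (N * sxx - sx * sx) / N
    return T * T / var_t


def pearson_chi2(cases, controls):
    """Pearson's chi-square statistic (2 df); assumes no empty genotype column."""
    R, S = sum(cases), sum(controls)
    N = R + S
    stat = 0.0
    for a, b in zip(cases, controls):
        for obs, total in ((a, R), (b, S)):
            expected = (a + b) * total / N
            stat += (obs - expected) ** 2 / expected
    return stat


def p_chi2_1df(x):
    """Upper-tail p-value of a chi-square(1) statistic."""
    return math.erfc(math.sqrt(x / 2))


def p_chi2_2df(x):
    """Upper-tail p-value of a chi-square(2) statistic."""
    return math.exp(-x / 2)
```

In the hypothetical table with cases (100, 0, 100) and controls (50, 100, 50), both groups have mean score 1, so the trend statistic is exactly 0, while Pearson's statistic is about 133.3 because the case group's score variance (1.0) is twice that of the controls (0.5): a pure dispersion shift that only the 2-df test detects.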

(2) In genetic case-control association studies, one can identify situations in which the CA trend test outperforms the analysis model that is consistent with the underlying mode of inheritance. In this project we analytically reveal the statistical basis of this phenomenon. By elucidating the origin of the CA trend test as a linear regression model, we decompose Pearson’s chi-square test statistic into two components: one is the CA trend test statistic, which measures the goodness of fit of the linear regression model; the other measures the discrepancy between the data and the linear regression model. Under this framework we show that the additive coding scheme, as well as the multiplicative coding scheme, increases the coefficient of determination of the regression model by increasing the spread of the data points. We also obtain the conditions under which the CA trend test statistic equals the MAX statistic and Pearson’s chi-square test statistic.
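The two-component split described above can be checked numerically. The sketch below is an illustration under stated assumptions, not the dissertation's derivation: it treats each subject as a point (x, y), with x the genotype score and y = 1 for a case, and relies on the standard identities Pearson = N·η² (share of Var(y) explained by the genotype group means) and CA trend = N·r² (share explained by a straight line in x). The function names, the coding dictionary, and the example counts are hypothetical.

```python
def chi2_decomposition(cases, controls, scores=(0, 1, 2)):
    """Return (pearson, trend, departure) with pearson = trend + departure.

    Genotype order is assumed to be (aa, Aa, AA); no empty genotype column.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [a + b for a, b in zip(cases, controls)]
    tss = R * S / N  # total sum of squares of the 0/1 case indicator y
    # SS explained by the three genotype-group means (saturated model)
    bss = sum(nj * (a / nj - R / N) ** 2 for nj, a in zip(n, cases))
    # SS explained by the least-squares line of y on the score x
    sx = sum(x * nj for x, nj in zip(scores, n))
    sxx_c = sum(x * x * nj for x, nj in zip(scores, n)) - sx * sx / N
    sxy_c = sum(x * a for x, a in zip(scores, cases)) - sx * R / N
    reg_ss = sxy_c ** 2 / sxx_c
    pearson = N * bss / tss  # N * eta^2, 2 df
    trend = N * reg_ss / tss  # N * r^2, the CA trend statistic, 1 df
    return pearson, trend, pearson - trend  # remainder: departure from linearity


# Hypothetical score codings for the MAX statistic
CODINGS = {"recessive": (0, 0, 1), "additive": (0, 1, 2), "dominant": (0, 1, 1)}


def max_statistic(cases, controls):
    """Largest trend statistic over the recessive, additive and dominant codings."""
    stats = {name: chi2_decomposition(cases, controls, s)[1] for name, s in CODINGS.items()}
    return max(stats.values()), stats
```

For a hypothetical table with equal genotype counts and case proportions 0.2, 0.3, 0.4 (linear in the score), the departure component is zero and both statistics equal 200/21 ≈ 9.52. Because no coding's r² can exceed η², the MAX statistic computed this way never exceeds Pearson's statistic.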

(3) We present a novel statistical procedure to detect nonlinear GxE interaction with continuous traits in sequencing association studies. Commonly used approaches for GxE interaction usually assume a linear relationship between genetic and environmental factors, and thus suffer power loss when the underlying relationship is nonlinear. The varying-coefficient model has been proposed to relax the linearity assumption; however, it cannot adjust for population stratification, a major source of confounding in genome-wide association studies. To overcome these limitations, we develop the Varying-Coefficient embedded Linear Mixed Model (VC-LMM) for assessing nonlinear GxE interaction while accounting for population stratification. The proposed VC-LMM controls type I error rates well when population stratification is present, and it is powerful for both common and low-frequency variants. We apply computationally efficient algorithms for generating null distributions and estimating parameters in the linear mixed model, which greatly reduces the computational burden. We demonstrate the performance of VC-LMM through simulation studies.
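To make the varying-coefficient idea concrete, the sketch below tests whether the GxE coefficient β(E) is nonlinear in E. It is not VC-LMM: it assumes a truncated-linear spline basis with user-chosen knots, a classical F test in place of the dissertation's null-distribution procedure, and, for the optional stratification adjustment, a kinship matrix K with a known variance ratio δ (in practice δ would be estimated, e.g. by REML). All names are hypothetical.

```python
import numpy as np


def spline_basis(e, knots):
    """Truncated-linear spline basis in E: [1, E, (E - k1)+, (E - k2)+, ...]."""
    cols = [np.ones_like(e), e] + [np.maximum(e - k, 0.0) for k in knots]
    return np.column_stack(cols)


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def vc_gxe_ftest(y, g, e, knots, kinship=None, delta=1.0):
    """F test of H0: beta(E) = b0 + b1*E against H1: beta(E) piecewise linear in E.

    Mean model: y = m(E) + beta(E) * G + error, with m(E) and beta(E) expanded
    in the same spline basis. m(E) stays flexible under both hypotheses, so only
    the shape of the GxE coefficient is tested.
    If `kinship` is given, assume Cov(y) = s2 * (K + delta * I) with delta KNOWN
    (a hypothetical simplification) and whiten via the eigendecomposition of K.
    """
    B = spline_basis(e, knots)
    X1 = np.column_stack([B, B * g[:, None]])         # H1: nonlinear beta(E)
    X0 = np.column_stack([B, B[:, :2] * g[:, None]])  # H0: linear beta(E)
    if kinship is not None:
        lam, U = np.linalg.eigh(kinship)              # K = U diag(lam) U^T
        w = 1.0 / np.sqrt(lam + delta)                # rotated, rescaled errors are iid
        y = w * (U.T @ y)
        X1 = w[:, None] * (U.T @ X1)
        X0 = w[:, None] * (U.T @ X0)
    n, p1 = X1.shape
    q = p1 - X0.shape[1]                              # number of nonlinear GxE terms
    rss1, rss0 = rss(X1, y), rss(X0, y)
    F = ((rss0 - rss1) / q) / (rss1 / (n - p1))
    return F, q, n - p1
```

The eigendecomposition of K does not depend on the variant being tested, so in a scan over many variants it can be computed once and reused; this is a common way to reduce the cost of LMM-based association tests.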

Degree Date

Summer 2018

Document Type

Dissertation

Degree Name

Ph.D.

Department

Statistical Science

Advisor

Chao Xing

Subject Area

Statistics

Format

.pdf

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License