Subject Area

Statistics

Abstract

With the rapid development of new data collection and acquisition techniques, high-dimensional data have emerged in many fields. Consequently, new variable selection methods, especially for ultra-high dimensional problems, are in demand.

The first part of this dissertation focuses on developing a new Bayesian variable selection method for differential expression analysis using raw NanoString nCounter data. The medium-throughput mRNA abundance platform NanoString nCounter has gained great popularity in the past decade, due to its high sensitivity and technical reproducibility as well as its remarkable applicability to ubiquitous formalin-fixed paraffin-embedded (FFPE) tissue samples. Based on RCRnorm, developed for normalizing NanoString nCounter data, and Bayesian LASSO for variable selection, we propose a fully integrated Bayesian method, called RCRdiff, to detect differentially expressed (DE) genes between different groups of tissue samples (e.g., normal and cancer). Unlike existing methods, which often require normalization to be performed beforehand, RCRdiff directly handles raw read counts and jointly models the behaviors of different types of internal controls along with DE and non-DE gene patterns. Doing so avoids the efficiency loss caused by ignoring estimation uncertainty from the normalization step in a sequential approach, and thus can offer more reliable statistical inference. We also propose clustering-based strategies for DE gene selection, which do not require any external dataset and are free of any arbitrary cutoff. We demonstrate the advantages of RCRdiff through extensive simulations and data examples.
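To illustrate how a clustering step can replace a hand-picked cutoff, here is a minimal sketch. The abstract does not say which quantity is clustered or which clustering algorithm RCRdiff uses, so everything in it is an assumption made for illustration: the per-gene scores (imagined here as absolute posterior effect sizes), the function name `two_means_1d`, and the choice of one-dimensional 2-means.

```python
def two_means_1d(scores, max_iter=100):
    """Split 1-D scores into a low (0) and a high (1) cluster with Lloyd's algorithm, k = 2.

    Illustrative only: not the clustering strategy of RCRdiff, which the
    abstract does not describe.
    """
    low_center, high_center = min(scores), max(scores)
    labels = None
    for _ in range(max_iter):
        # Assign each score to the nearer center (ties go to the low cluster).
        new_labels = [
            1 if abs(s - high_center) < abs(s - low_center) else 0
            for s in scores
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        low = [s for s, lab in zip(scores, labels) if lab == 0]
        high = [s for s, lab in zip(scores, labels) if lab == 1]
        if low:
            low_center = sum(low) / len(low)
        if high:
            high_center = sum(high) / len(high)
    return labels


# Hypothetical per-gene scores (made-up numbers, not data from the dissertation).
scores = [0.02, 0.05, 0.01, 1.4, 0.03, 1.1, 0.04]
selected = [i for i, lab in enumerate(two_means_1d(scores)) if lab == 1]
```

In this toy setting the boundary between the two groups is determined by the scores themselves rather than by a user-supplied threshold, which is the property the abstract highlights.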

The second part of this dissertation proposes a novel Bayesian variable selection method based on empirical likelihood for ultra-high dimensional data. Although a large body of literature has shown that variable selection techniques enable efficient and interpretable analysis of high-dimensional data, variable selection involving ultra-high dimensional data, where the number of covariates p is (much) larger than the sample size n, remains a highly challenging task. Furthermore, many popular methods based on linear regression models assume Gaussian random noise. In the semi-parametric domain, under the ultra-high dimensional setting, we propose a Bayesian empirical likelihood method for variable selection, which requires no distributional assumptions but only estimating equations. Motivated by doubly penalized empirical likelihood (EL), we introduce priors to regularize both the regression parameters and the Lagrange multipliers associated with the estimating equations, to promote sparse learning. We further develop an efficient Markov chain Monte Carlo (MCMC) sampling algorithm based on the active set idea, which has proven useful in reducing the computational burden in several existing studies. The proposed method not only inherits merits from both Bayesian and EL inference, but also shows superior performance in both prediction and variable selection in our numerical studies.
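For readers unfamiliar with empirical likelihood, the following sketch shows the classical building block only, not the proposed Bayesian method. It assumes the simplest estimating equation, g(x, mu) = x - mu for a scalar mean, and computes the log EL ratio -sum_i log(1 + lambda * g_i), where the Lagrange multiplier lambda solves sum_i g_i / (1 + lambda * g_i) = 0. The function name `el_log_ratio_mean` and the bisection solver are illustrative choices, and the priors on the regression parameters and on lambda described above are not modeled here.

```python
import math


def el_log_ratio_mean(x, mu, n_bisect=200):
    """Log empirical likelihood ratio for a scalar mean mu.

    Illustrative sketch with estimating equation g_i = x_i - mu.
    """
    g = [xi - mu for xi in x]
    g_min, g_max = min(g), max(g)
    # mu must lie strictly inside the range of the data; otherwise no
    # valid weights exist and the EL ratio is zero (log ratio = -inf).
    if not (g_min < 0.0 < g_max):
        return -math.inf

    # Feasible multipliers keep every 1 + lam * g_i positive.
    shrink = 1.0 - 1e-10
    a = (-1.0 / g_max) * shrink  # lower end of the feasible interval
    b = (-1.0 / g_min) * shrink  # upper end of the feasible interval

    def score(lam):
        # Strictly decreasing in lam on the feasible interval.
        return sum(gi / (1.0 + lam * gi) for gi in g)

    # Bisection for the root of score(lam) = 0.
    for _ in range(n_bisect):
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:
            a = mid
        else:
            b = mid
    lam = 0.5 * (a + b)
    return -sum(math.log1p(lam * gi) for gi in g)
```

The log ratio is zero at the sample mean and decreases as mu moves away from it, without any distributional assumption on the data; this is the sense in which EL needs only estimating equations.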

Degree Date

Fall 2021

Document Type

Dissertation

Degree Name

Ph.D.

Department

Statistical Science

Advisor

Xinlei Wang

Format

.pdf

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Recommended Citation

Xu, Can, "Ultra-High Dimensional Bayesian Variable Selection With Lasso-Type Priors" (2021). Statistical Science Theses and Dissertations. 27.
https://scholar.smu.edu/hum_sci_statisticalscience_etds/27

Statistical Science Theses and Dissertations

Ultra-High Dimensional Bayesian Variable Selection With Lasso-Type Priors
