Monnie McGee, Linh Nghiem
Understanding high-dimensional data has become essential for practitioners across many disciplines. The growing ability to collect large amounts of data has prompted statistical methods to adapt to the rising number of possible relationships to be uncovered. The key to this adaptation has been the notion of sparse models, that is, models in which most relationships between variables are assumed to be negligible. These sparse models are driven by constraints on the solution set, which yield regularization penalties imposed on the optimization procedure. While these penalties have found great success, they are typically formulated under strong assumptions on the variability of the observed data. We consider variables observed with measurement error in the high-dimensional setting. The common sparsity-inducing models must be corrected for measurement error from a variety of sources, which requires special reformulations with nonstandard solutions.
We propose to use a recent methodology, the Imputation Regularization Optimization algorithm, to incorporate correction for measurement error. We concentrate on the scenario where the number of variables exceeds the number of observations, a scenario known to break traditional correction methods, and study two classes of models. The first class is the Gaussian graphical model, which aims to find all pairwise dependencies from observed multivariate data. We find our method to be asymptotically consistent, and it provides compelling numerical improvement over a model that does not account for the contaminated data. The second class is the well-known generalized linear model, for which we show that our correction method for contaminated covariates performs very well in comparison to other established techniques. To illustrate the real-world efficacy of our proposed procedures, both models are applied to a microarray data example.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Byrd, Michael, "Estimation and Variable Selection in High-Dimensional Settings with Mismeasured Observations" (2019). Statistical Science Theses and Dissertations. 12.