Microbiome count data are high-dimensional and typically suffer from uneven sampling depth, over-dispersion, and zero-inflation. This thesis develops specialized statistical models for analyzing such count data. In Chapter 2, I develop a bi-level Bayesian hierarchical framework for microbiome differential abundance analysis. The bottom level is a multivariate count-generating process that links the observed counts to their latent normalized abundances. The top level is a mixture of Gaussian distributions with a feature selection scheme for differential abundance analysis. The framework is evaluated on both simulated and synthetic data. A colorectal cancer case study demonstrates that a diagnostic model trained on the selected microbial taxa can significantly improve the accuracy of disease outcome prediction.
Beyond the identification of specific microbial taxa associated with diseases, recent scientific advances provide mounting evidence that metabolism, genetics, and environmental factors can all modulate microbial effects. In Chapter 3, I develop an integrative framework that distinguishes differentially abundant taxa across phenotypes while quantifying covariate-taxa effects. The model incorporates a regression component and, applied to two real microbiome datasets, integrates microbiome taxonomies with metabolomics to provide biologically interpretable findings.
Microorganisms form complex communities and collectively affect host health. In Chapter 4, I propose a general framework, HARMONIES, to infer a sparse microbiome network that describes the associations between microbial taxa. In comprehensive simulation studies, HARMONIES outperformed four other commonly used methods. When applied to published microbiome data from a colorectal cancer study, it discovered a novel community of disease-enriched bacteria.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Jiang, Shuang, "Bayesian Statistical Modeling of Metagenomics Sequencing Data" (2021). Statistical Science Theses and Dissertations. 22.