Authors

Xi Jiang

Contributor

Qiwei Li, Guanghua Xiao, Lin Xu, Shidan Wang, Lei Dong, Lei Guo, Zhuoyu Wen, Liwei Jia

Abstract

Spatially resolved transcriptomics (SRT) quantifies gene expression levels at different spatial locations, providing a new and powerful tool for uncovering biological insights. As experimental technologies improve in both capacity and efficiency, there is a growing demand for new analytical methodologies.

One question in SRT data analysis is how to identify genes whose expression exhibits spatially correlated patterns, called spatially variable (SV) genes. Most current methods for identifying SV genes are built upon geostatistical models with Gaussian processes, which can limit their ability to identify complex spatial patterns. To overcome this challenge and capture more types of spatial patterns, in Chapter 2, we introduce a Bayesian approach to identify SV genes via a modified Ising model. The key idea is to use the energy interaction parameter of the Ising model to characterize spatial expression patterns. Because the model has an intractable normalizing constant, we use auxiliary variable Markov chain Monte Carlo algorithms to sample from the posterior distribution. Simulation studies using both simulated and synthetic data showed that the energy-based modeling approach led to higher accuracy in detecting SV genes than kernel-based methods. When applied to two real SRT datasets, the proposed method discovered novel spatial patterns that shed light on the underlying biological mechanisms.
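To make the energy-based idea concrete, here is a minimal sketch of a textbook two-state Ising model on a square lattice. It is not the modified Ising model developed in Chapter 2: the ±1 spin coding, the four-neighbor lattice, the parameter name `theta`, and the function names are all assumptions chosen for illustration. The unnormalized log-probability is `theta` times the sum of products of adjacent spins; turning it into a probability requires summing over all 2^n configurations, which is why the normalizing constant is intractable for realistic numbers of spots.

```python
import numpy as np

def ising_interaction_sum(spins):
    """Sum of s_i * s_j over horizontally and vertically adjacent pairs.

    `spins` is a 2D array with entries in {-1, +1}, e.g. a binarized
    (low/high) expression indicator on a regular grid of spots.
    """
    spins = np.asarray(spins)
    horizontal = np.sum(spins[:, :-1] * spins[:, 1:])
    vertical = np.sum(spins[:-1, :] * spins[1:, :])
    return int(horizontal + vertical)

def unnormalized_log_prob(spins, theta):
    """log p(spins | theta) up to the additive constant -log Z(theta).

    With this sign convention, theta > 0 favors neighbors that agree
    (spatially clustered patterns) and theta < 0 favors neighbors that differ.
    """
    return theta * ising_interaction_sum(spins)

# Two hypothetical 4x4 patterns: left/right blocks versus a checkerboard.
clustered = np.array([[1, 1, -1, -1]] * 4)
checker = np.indices((4, 4)).sum(axis=0) % 2 * 2 - 1

print(ising_interaction_sum(clustered))  # mostly agreeing neighbors
print(ising_interaction_sum(checker))    # every neighboring pair disagrees
```

Under these assumptions, a clustered pattern scores higher than a checkerboard for positive `theta`, which is the sense in which a single interaction parameter can summarize how spatially structured a pattern is.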

Spatial domain identification is another direction in SRT analysis, which enables the transcriptomic characterization of tissue structures and further contributes to the evaluation of heterogeneity across different tissue locations. Current spatial domain analysis of SRT data primarily relies on molecular information and fails to fully exploit the morphological features present in histology images, leading to compromised accuracy and interpretability. To overcome these limitations, in Chapter 3, we develop a multi-stage statistical method called iIMPACT. It includes a finite mixture model to identify and define spatial domains based on AI-reconstructed histology images and the spatial context of gene expression measurements, and a negative binomial regression model to detect domain-specific spatially variable genes. Through multiple case studies, we demonstrated that iIMPACT outperformed existing methods, as confirmed by ground-truth biological knowledge. These findings underscore the accuracy and interpretability of iIMPACT as a new clustering approach, providing valuable insights into the cellular spatial organization and landscape of functional genes within SRT data.

Most next-generation sequencing-based SRT techniques are limited to measuring gene expression in a confined array of spots, capturing only a fraction of the spatial domain. Typically, these spots encompass gene expression from a few to hundreds of cells, underscoring a critical need for more detailed, single-cell resolution SRT data to enhance our understanding of biological functions within the tissue context. To address this challenge, in Chapter 4, we introduce BayesDeep, a novel Bayesian hierarchical model that leverages cellular morphological data from histology images, commonly paired with SRT data, to reconstruct SRT data at single-cell resolution. BayesDeep models count data from SRT studies via a negative binomial regression model. This model incorporates explanatory variables such as cell types and nuclei-shape information for each cell extracted from the paired histology image. A feature selection scheme is integrated to examine the association between the morphological and molecular profiles, thereby improving model robustness. We applied BayesDeep to two real SRT datasets, successfully demonstrating its capability to reconstruct SRT data at single-cell resolution. This advancement not only yields new biological insights but also significantly enhances various downstream analyses, such as pseudotime and cell-cell communication analyses.
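As a rough illustration of negative binomial regression for count data, the sketch below evaluates the log-likelihood under a log link with a mean-dispersion parameterization (variance = mu + mu^2 / phi). It is not the BayesDeep model itself: the function names, the `size_factors` offset, and the toy design matrix (an intercept plus one hypothetical morphological covariate) are assumptions made for illustration.

```python
import math
import numpy as np

def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under NB(mean=mu, dispersion=phi)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def nb_regression_loglik(counts, X, beta, phi, size_factors=None):
    """Log-likelihood of counts given covariates X and coefficients beta.

    The mean of each observation is size_factor * exp(x @ beta) (log link).
    """
    counts = np.asarray(counts)
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if size_factors is None:
        size_factors = np.ones(len(counts))
    mu = np.asarray(size_factors, dtype=float) * np.exp(X @ beta)
    return sum(nb_log_pmf(int(y), float(m), phi) for y, m in zip(counts, mu))

# Hypothetical toy data: intercept column plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.5]]
counts = [3, 7, 4]
print(nb_regression_loglik(counts, X, beta=[1.0, 0.8], phi=2.0))
```

In a Bayesian treatment, a log-likelihood of this form would be combined with priors on `beta` and `phi`; a feature selection scheme could, for example, place sparsity-inducing priors on the entries of `beta`, though the specific priors used in the dissertation are not stated here.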

Degree Date

Fall 2023

Document Type

Dissertation

Degree Name

Ph.D.

Department

Department of Statistics and Data Science

Advisor

Dr. Guanghua Xiao

Second Advisor

Dr. Qiwei Li

Subject Area

Biostatistics, Genetics, Statistics

Number of Pages

174

Format

.pdf

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
