Subject Area

Statistics

Abstract

Data integration is a key area of research for analyzing the rapidly growing volume of high-dimensional biological data generated across sources, stages, and modalities. Deep learning has become an increasingly powerful tool for modeling the complex, often non-linear relationships within and across these data. Here, we present two novel deep learning frameworks that address distinct but complementary integration challenges: the first aligns single-cell omics data across temporal stages, and the second bridges imaging and omics modalities to generate patient-level molecular profiles.

In Chapter 1, we briefly summarize existing statistical and deep learning-based approaches for single-cell omics data integration and discuss their limitations in handling temporal single-cell RNA sequencing (scRNA-seq) data. These limitations motivate the temporally aware strategy introduced in Chapter 2.

In Chapter 2, we present TempNet, a deep learning framework for integrating temporal scRNA-seq data that simultaneously aligns sequential states and produces an informative low-dimensional embedding suitable for clustering and visualization. We apply our model to scRNA-seq datasets of early embryo development and tumor progression and compare it to existing single-cell data integration methods. We demonstrate that TempNet can better preserve the chronological order of stages, maintain key data characteristics, and reveal biologically meaningful cell subpopulations that conventional methods fail to detect.

In Chapter 3, we provide a comprehensive review of methods that leverage imaging data to predict molecular omics profiles, covering DNA-based alterations as well as bulk, single-cell, and spatial transcriptomics. We trace the methodological evolution of these frameworks, from early feature-driven statistical approaches to CNN-based architectures and more recent transformer- and graph-based techniques. The review further reveals that while contrastive alignment strategies have gained traction in spatial transcriptomics prediction, most existing methods for bulk omics treat the task as a direct image-to-output prediction problem and lack explicit mechanisms to align visual and molecular representations. This gap motivates the framework presented in Chapter 4.

In Chapter 4, we introduce Deep-ALIGN, an integrative deep learning algorithm that generates patient-level omics profiles from histopathology images using an explicit cross-modal alignment mechanism. We show that Deep-ALIGN outperforms existing state-of-the-art methods for bulk omics prediction, preserves pathway-level functional activity, and retains clinically relevant molecular signatures associated with patient survival. These findings demonstrate that explicit image-omics alignment enhances predictive performance and provides a principled strategy for scalable molecular profiling from routine histopathology.

Degree Date

Spring 2026

Document Type

Dissertation

Degree Name

Ph.D.

Department

Statistics and Data Science

Advisor

Monnie McGee

Second Advisor

Lin Xu

Number of Pages

118

Format

.pdf

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License

Included in

Biostatistics Commons