Abstract

New technologies such as single-cell RNA sequencing (scRNA-seq) have become vital to understanding cell-type heterogeneity. One limitation of this technique is that the resulting data sets are sparse: a given gene often has a read count of zero in a given cell. This sparsity creates computational challenges for Weighted Gene Co-Expression Network Analysis (WGCNA), a technique that has been used to study the underlying genetic networks of bulk expression data sets.

This project aims to study how sparsity degrades the performance of WGCNA. We do this by artificially increasing the sparsity of data sets on which the method has already been applied successfully. Gene clusters, or modules, of the resulting networks can then be tracked from the original data set across varying levels of sparsity, giving insight into how network construction changes when sparse data are used. We will then study imputation and smoothing techniques to recover performance. Finally, we will seek to identify statistical features of the data that predict model performance.
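The abstract does not say how sparsity is introduced or how modules are matched, so the following is only a minimal sketch under stated assumptions, not the dissertation's method. It assumes a genes-by-cells count matrix, a hypothetical uniform "dropout" scheme that zeroes each entry independently with probability `rate`, and the unsigned WGCNA-style soft-threshold adjacency a_ij = |cor(x_i, x_j)|^β with β = 6 (a commonly used choice). The helper names (`add_dropout`, `adjacency_agreement`, `module_contrast`) and the two-module toy data are hypothetical illustrations.

```python
import numpy as np


def sparsity(counts):
    """Fraction of entries in a genes-by-cells count matrix that are exactly zero."""
    return float(np.mean(counts == 0))


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)|**beta between genes (rows)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(expr)  # gene-by-gene Pearson correlation
    corr = np.nan_to_num(corr)  # zero-variance genes give NaN; treat them as uncorrelated
    return np.abs(corr) ** beta


def add_dropout(counts, rate, rng):
    """Hypothetical sparsification: zero each entry independently with probability `rate`."""
    keep = rng.random(counts.shape) >= rate
    return counts * keep


def adjacency_agreement(a, b):
    """Pearson correlation between the upper-triangle (off-diagonal) entries of two adjacencies."""
    iu = np.triu_indices_from(a, k=1)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


def module_contrast(adj, labels):
    """Mean within-module minus mean between-module adjacency, ignoring the diagonal."""
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return float(adj[same & off_diag].mean() - adj[~same].mean())


# Toy data (assumption): 20 genes x 300 cells with two planted modules of 10 genes,
# each module driven by a shared latent factor, observed as Poisson counts.
rng = np.random.default_rng(0)
n_genes_per_module, n_cells = 10, 300
labels = np.repeat([0, 1], n_genes_per_module)
factors = rng.normal(size=(2, n_cells))
log_mean = 1.0 + 0.8 * factors[labels]  # each gene follows its module's factor
counts = rng.poisson(np.exp(log_mean))

adj_full = soft_threshold_adjacency(np.log1p(counts))
results = {}
for rate in [0.0, 0.3, 0.6, 0.9]:
    sparse = add_dropout(counts, rate, rng)
    adj = soft_threshold_adjacency(np.log1p(sparse))
    results[rate] = (
        sparsity(sparse),
        adjacency_agreement(adj_full, adj),
        module_contrast(adj, labels),
    )
    print(f"dropout={rate:.1f} sparsity={results[rate][0]:.2f} "
          f"agreement={results[rate][1]:.3f} contrast={results[rate][2]:.4f}")
```

WGCNA itself usually goes further, building a topological overlap matrix and cutting a hierarchical clustering tree into modules; the within-versus-between contrast above is only a crude stand-in for tracking those modules as sparsity grows.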

Degree Date

Winter 12-21-2024

Document Type

Dissertation

Degree Name

Ph.D.

Department

Mathematics

Advisor

Andrea Barreiro

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
