Abstract
New technologies such as single-cell RNA sequencing (scRNA-seq) have become vital to understanding cell type heterogeneity. One limitation of this technique is that the resulting data sets are sparse: many genes have a read count of zero in any given cell. This sparsity poses computational challenges for Weighted Gene Co-Expression Network Analysis (WGCNA), a technique that has been used to study the underlying genetic network of bulk data sets.
This project aims to study how sparsity degrades the performance of WGCNA. We do this by introducing sparsity into data sets on which the method has been applied successfully. Gene clusters, or modules, of the resulting network can then be tracked from the original data set across varying levels of sparsity, which gives insight into how network construction changes when a sparse data set is used. We will then study imputation and smoothing techniques to recover performance. Finally, we will seek to determine significant statistical features of the data that predict model performance.
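As a rough illustration of the kind of experiment described above, the following minimal sketch zeroes out a growing fraction of a count matrix and measures how far a WGCNA-style adjacency matrix drifts from the one built on the original data. Several parts are assumptions made only for illustration and are not taken from the dissertation: the synthetic Poisson counts, the uniform random zeroing in the hypothetical helper `sparsify`, the soft-thresholding power `beta=6`, and the mean absolute difference used as a drift measure. The unsigned adjacency (absolute Pearson correlation raised to a power) is the standard WGCNA form, but the actual pipeline may use different settings.

```python
import numpy as np


def sparsify(counts, drop_fraction, rng):
    """Return a copy of `counts` with each entry zeroed with probability
    `drop_fraction` (assumed dropout model; hypothetical helper)."""
    mask = rng.random(counts.shape) < drop_fraction
    out = counts.copy()
    out[mask] = 0
    return out


def unsigned_adjacency(counts, beta=6):
    """Unsigned WGCNA-style adjacency: |Pearson correlation| ** beta.

    `counts` is a cells x genes matrix. Genes with zero variance have an
    undefined correlation; those entries are set to 0 here (an assumption).
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(counts, rowvar=False)
    corr = np.nan_to_num(corr, nan=0.0)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 1.0)
    return adj


# Synthetic stand-in for an expression matrix: 200 cells x 30 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(200, 30)).astype(float)
baseline = unsigned_adjacency(counts)

for frac in (0.0, 0.3, 0.6, 0.9):
    sparse = sparsify(counts, frac, rng)
    adj = unsigned_adjacency(sparse)
    zero_frac = float(np.mean(sparse == 0))
    drift = float(np.mean(np.abs(adj - baseline)))
    print(f"drop={frac:.1f}  zeros={zero_frac:.2f}  adjacency drift={drift:.4f}")
```

Tracking modules across sparsity levels would additionally require clustering each adjacency matrix and matching the resulting clusters, which this sketch leaves out.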
Degree Date
Winter 12-21-2024
Document Type
Dissertation
Degree Name
Ph.D.
Department
Mathematics
Advisor
Andrea Barreiro
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Recommended Citation
Robinson, Molly, "Adapting Weighted Gene Co-Expression Network Analysis for Next Generation Sequencing" (2024). Mathematics Theses and Dissertations. 27.
https://scholar.smu.edu/hum_sci_mathematics_etds/27