Subject Area
Statistics
Abstract
This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.
Clustered regularly interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level by permuting sgRNA labels and thus avoids restrictive distributional assumptions. Although PBNPA is designed to analyze CRISPR data, it can also be applied to genetic screens implemented with siRNAs or shRNAs and to drug screens. We compared the performance of PBNPA with competing methods on simulated data as well as on real data. On simulated data under various settings, PBNPA outperformed recent methods designed for CRISPR screen analysis, as well as methods used for analyzing other functional genomics screens, in terms of receiver operating characteristic (ROC) curves and false discovery rate (FDR) control. PBNPA also showed better consistency and FDR control on published real data.
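The idea of computing gene-level p-values by permuting sgRNA labels can be illustrated with a minimal sketch. This is not the PBNPA implementation: the function name `permutation_gene_pvalues`, the use of the median log fold change as the gene score, the pooled null distribution, and the one-sided upper-tail test are all assumptions made here for illustration.

```python
import numpy as np

def permutation_gene_pvalues(lfc, gene_ids, n_perm=200, seed=0):
    """Upper-tail permutation p-values per gene (illustrative sketch only).

    lfc      : per-sgRNA log fold changes (hypothetical input)
    gene_ids : gene label for each sgRNA, same length as lfc
    """
    rng = np.random.default_rng(seed)
    lfc = np.asarray(lfc, dtype=float)
    genes, inverse = np.unique(np.asarray(gene_ids), return_inverse=True)

    def gene_scores(values):
        # Assumed gene score: median log fold change over the gene's sgRNAs.
        return np.array([np.median(values[inverse == g])
                         for g in range(len(genes))])

    observed = gene_scores(lfc)

    # Shuffling the values against fixed labels is equivalent to
    # permuting the sgRNA labels; scores from all permutations are pooled.
    null = np.concatenate([gene_scores(rng.permutation(lfc))
                           for _ in range(n_perm)])
    null.sort()

    # Number of null scores >= each observed score.
    n_ge = null.size - np.searchsorted(null, observed, side="left")

    # Add-one correction keeps p-values strictly positive.
    pvals = (n_ge + 1) / (null.size + 1)
    return dict(zip(genes.tolist(), pvals.tolist()))
```

Because the null distribution is built from the data themselves, no parametric model for the counts is required. A lower-tail version for negatively selected genes would count null scores less than or equal to the observed score instead.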
Formalin-fixed, paraffin-embedded (FFPE) samples have great potential for biomarker discovery, retrospective studies, and diagnosis/prognosis of diseases. However, their application is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on damaged RNAs. The NanoString nCounter platform is well suited for profiling FFPE samples and measures gene expression with high sensitivity, which may greatly facilitate realization of the scientific and clinical value of FFPE samples. However, methodological development for normalization, a critical step when analyzing this type of data, lags far behind that for traditional technologies such as microarrays. Existing methods designed for the platform use information from different types of internal controls separately and rely on the overly simplified assumption that expression of housekeeping genes is constant across samples for global scaling. We construct an integrated system of random-coefficient hierarchical regression models to capture the main patterns and characteristics observed in NanoString data from FFPE samples, and develop a Bayesian approach to estimate parameters and normalize gene expression across samples. Our method, labeled RCRnorm, incorporates information from all aspects of the experimental design and simultaneously removes biases from various sources. Further, it eliminates the unrealistic assumption on housekeeping genes and offers great interpretability. Simulations and applications showed its superior performance.
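For context, the global-scaling assumption that RCRnorm is said to remove can be sketched as follows. This illustrates the kind of housekeeping-gene scaling the abstract criticizes, not RCRnorm itself; the function name `housekeeping_scale`, the genes-by-samples layout, and the geometric-mean scaling factor are assumptions made here for illustration.

```python
import numpy as np

def housekeeping_scale(counts, hk_rows):
    """Global scaling by housekeeping genes (illustrative sketch only).

    counts  : genes x samples matrix of strictly positive counts
    hk_rows : row indices of the housekeeping genes (hypothetical input)
    """
    counts = np.asarray(counts, dtype=float)

    # Geometric mean of the housekeeping genes within each sample.
    geo_mean = np.exp(np.log(counts[hk_rows, :]).mean(axis=0))

    # One factor per sample, chosen so that every sample's housekeeping
    # geometric mean matches the average across samples.
    factors = geo_mean.mean() / geo_mean

    return counts * factors, factors
```

The single factor per sample is only valid if housekeeping expression truly does not vary across samples; any real biological variation in those genes is absorbed into the factor and propagated to every other gene.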
Degree Date
Spring 2018
Document Type
Dissertation
Degree Name
Ph.D.
Department
Statistical Science
Advisor
Xinlei Wang
Second Advisor
Guanghua Xiao
Number of Pages
104
Format
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Recommended Citation
Jia, Gaoxiang, "Developing Statistical Methods For Data From Platforms Measuring Gene Expression" (2018). Statistical Science Theses and Dissertations. 1.
https://scholar.smu.edu/hum_sci_statisticalscience_etds/1
Included in
Applied Statistics Commons, Biostatistics Commons, Statistical Methodology Commons, Statistical Models Commons