This research covers two topics: (1) PBNPA, a permutation-based non-parametric analysis of CRISPR screen data; and (2) RCRnorm, an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.

Clustered regularly interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level by permuting sgRNA labels and thus avoids restrictive distributional assumptions. Although PBNPA is designed to analyze CRISPR data, it can also be applied to genetic screens implemented with siRNAs or shRNAs and to drug screens. We compared the performance of PBNPA with competing methods on simulated data as well as on real data. On simulated data under various settings, PBNPA outperformed recent methods designed for CRISPR screen analysis, as well as methods used for analyzing other functional genomics screens, in terms of Receiver Operating Characteristic (ROC) curves and False Discovery Rate (FDR) control. The PBNPA algorithm also showed better consistency and FDR control on published real data.
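The core idea of a gene-level permutation test can be illustrated with a minimal sketch. This is not the PBNPA implementation: the gene-level statistic (median sgRNA log fold change), the pooled null distribution, the one-sided test for depletion, and the names `permutation_gene_pvalues`, `lfc`, `genes`, and `n_perm` are all assumptions made for illustration.

```python
import numpy as np

def permutation_gene_pvalues(lfc, genes, n_perm=200, seed=0):
    """Gene-level p-values obtained by permuting sgRNA-to-gene labels.

    lfc   : per-sgRNA log fold changes (hypothetical input format)
    genes : gene label for each sgRNA, same length as lfc
    """
    rng = np.random.default_rng(seed)
    lfc = np.asarray(lfc, dtype=float)
    genes = np.asarray(genes)
    # Encode gene labels as integers 0..G-1
    uniq, idx = np.unique(genes, return_inverse=True)

    def gene_scores(labels):
        # Assumed gene-level statistic: median log fold change of its sgRNAs
        return np.array([np.median(lfc[labels == g]) for g in range(len(uniq))])

    observed = gene_scores(idx)
    # Null distribution: gene scores after shuffling the sgRNA labels,
    # pooled over all genes and all permutations
    null = np.concatenate(
        [gene_scores(rng.permutation(idx)) for _ in range(n_perm)]
    )
    null.sort()
    # One-sided p-value for depletion: share of null scores <= observed score
    counts = np.searchsorted(null, observed, side="right")
    pvals = (counts + 1) / (null.size + 1)
    return dict(zip(uniq.tolist(), pvals.tolist()))
```

Because the null distribution is built from the data themselves, no parametric model for the read counts is required. In practice, the resulting p-values would still need a multiple-testing correction to control the FDR.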

Formalin-fixed, paraffin-embedded (FFPE) samples have great potential for biomarker discovery, retrospective studies, and diagnosis/prognosis of diseases. However, their application is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on damaged RNAs. The NanoString nCounter platform is well suited for profiling FFPE samples and measures gene expression with high sensitivity, which may greatly facilitate realizing the scientific and clinical value of FFPE samples. However, methodological development for normalization, a critical step when analyzing this type of data, is far behind that for traditional technologies such as microarrays. Existing methods designed for the platform use information from different types of internal controls separately and rely on the oversimplified assumption that expression of housekeeping genes is constant across samples for global scaling. We construct an integrated system of random-coefficient hierarchical regression models to capture the main patterns and characteristics observed in NanoString data from FFPE samples, and develop a Bayesian approach to estimate parameters and normalize gene expression across samples. Our method, labeled RCRnorm, incorporates information from all aspects of the experimental design and simultaneously removes biases from various sources. Further, it eliminates the unrealistic assumption on housekeeping genes and offers great interpretability. Simulations and applications demonstrated its superior performance.
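For context, the kind of housekeeping-based global scaling criticized above can be sketched in a few lines. This is a simplified illustration of the conventional approach, not RCRnorm; the function name `housekeeping_scale`, the genes-by-samples layout of `counts`, and the use of a geometric mean are assumptions made for this example.

```python
import numpy as np

def housekeeping_scale(counts, hk_rows):
    """Scale each sample so its housekeeping-gene signal matches the average.

    counts  : genes x samples matrix of positive counts (assumed layout)
    hk_rows : row indices of the housekeeping genes
    """
    counts = np.asarray(counts, dtype=float)
    # Geometric mean of housekeeping counts within each sample
    geo_mean = np.exp(np.log(counts[hk_rows, :]).mean(axis=0))
    # One scaling factor per sample; this step treats housekeeping
    # expression as constant across samples
    factors = geo_mean.mean() / geo_mean
    return counts * factors, factors
```

Every gene in a sample is multiplied by the same factor, so any real biological variation in the housekeeping genes is passed on to all other genes. This is the assumption that a model-based approach aims to avoid.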

Degree Date

Spring 2018

Document Type



Statistical Science


Xinlei Wang

Second Advisor

Guanghua Xiao

Subject Categories


Number of Pages




Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License