The Role of Read Depth in the Design and Analysis of Sequencing Experiments

Robinson, David Garrett

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01hd76s238c

Title:	The Role of Read Depth in the Design and Analysis of Sequencing Experiments
Authors:	Robinson, David Garrett
Advisors:	Storey, John D
Contributors:	Quantitative Computational Biology Department
Keywords:	Differential expression; Experimental design; False discovery rate; Read depth; RNA-Seq; Sequencing
Subjects:	Bioinformatics; Statistics; Genetics
Issue Date:	2015
Publisher:	Princeton, NJ : Princeton University
Abstract:	The development of quantitative sequencing technologies, such as RNA-Seq, Bar-Seq, ChIP-Seq, and metagenomics, has offered great insight into molecular biology. Proper design and analysis of these experiments require statistical models and techniques that consider the specific nature of sequencing data, which typically consists of a matrix of read counts per feature. An issue of particular importance to the development of these methods is the role of read depth in statistical accuracy and power. The depth of an experiment affects the power to make biological conclusions, meaning that an experimental design must consider the tradeoff between cost, power, and the number of samples that are examined. Similarly, per-gene read depth affects each gene's power and accuracy, and must be taken into account in any downstream analysis. Here I explore many facets of the role of read depth in the design and analysis of sequencing experiments, and offer computational and statistical methods for addressing them. To assist in the design of sequencing experiments, I present subSeq, which examines the effect of depth in an experiment by subsampling reads to simulate lower depths. I use this method to examine the extent of read saturation across a variety of RNA-Seq experiments, and demonstrate a statistical model for predicting the effect of increasing depth in any experiment. I consider intensity-dependence in a technology comparison between microarrays and RNA-Seq, and show that the variance added by RNA-Seq depends more on depth than the variance in microarrays depends on fluorescence intensity. I demonstrate that Bar-Seq data shares these depth-dependent properties with RNA-Seq and can be analyzed by the same tools, and further provide suggestions on the appropriate depth for Bar-Seq experiments.
Finally, I show that per-gene read depth can be taken into account in multiple hypothesis testing to improve power, and introduce the method of functional false discovery rate (fFDR) control.
URI:	http://arks.princeton.edu/ark:/88435/dsp01hd76s238c
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. These copies can be found in the library's main catalog.
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Quantitative Computational Biology

Files in This Item:

File	Description	Size	Format
Robinson_princeton_0181D_11406.pdf		5.02 MB	Adobe PDF
