Latent variable models for non-Gaussian data with applications to genome-wide variation

Hao, Wei

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp0144558h06x

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Storey, John D	-
dc.contributor.author	Hao, Wei	-
dc.contributor.other	Quantitative Computational Biology Department	-
dc.date.accessioned	2019-01-02T20:21:36Z	-
dc.date.available	2019-01-02T20:21:36Z	-
dc.date.issued	2018	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp0144558h06x	-
dc.description.abstract	Low-rank latent variable models have been applied in many fields, as the usefulness of being able to capture systematic variation and reduce the dimensionality of data cannot be overstated. Principal Components Analysis is an exemplar of this idea and is considered a staple of data analysis. This thesis discusses low-rank latent variable models, primarily in the context of modeling population structure in modern genomics. The goal of our approach is to construct a general framework that utilizes the Binomial nature of genotyping data. We present multiple ways to fit models within this framework that are appropriate for the differing requirements of practitioners. We also show an application of our framework to the problem of genome-wide association testing. Further, we work on the important practical problem of validation for models of population structure, from the perspective of the population genetics principle of Hardy-Weinberg Equilibrium. Our approach to this allows for a variant-by-variant analysis in which problematic data points can be filtered before subsequent analysis. These variant-level results can also be aggregated to assess genome-wide goodness-of-fit and to tune model parameters. Lastly, we extend this framework for Binomial data to single-parameter exponential family data more generally. We discuss multiple ways to fit these models, as well as extensions of the goodness-of-fit test. Collectively, these methods form a novel paradigm for non-Gaussian latent variables with many potential future applications.	-
dc.language.iso	en	-
dc.publisher	Princeton, NJ : Princeton University	-
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu	-
dc.subject	statistical genetics	-
dc.subject.classification	Statistics	-
dc.subject.classification	Genetics	-
dc.title	Latent variable models for non-Gaussian data with applications to genome-wide variation	-
dc.type	Academic dissertations (Ph.D.)	-
pu.projectgrantnumber	690-2143	-
Appears in Collections:	Quantitative Computational Biology

Files in This Item:

File	Description	Size	Format
Hao_princeton_0181D_12794.pdf		6.41 MB	Adobe PDF
