Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01x920g081v
Title: Latent Variable Modeling and Causal Inference in Population-Structured Genetics
Authors: Cabreros, Irineo C.
Advisors: Storey, John D.
Contributors: Applied and Computational Mathematics Department
Keywords: Causal Inference
Latent Variables
Population Genetics
Population Structure
Statistical Genetics
Statistics
Subjects: Statistics
Genetics
Biostatistics
Issue Date: 2020
Publisher: Princeton, NJ : Princeton University
Abstract: Nonrandomly mating populations, referred to as structured populations, are commonly encountered in genetic studies. A common characteristic of structured populations is that separate subpopulations differ systematically in their genetic attributes. In a global sample of unrelated individuals, for example, allele frequencies typically differ between geographically defined subpopulations. Two analytical goals when studying datasets exhibiting population structure are: (i) characterizing population structure and (ii) identifying causal gene-trait relationships in its presence. This work comprises two complementary projects, one for each of these goals. In the first project, we introduce a computationally efficient algorithm for fitting the admixture model of population structure. The central strategy of our algorithm, which we call ALStructure, is to first estimate the latent linear subspace of admixture components and then search for models within this subspace that satisfy the probabilistic constraints of the admixture model. We find that ALStructure typically outperforms preexisting methods in both accuracy and speed across a wide array of simulated and real datasets. In the second project, we show how the random process of meiosis can be leveraged as a form of experimental randomization capable of uncovering causal relationships between genes and traits in the presence of population structure. We introduce novel tests based on parent-child trio data, developed within the causal framework of potential outcomes. Additionally, we evaluate the causal properties of the popular transmission-disequilibrium test (TDT). We describe, and assess the feasibility of, assumptions under which each of these procedures is a test of a causal property, which we define as causal linkage. To enable this project, we first provide a detailed discussion of the connection between causality and measure-theoretic probability by constructing causal models on probability spaces.
URI: http://arks.princeton.edu/ark:/88435/dsp01x920g081v
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Applied and Computational Mathematics

Files in This Item:
File: Cabreros_princeton_0181D_13214.pdf
Size: 28.7 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.