Title: Exploring Context-Specific Expression Variation and Distal Gene Regulation via Latent Variable Models
Authors: Gewirtz, Ariel
Advisors: Engelhardt, Barbara E
Contributors: Quantitative Computational Biology Department
Keywords: eQTL; single-cell sequencing; topic model
Subjects: Bioinformatics; Computer science
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: As genome-wide sequencing expands across diverse populations, we are approaching a comprehensive understanding of the critical roles that genetic variation plays in gene expression. Genetic variants that are associated with gene expression levels (expression quantitative trait loci, eQTLs) provide a link between gene expression and heritable disease. The majority of disease-related heritability can be attributed to trans-acting variants. Unfortunately, trans-eQTLs are hard to map due to small effect sizes, underpowered data cohorts, and a heavy multiple testing burden. Gene co-expression networks describe intricate relationships that yield insight into context-specific covariation. We first present and analyze 36 tissue- and tissue-group-specific gene co-expression networks. We then show that integrating genetic variants into these networks, constructed using solely expression data, serves to both replicate edges and restrict the trans-eQTL testing burden. In the second chapter, we expand context to include sex and construct 112 tissue- and 38 sex-specific gene co-expression networks. We demonstrate that these networks reveal novel tissue- and sex-specific gene co-expression topology while reflecting known relevant biology. Using network edges to restrict tests, we map 204 trans-eQTLs across 27 tissues and identify 139 of the first sex-biased trans-eQTLs reported in humans. However, bulk RNA-seq samples contain a mixture of cell types; we demonstrate that estimated cell type proportion drives both co-expression and trans-eQTL signal. In chapter three, we introduce a probabilistic topic model, Telescoping Bimodal Latent Dirichlet Allocation (TBLDA), that jointly fits genotype and raw count data to learn groups of associated features across modalities. We apply it to GTEx data, demonstrate that it learns robust and biologically relevant topics, and use the topics to map 1,173 trans-eQTLs. Finally, in chapter four, we improve TBLDA by adding amortization and GPU compatibility, enabling fast inference on large-scale single-cell RNA-seq data. We fit scTBLDA to genotypes and gene expression from 400K blood cells across 119 individuals and use the topics to map 66 cell-type-specific trans-eQTLs. Throughout all chapters, we employ pertinent Bayesian latent variable models that quantify uncertainty and ensure interpretability. Taken together, this research provides novel insights into distal regulation underlying expression level variability across tissues, sex, and cell types.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Quantitative Computational Biology

Files in This Item:
File: Gewirtz_princeton_0181D_14023.pdf
Size: 10.28 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.