Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018c97kt354
Title: Data-Driven Approaches and Systems to Interrogate Complex Disease
Authors: Dannenfelser, Ruth
Advisors: Troyanskaya, Olga G
Contributors: Computer Science Department
Keywords: cancer
computational biology
gene expression
immunotherapeutics
machine learning
text mining
Subjects: Computer science
Issue Date: 2020
Publisher: Princeton, NJ : Princeton University
Abstract: Large-scale genomic studies now give more predictive power than ever, allowing us to profile the composition of tissues, study cellular functions, and understand organismal traits at an unprecedented level of detail. This is particularly important for studying heterogeneous diseases, such as cancer, where small patient-specific differences play critical roles in disease development and progression. As these studies accumulate, it is becoming increasingly important both to develop methods that discover novel biology while considering tissue and cell type specificity, and to develop systems that make this data explosion easily manageable, accessible, and interpretable. Towards these goals, in this dissertation, we build on the wealth of publicly available data to examine the interplay between cancer and the immune system and then develop two query-based visualization systems that enable interactive data exploration for the biomedical community at large. The first part of this work will present two perspectives on cancer and the immune system, starting with a semi-supervised approach for immune cell type quantification in chapter 2. Using derived immune markers, we examined lymphocyte infiltration in breast cancer and found that estrogen receptor activity and genomic complexity are the key factors driving variation in lymphocytic infiltrate across individuals. Our method allowed us to make these discoveries on existing samples even when this was not the original intent of the study, without the need for additional experiments. In a broader scope, in chapter 3, we leveraged public expression data to further the development of targeted immunotherapeutics for solid tumors. Engineered T cell therapies have shown great promise for hematological cancers but have found only limited success in targeting solid tumors due to off-target effects.
Working closely with experimental collaborators, we developed a method to prioritize pairs of antigen targets that will help engineered T cells home in on tumor targets while minimizing damage to healthy tissues. Notably, we were able to narrow the space of more than 2.7 million potential pairs to a few hundred top candidates per tumor type and to find new transmembrane proteins with therapeutic potential, effectively speeding up the development of novel immunotherapeutics for solid tumors. The fourth and fifth chapters will cover how we can extract unbiased signals from large collections of biomedical data in the form of abstracts and repositories of transcriptomics data. First, in chapter 4, we show how we can obtain informative tissue-disease-gene relationships from abstracts and integrate them into a system that presents different snapshots of curated interactions and adds tissue and disease annotations to gene lists from experimental assays (e.g., GWAS, differentially expressed genes, drug screens). Second, in chapter 5, we extend SEEK, a gene expression search engine that simultaneously returns coexpressed genes and relevant datasets where query genes are likely coregulated. Our extension expands the search space across the major model organisms and provides a new cross-organism exploration interface to help facilitate translational research. Both systems will help experimentalists leverage existing knowledge to better explain the larger implications of their specific findings, without requiring additional computational expertise.
URI: http://arks.princeton.edu/ark:/88435/dsp018c97kt354
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
This content is embargoed until 2022-06-26. For more information contact the Mudd Manuscript Library.


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.