Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01x059cb09d
Title: Detecting and Analyzing Variation in Protein Interactions
Authors: Kobren, Shilpa Nadimpalli
Advisors: Singh, Mona
Contributors: Computer Science Department
Keywords: cancer genomics; gene regulation; protein domain; protein interaction interface
Subjects: Bioinformatics; Computer science
Issue Date: 2018
Publisher: Princeton, NJ : Princeton University
Abstract: Proteins carry out a dazzling multitude of functions by interacting with DNA, RNA, other proteins and various other molecules within our cells. Together these interactions comprise complex networks that differ naturally across cells within an organism, across individuals in a population, and across species. Although such variation is critical for normal organismal functioning, mutations affecting protein interactions are also known to underlie a wide range of human diseases. In this dissertation, I introduce novel computational approaches that explore the extent to which specific protein interactions vary across species, across healthy individuals, and across individuals with cancer. To start, I focus on interaction variation across species. It is well established that changes in protein-DNA interactions underlie a wide range of observable differences across species. These differences are primarily thought to stem from changes in the DNA sites that transcription factor (TF) proteins bind to, although changes in the binding properties of TFs themselves have also been observed. Determining the prevalence of such TF changes, however, remains infeasible using current experimental approaches. Here, I develop and apply a comparative genomics framework to systematically quantify changes in the DNA-binding properties of orthologous TFs across species spanning ~45 million years of evolutionary divergence. I demonstrate that, contrary to expectation, cross-species regulatory network divergence resulting from changes in non-duplicated DNA-binding proteins is pervasive. These findings reveal a widespread yet largely unstudied source of divergence across transcriptional regulatory programs in animals. Next, I turn my attention to interaction variation across individuals. 
In order to comprehensively quantify this, I first combine large-scale sequence, domain and structure information to pinpoint sites within protein domains (the fundamental structural units in proteins) that are involved in binding DNA, RNA, peptides, ions, metabolites, or other small molecules. This domain-based approach enables us to identify putative interaction sites in over 60% of human genes, representing a 2.4-fold improvement over comparable state-of-the-art approaches for this task. I next demonstrate that whereas domain-inferred interaction sites are significantly depleted of natural variants across ~60,000 healthy individuals, these same sites are significantly enriched for cancer mutations across ~11,000 tumor samples. My analysis demonstrates that the cellular network variation that occurs across healthy individuals is unlikely to be due to changes within proteins; in contrast, mutations acquired in cancers appear to preferentially alter cellular networks by perturbing the proteins themselves. Finally, I show how we can leverage an interaction-based viewpoint to uncover mutated genes that play causal roles in human cancers. In particular, I aim to uncover genes whose interaction interfaces are significantly altered in tumors. Towards this end, I develop a robust computational framework that integrates my per-domain-position binding propensities with additional sources of biological data regarding protein functionality. I demonstrate that by analytically computing the significance of patterns of mutations, my approach is able to achieve a dramatic improvement in runtime over a typical empirical permutation test for this task. Moreover, my interaction-based method not only recapitulates known cancer driver genes faster and with greater precision than previous methods, but it also uncovers relatively rarely mutated genes with likely roles in cancer. 
Through focusing on the somatic alteration of protein interaction interfaces in tumors, my method can inform the perturbed molecular mechanisms across known and putative cancer genes, thereby enabling valuable insights that may help guide personalized cancer treatments.
URI: http://arks.princeton.edu/ark:/88435/dsp01x059cb09d
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Kobren_princeton_0181D_12653.pdf
Size: 27.25 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.