Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01x346d6976
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Singh, Mona | -
dc.contributor.author | Wetzel, Joshua | -
dc.contributor.other | Computer Science Department | -
dc.date.accessioned | 2019-04-30T17:52:56Z | -
dc.date.available | 2020-04-15T09:14:32Z | -
dc.date.issued | 2019 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01x346d6976 | -
dc.description.abstract | Interactions between proteins and specific genomic loci are critical to the proper functioning of all cells. The ability of a DNA-binding protein (DBP) to distinguish between its target binding sites and other genomic regions is required for a myriad of crucial functions, including transcriptional regulation, meiotic recombination, chromatin remodeling, and genome organization. However, the fundamental relationship between the amino acid sequence of a DBP and its DNA-binding preferences remains largely elusive. High-throughput experimental technologies for detecting protein-DNA interactions have advanced substantially in the past decade and have enabled measurements for thousands of natural and synthetic DBP variants. However, these technologies typically require sophisticated analyses to uncover intrinsic DNA-binding specificities from the measured signals, often have poorly understood noise and sampling profiles, and provide little insight into the underlying mechanism of interaction. Meanwhile, precise co-complex structural data provide great insight into the mechanistic principles guiding interactions between DBPs and their DNA ligands, albeit at substantially lower throughput. These two types of data are complementary but rarely considered in concert. In this dissertation, I describe novel computational approaches that improve the accuracy and interpretability of inferences derived from high-throughput protein-DNA interaction data, via direct consideration of the underlying protein-DNA structural interaction interface shared across proteins within the same DNA-binding family. First, I describe a systematic exploration of the DNA-binding landscape for Cys2-His2 zinc finger (C2H2-ZF) proteins, the most abundant DNA-binding family in eukaryotes. Here we inferred the largest set of C2H2-ZF specificities to date and developed a state-of-the-art structurally-inspired method for predicting specificities for novel C2H2-ZFs. Second, I demonstrate how to leverage the large amounts of specificity data available for DBPs to develop a general framework that improves accuracy of high-throughput DNA-binding specificity inferences by jointly considering interaction preferences for groups of proteins from the same DNA-binding family, rewarding global consistency according to an expected similarity measure reflecting family-level structural considerations. Finally, I provide a probabilistic framework for improving interpretability of high-throughput data by mapping inferred specificities of DBPs from the same DNA-binding family onto a common "reference" structural interface model derived from aggregated family-level co-complex data. | -
dc.language.iso | en | -
dc.publisher | Princeton, NJ : Princeton University | -
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | -
dc.subject | DNA-binding | -
dc.subject | gene regulation | -
dc.subject | protein-DNA interaction | -
dc.subject | protein structure | -
dc.subject | transcription factor | -
dc.subject | zinc finger | -
dc.subject.classification | Computer science | -
dc.subject.classification | Bioinformatics | -
dc.subject.classification | Molecular biology | -
dc.title | Structure-aware Approaches for Deciphering Sequence-specific Protein-DNA Interactions | -
dc.type | Academic dissertations (Ph.D.) | -
pu.embargo.terms | 2020-04-15 | -
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
Wetzel_princeton_0181D_12872.pdf | | 15.62 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.