Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01b2773z85p
Title: Predicting Protein Interaction Sites Through Machine Learning and Data Aggregation
Authors: Etzion-Fuchs, Anat
Advisors: Singh, Mona
Contributors: Quantitative Computational Biology Department
Keywords: genomics; interaction-sites; machine learning; proteomics
Subjects: Bioinformatics
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: As biological processes are carried out predominantly by proteins, the accurate annotation of protein function is key to understanding life at the molecular level. Modern sequencing technologies can identify a vast number of proteins quickly and accurately at a pace that far exceeds our ability to annotate them experimentally. Thus, computational approaches play a key role in protein annotation. Fundamental aspects of a protein’s functionality can be uncovered by its interactions as proteins accomplish nearly all their activities via binding to various ligands. The interactions of proteins with other molecules are mediated by specific amino acids, and their identification is crucial for understanding how a specific protein performs its role. Therefore, identifying which protein residues mediate which interaction can help in narrowing the functional annotation gap. In this dissertation, I introduce novel computational approaches that leverage different data aggregation levels and different prior knowledge to predict interaction sites in proteins. First, I explore which properties of proteins are predictive of interaction sites. I analyze and encode numerous site-based properties that can be derived from sequence alone. My approach leverages protein domains–fundamental units within proteins that correspond to families that share sequence, structure, and evolutionary descent–and I introduce features extracted from them. Next, I rely on these predictive features and introduce a framework of supervised machine learning algorithms to predict interaction sites within protein domains. Finally, I utilize complete protein sequences and extend beyond standard supervised machine learning algorithms to develop predictors based on deep neural networks pretrained on unlabelled protein sequences. Together, these approaches provide a framework for predicting protein interaction sites to expand our knowledge of protein molecular interactions.
URI: http://arks.princeton.edu/ark:/88435/dsp01b2773z85p
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Quantitative Computational Biology

Files in This Item:
File: EtzionFuchs_princeton_0181D_14000.pdf
Size: 5.07 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.