Title: Advances in Computational Protein Structure Prediction
Authors: Subramani, Ashwin
Advisors: Floudas, Christodoulos A
Contributors: Chemical and Biological Engineering Department
Keywords: Astrofold; Hierarchical Protein Folding; Protein Structure Prediction
Subjects: Chemical engineering
Issue Date: 2011
Publisher: Princeton, NJ : Princeton University
Abstract: This thesis applies optimization theory, and algorithms based on it, to the protein structure prediction problem. The problem can be stated simply: "Given the amino acid sequence of a protein, what is its three-dimensional structure?" A number of theories propose alternative pathways by which a protein folds. One such theory is the hierarchical theory of protein folding, which proposes that the local secondary structure of a protein forms before its three-dimensional arrangement. Hence, the thesis addresses a number of intermediate problems on the way to tertiary structure prediction. Given the amino acid sequence of a protein, we first aim to predict the location of the secondary structure elements. To address this issue, a novel mixed-integer linear optimization model has been developed. The model divides a given protein sequence into overlapping nonapeptides and evaluates the likelihood that the central amino acid of each lies in an alpha helix. This likelihood is expressed as a weighted linear sum of the pairwise probabilities that neighboring amino acid pairs form hydrogen bonds. In addition, chemical-shift-based data from a large database are used to reduce the superstructure of possible helical locations in the protein. Having gathered information on the location of alpha helices and beta strands through different means, a novel mixed-integer optimization model to predict beta-sheet topology has been developed. Accurate prediction of the topology of a protein would provide important distance constraints between amino acids separated in the primary sequence. The model aims to maximize the pseudo-contact energy between beta strands in the protein. In addition, a model based on torsion angle dynamics and clustering re-ranks the shortlisted set of topologies in order to identify the native topology of the protein.
In addition to constraints derived from the location and mutual contacts of secondary structures, it is important to impose distance and angle constraints on the disordered loop regions of the protein. Unlike in the previous steps, the flexible nature of loops precludes successful prediction using only database-derived methods. Hence, a novel loop structure prediction framework has been developed, which incorporates nonlinear, nonconvex optimization along with dihedral angle sampling and discrete side-chain optimization. An iterative approach is introduced to sequentially reduce the predicted bounds on the dihedral angles. All of these preceding steps are used to generate constraints, which are incorporated into the three-dimensional structure prediction. The tertiary structure prediction algorithm combines deterministic global optimization, stochastic conformational space annealing, and torsion angle dynamics to generate structural conformers that satisfy the constraints. For a blind case study, it is difficult to determine the native structure from an ensemble. Hence, a new clustering approach based on the traveling salesman problem (TSP) has been introduced. The method iteratively eliminates low-quality structures from the ensemble, and eventually helps select the five conformers that are closest to the native structure from the generated ensemble.
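The nonapeptide windowing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's mixed-integer linear model: the function names, the `pair_prob` table, the `weights` mapping, and the choice of which neighbor offsets to score are all hypothetical placeholders, chosen only to show how a sequence might be cut into overlapping nine-residue windows and how a weighted linear sum could be evaluated for each central residue.

```python
from typing import Dict, List, Tuple

WINDOW = 9  # nonapeptide length, as stated in the abstract


def nonapeptides(sequence: str) -> List[Tuple[int, str]]:
    """Return (center_index, window) for every overlapping 9-residue window."""
    half = WINDOW // 2
    return [
        (start + half, sequence[start:start + WINDOW])
        for start in range(len(sequence) - WINDOW + 1)
    ]


def helix_score(
    window: str,
    pair_prob: Dict[Tuple[str, str], float],
    weights: Dict[int, float],
) -> float:
    """Weighted linear sum over (central residue, neighbor) pairs.

    pair_prob and weights are hypothetical inputs: the abstract does not
    say how the probabilities or weights are obtained.
    """
    center = WINDOW // 2
    total = 0.0
    for offset, weight in weights.items():
        j = center + offset
        if 0 <= j < len(window):
            pair = (window[center], window[j])
            # Pairs missing from the table contribute nothing.
            total += weight * pair_prob.get(pair, 0.0)
    return total


# Illustrative usage with made-up numbers.
toy_sequence = "AEAAAKEAAAKA"
toy_pair_prob = {("A", "A"): 0.5}
toy_weights = {-4: 1.0, 4: 2.0}  # assumed offsets, not taken from the thesis
scores = {
    center: helix_score(window, toy_pair_prob, toy_weights)
    for center, window in nonapeptides(toy_sequence)
}
```

In this sketch each residue that sits at the center of some window receives one score; residues within four positions of either end of the sequence are never central and so receive none.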
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Chemical and Biological Engineering

Files in This Item:
File: Subramani_princeton_0181D_10082.pdf
Size: 8.37 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.