Title: A New Multiclass SVM-Based Predictive Model for Cys\(_{2}\)His\(_{2}\) Zinc Finger Binding
Authors: Brown, Noah
Advisors: Singh, Mona
Department: Molecular Biology
Class Year: 2015
Abstract: DNA-binding proteins, continually liaising with the proteome, the transcriptome and the genome, are almost as important to life as DNA itself. Increasing knowledge of how these proteins bind to sequences of DNA allows us to better understand the genome as well as the regulatory networks it governs. Cys\(_{2}\)His\(_{2}\) zinc finger (C2H2-ZF) proteins make up the largest class of transcription factors in Metazoa. They are known to interact with DNA using tandem arrays of “fingers,” whose combination of amino acid residues encodes specificity and affinity for sequences of bases located in the DNA’s major groove. The success of early studies attempting to make predictions about the binding preferences of C2H2-ZF proteins was reduced by the internal assumptions in their probabilistic models. More recent studies have also fallen short of a definitive characterization of C2H2-ZF DNA binding due to a shortage of quality data. Last year, our group reported the construction of large, diverse libraries of C2H2-ZF proteins, which were screened for their DNA-binding affinity using a bacterial one-hybrid (B1H) system. In the current study, we reflect on available methods for predicting C2H2-ZF binding and leverage our library dataset to propose a new “one-vs-rest” multiclass support vector machine (SVM)-based prediction model for C2H2-ZF binding. Accuracies between 84% and 99% in internal cross-validation experiments, achieved by our multiclass toy models, confirm a role for “one-vs-rest” classification in the prediction of C2H2-ZF DNA binding.
Extent: 63 pages
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Molecular Biology, 1954-2020

Files in This Item:
PUTheses2015-Brown_Noah.pdf (3.2 MB, Adobe PDF)
