Title: A New Multiclass SVM-Based Predictive Model for Cys₂His₂ Zinc Finger Binding
Authors: Brown, Noah
Advisors: Singh, Mona
Department: Molecular Biology
Class Year: 2015
Abstract: DNA binding proteins, continually liaising with the proteome, the transcriptome and the genome, are almost as important to life as DNA itself. Increasing knowledge of how these proteins bind to sequences of DNA allows us to better understand the genome as well as the regulatory networks it governs. Cys₂His₂ zinc finger (C2H2-ZF) proteins make up the largest class of transcription factors in Metazoa. They are known to interact with DNA using tandem arrays of “fingers,” whose combination of amino acid residues encodes specificity and affinity for sequences of bases located in the DNA’s major groove. Early studies that attempted to predict the binding preferences of C2H2-ZF proteins were limited by the assumptions built into their probabilistic models. More recent studies have also fallen short of a definitive characterization of C2H2-ZF DNA binding due to a shortage of quality data. Last year, our group reported the construction of large, diverse libraries of C2H2-ZF proteins, which were screened for their DNA binding affinity using a bacterial one-hybrid (B1H) system. In the current study, we reflect on available methods for predicting C2H2-ZF binding and leverage our library dataset to propose a new “one-vs-rest” multiclass support vector machine (SVM)-based prediction model for C2H2-ZF binding. In internal cross-validation experiments, our multiclass toy models achieved accuracies of 84% to 99%, confirming a role for “one-vs-rest” classification in the prediction of C2H2-ZF DNA binding.
Extent: 63 pages
URI: http://arks.princeton.edu/ark:/88435/dsp01fx719p780
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Molecular Biology, 1954-2016