Title: Predicting DNA Recognition by Cys2His2 Zinc Finger Proteins With Random Forests
Authors: On, Brian
Advisors: Singh, Mona
Department: Computer Science
Class Year: 2016
Abstract: Cys2His2 zinc finger proteins comprise the largest transcription factor family in eukaryotic genomes. Predicting their DNA-binding specificities would enable both the design of chimeric proteins that target specific regions of the genome and the extraction of information about regulatory and cellular networks. While early methods for predicting the DNA binding of Cys2His2 zinc fingers relied heavily on probabilistic or quantitative models, recent successful SVM, random forest, and neural network approaches have demonstrated the efficacy of machine learning algorithms for DNA-binding prediction. We continue to explore the application of machine learning to this problem, leveraging a recently compiled, expansive dataset of Cys2His2 zinc finger protein-DNA interactions and the random forest ensemble technique. We test our approach on a set of naturally occurring proteins with experimentally determined binding specificities and find that our algorithm is competitive with previously published state-of-the-art prediction methods. Overall, our random forest model is able to predict at least half the columns of the experimental PWMs for over 80% of the naturally occurring proteins.
Extent: 55 pages
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Computer Science, 1988-2016

Files in This Item:
File: On_Brian_2016_Thesis.pdf
Size: 2.52 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.