Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01jh343w032
Title: A Computational Pathway for Identifying Metabolites Relevant to Cancer Development: New Methods Incorporating Protein Structure and DiffMut
Authors: Berman, Adam
Advisors: Singh, Mona G
Department: Computer Science
Certificate Program: Quantitative and Computational Biology Program
Class Year: 2018
Abstract: Last year, I developed a computational pipeline capable of leveraging TCGA (The Cancer Genome Atlas) genomic cancer data to assign scores to a list of all biologically active endogenous metabolites according to their relevance to breast cancer, ultimately resulting in a ranked list of metabolites. At the time, the pipeline used two types of data to assign these scores: mutational data and RNA-Seq expression data. After further consideration, it has become apparent that these two types of data alone are insufficient to derive nuanced, meaningful scores. For example, not all mutations are equally likely to be positively correlated with cancer development. For this reason, I have modified my scoring algorithm to incorporate two new subscores: one based on whether mutations occur in the binding regions of the protein partners of the metabolites of interest, and another based on DiffMut, a pre-existing differential cancer mutational analysis program. With four different subscores for each metabolite, I perform pairwise combinations of these scores to determine the pair of scores that best reflects known cancer-linked metabolites and metabolic pathways. I also substantially improved the method of determining synonym metabolites from HMDB to properly account for isomers. This work can therefore be seen as a major expansion, refactoring, and improvement of last year’s pipeline, particularly through its incorporation of positional information to consider the impact each mutation would have on the structural binding region of proteins that interact with metabolites. Indeed, the average area-under-the-curve (AUC) value of this year’s overall per-metabolite scores was a full 30 percent better than last year’s in relative terms, increasing from 0.499 to 0.651.
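The pairwise subscore comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: the subscore names, the toy values, the labels, and the choice to combine a pair by simple addition are all assumptions made for illustration, since the abstract does not state the combination rule. The sketch scores every pair of four subscores by AUC against a set of known cancer-linked metabolites and also reproduces the relative-improvement arithmetic for the reported AUC values.

```python
from itertools import combinations


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_pairs(subscores, labels):
    """Return {(name_a, name_b): AUC} for every pair of subscores.

    ASSUMPTION: a pair is combined by adding the two subscores, which only
    makes sense if they are on comparable scales. The thesis may use a
    different combination rule.
    """
    results = {}
    for a, b in combinations(sorted(subscores), 2):
        combined = [x + y for x, y in zip(subscores[a], subscores[b])]
        results[(a, b)] = auc(combined, labels)
    return results


def relative_gain(old, new):
    """Relative improvement of new over old, as a fraction of old."""
    return (new - old) / old


# Hypothetical toy data: six metabolites, four subscores each.
# labels[i] is 1 if metabolite i is treated as known to be cancer-linked.
labels = [1, 1, 0, 0, 1, 0]
subscores = {
    "mutation": [0.9, 0.2, 0.3, 0.1, 0.6, 0.5],
    "expression": [0.4, 0.8, 0.5, 0.2, 0.3, 0.6],
    "binding_site": [0.7, 0.6, 0.1, 0.3, 0.9, 0.2],
    "diffmut": [0.5, 0.7, 0.4, 0.6, 0.8, 0.1],
}

pair_aucs = evaluate_pairs(subscores, labels)
best_pair = max(pair_aucs, key=pair_aucs.get)
```

Four subscores give six candidate pairs, and `best_pair` is whichever pair has the highest AUC on the labeled set. Applied to the AUC values reported in the abstract, `relative_gain(0.499, 0.651)` is roughly 0.305, which matches the stated 30 percent relative improvement.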
URI: http://arks.princeton.edu/ark:/88435/dsp01jh343w032
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: BERMAN-ADAM-THESIS.pdf
Size: 1.78 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.