Title: Prediction of Cancer Phenotypes Through Machine Learning Approaches: From Gene Modularity to Deep Neural Networks
Authors: Zamalloa, Jose Antonio
Advisors: Singh, Mona
Contributors: Quantitative Computational Biology Department
Keywords: Cancer; Deep Learning
Subjects: Bioinformatics
Issue Date: 2019
Publisher: Princeton, NJ : Princeton University
Abstract: The current genomics data influx is transforming healthcare by enabling precise diagnoses and individualized treatments. This is especially true for cancer, where we have genome sequencing and gene expression data across numerous individuals, along with measurements of drug response across hundreds of cancer cell lines. Computational, statistical and machine learning methods play an essential role in analyzing these data in order to gain medically relevant insights. In this dissertation, I describe statistical and machine learning approaches to enable better stratification of cancer subtypes and predict therapy outcomes for individuals with cancer. First, I introduce Deep Pharmacogenomic Modules (Deep-PGMs), a framework to predict drug response outcomes for tumor samples using drug features and gene expression data. Genome expression signatures are a great aid for predicting whether a particular therapy may be beneficial for a specific cancer tumor. Traditional machine learning approaches to predict the effect of a cancer drug on a tumor typically focus on the expression levels of either certain key cancer-relevant genes or of all genes. While genomic data can aid in describing the disease state of an individual by looking at isolated gene entities, genes in cells tend to act in concert to perform their functions. My approach takes advantage of the modular nature of gene regulation to build a reduced feature space that describes the cellular state of a tumor. I take advantage of unsupervised machine learning methods to build genomic and non-genomic feature spaces. I construct a deep neural network pipeline to predict drug efficacy outcomes on tumor cell line samples. I demonstrate that my framework outperforms traditional machine learning approaches that do not take advantage of the modular structure of gene expression data sets. I further apply my method to clinical trial data and demonstrate its performance. I find that featurizing genomic data through prior knowledge about cellular modularity, accompanied by a robust deep learning pipeline, is a powerful method for predicting the disease outcome of novel cancer therapeutics.
In the second part of my thesis, I develop classifiers to identify two breast cancer subtypes. First, I describe an accurate Claudin-low (CL) molecular subtype predictor based on gene expression data. This particular subtype has poor prognosis in breast cancer patients. Via experiments in mice along with analysis of human breast cancer data, my collaborators and I linked individuals with CL breast cancer to elevated levels of miR-199a. This evidence further supported the high levels of miR-199a in mouse tumors and helped characterize the miR-199a-LCOR-IFN axis in tumor initiation. Next, I developed a hysteretic epithelial-mesenchymal transition (EMT) classifier. I use experimental data from TGF-β induced EMT mouse mammary tumor cells to find genes that are indicative of the hysteretic EMT phenotype. The uncovered genes in my model correlate well with metastatic phenotypes in clinical datasets, particularly in patients with metastatic lung cancer, suggesting that EMT-induced mouse mammary tumor cells can help elucidate clinically relevant genes important in metastasis.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Quantitative Computational Biology

Files in This Item:
File: Zamalloa_princeton_0181D_12877.pdf
Size: 4.48 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.