Please use this identifier to cite or link to this item:
Title: Inferring Intra-tumor Heterogeneity from DNA Sequencing Data
Authors: Myers, Matthew Abrams
Advisors: Raphael, Benjamin J
Contributors: Computer Science Department
Keywords: cancer
Subjects: Computer science
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: Cancer is a disease characterized by somatic mutations which accumulate over time and in response to evolutionary pressures. As a result, tumors are composed of multiple distinct clones, each characterized by a different set of mutations. This intra-tumor heterogeneity is closely related to negative outcomes such as treatment resistance, relapse, and metastasis. The somatic mutations that define tumor clones can range greatly in size, from single-nucleotide variants (SNVs), which change only a single base position, to copy-number aberrations (CNAs), which can affect anywhere from thousands of bases up to the whole genome. Researchers use DNA sequencing data to measure these somatic mutations and study intra-tumor heterogeneity. However, the vast majority of cancer sequencing uses DNA from bulk tumor samples, which are mixtures of millions of cells. Thus, the resulting data is a combination of DNA sequences across all tumor and normal cells. This presents challenges for analysis, as it is not immediately apparent from the sequencing reads which somatic mutations characterize individual clones. Recently, single-cell sequencing technologies have enabled researchers to measure DNA sequencing reads from individual cells. However, these technologies have higher rates of sequencing errors and limited sequencing coverage, so sophisticated algorithms are still needed to recover the tumor clones and their mutations. In this dissertation, we present three computational methods for inferring tumor clones and their constituent mutations from either bulk or single-cell DNA sequencing data. The first method, CALDER, infers tumor clones and their evolutionary relationships using SNVs from longitudinal bulk DNA sequencing samples. CALDER uses the longitudinal ordering to apply constraints on the clones present in each sample.
The second method, SBMClone, infers tumor clones from SNVs in ultra-low-coverage single-cell DNA sequencing data using the stochastic block model, a well-studied tool from statistical physics and network science. The third method, HATCHet2, infers clones and their allele-specific CNAs from one or more bulk DNA sequencing samples. HATCHet2 improves upon the state of the art with several methodological innovations, including variable-width binning, locality-aware clustering, and a novel statistic for quantifying allelic imbalance which enables the identification of mirrored subclonal CNAs, in which different alleles are amplified in different clones.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Myers_princeton_0181D_14313.pdf
Size: 6.26 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.