Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019593tz290
Title: TOWARDS FAIR MACHINE LEARNING ALGORITHMS FOR MHC BINDING AND ANTIGEN IDENTIFICATION
Authors: Glynn, Eric
Advisors: Singh, Mona
Contributors: Molecular Biology Department
Keywords: FAIRNESS; HLA; MACHINE LEARNING; MHC; NEOANTIGEN; VACCINE
Subjects: Bioinformatics; Biology
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: The immune system protects the body from external threats, and through its adaptive arm detects and eliminates foreign pathogens and cancerous cells. Major histocompatibility complex (MHC) proteins play an essential role in adaptive immunity, as they allow cells to present a sampling of their proteome for immune surveillance by T cells, enabling detection and elimination of virally infected and cancerous cells. Identifying the specific peptides that can be bound by MHC proteins is thus critical for understanding the adaptive immune response to diverse threats. Many computational methods have been developed to predict MHC-peptide binding, and have enabled researchers to rapidly screen entire viral or cancer genomes in silico for putative T cell antigens. This technology is now used to design personalized immunotherapies targeting patient-specific neoantigens to fight cancers. Although MHC binding algorithms are widely used, the extreme polymorphism of the class-I MHC genes, with over 22,000 MHC alleles across human populations, poses significant obstacles for accurate antigen identification across diverse individuals and thus for all downstream research and therapies. In this dissertation, we develop a state-of-the-art machine learning system to predict peptide binding for MHC alleles and introduce the first system to estimate model performance for the tens of thousands of MHC alleles. Our analysis shows that significant differences in the amount of binding data associated with each MHC allele lead to data disparities across racial and ethnic groups. We show that machine learning is able to mitigate some of these disparities, and introduce an algorithm that prioritizes data collection to address remaining disparities.
As part of this dissertation, we also include an introductory review, which is designed to provide computational biologists with an accessible entry point into the biological systems associated with antigen recognition; in addition to a biological primer, we cover the many use cases of MHC binding models and the algorithmic innovations underlying them. Taken together, the components of this dissertation serve to advance the state of MHC binding algorithms and will greatly aid the broad spectrum of downstream research and therapeutic applications that utilize MHC binding algorithms to identify T cell antigens in a genetically diverse human population.
URI: http://arks.princeton.edu/ark:/88435/dsp019593tz290
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Molecular Biology

Files in This Item:
This content is embargoed until 2023-04-20. For questions about theses and dissertations, please contact the Mudd Manuscript Library. For questions about research datasets, as well as other inquiries, please contact the DataSpace curators.


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.