Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp016w924g11j
Title: Bayesian Estimates of Global Protein Concentration and Uncertainty from Quantitative Mass Spectrometry
Authors: Kumar, Chirag
Advisors: Wuhr, Martin
Department: Chemistry
Certificate Program: Quantitative and Computational Biology Program
Class Year: 2023
Abstract: System-wide estimates of protein concentration are critical for fundamental biochemical questions with many applications in medicine. Currently, only mass spectrometry (MS) can reliably identify nearly all proteins in a cell. However, methods to estimate protein concentration remain error-prone. The state-of-the-art approach estimates protein concentration by relating the average peptide signal to protein concentration using a spike-in standard. This method exhibits a median two-fold error and does not provide uncertainty intervals, making these estimates challenging to use in biological applications. In this work, I utilized a combination of experiment and computation to systematically determine how protein concentration impacts observed peptide signals. I found that peptide ion counts respond linearly to protein concentration (R\(^{2}\) = 97%); that peptide ion counts follow a reproducible left-skewed distribution that scales with protein concentration, with a wide coefficient of variation of 18%; and that peptides are stochastically observed based on their physicochemistry (with ~20% of peptides never being observed) and abundance (modeled by a sigmoid function with R\(^{2}\) = 98.8%). The proposed data-generating process recreates observed MS signals (Kolmogorov-Smirnov two-sample test p < 0.001) and notably recreates the empirical power law between average peptide ion count and protein concentration, providing insight into what creates the power law and lending credibility to the hypothesized data-generating process. Based on this data-generating process, I developed a Bayesian model to estimate protein concentration from observed mass spectrometer peptide signals and the number of missing peptides. The model estimates the protein concentration for all observed species in Escherichia coli growing exponentially with an R\(^{2}\) of 74%, compared to an R\(^{2}\) of 69% for the state-of-the-art approach. Notably, the Bayesian method does not require a spike-in standard of known protein concentration to be added to the sample, and it provides inherent credible intervals with each protein concentration estimate. Thus, this work presents a significant advancement in protein concentration estimation for use in biological applications.
URI: http://arks.princeton.edu/ark:/88435/dsp016w924g11j
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Chemistry, 1926-2023

Files in This Item:
File: KUMAR-CHIRAG-THESIS.pdf
Size: 2.06 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.