Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp016969z416r
Title: Diversity-Guided Sampling for Protein Sequence Design via Iterative Refinement
Authors: Ferragu, Constance
Advisors: Bousso Dieng, Adji; Troyanskaya, Olga
Department: Computer Science
Class Year: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: Discrete diffusion models for protein sequences have become increasingly valuable for designing novel proteins, owing to their robust generative capabilities and effective bidirectional processing of sequence data. To improve the generation of sequences with desired functions, gradient guidance methods are often used to steer sampling with a discriminative model. However, sampling from these models is challenging because the sequence space is vast and discrete. The models tend to prioritize denoising high-likelihood tokens, which yields similar sequences, and gradient guidance tends to collapse generation onto fewer modes. Protein design pipelines typically work with fixed-size batches of sequences, so, given the high cost of experimental validation, optimizing the sample efficiency of these batches is essential. In this thesis, we propose Vendi Guidance, a guided diffusion sampling algorithm designed to improve the exploration efficiency of sequence space and the diversity of sampled sequence sets. Our method leverages the Vendi Score, a statistical measure of diversity, to select edit positions that will most effectively improve the diversity objective and to guide the model’s hidden representations towards diverse denoising steps. We demonstrate that Vendi Guidance can iteratively refine a seed sequence into a more diverse set of sequences while ensuring that the quality of the sequences does not deteriorate.
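Illustrative note: the Vendi Score named in the abstract is, in its published definition, the exponential of the Shannon entropy of the eigenvalues of a similarity matrix divided by the number of samples. The sketch below computes that score for a small batch of equal-length sequences. The position-wise identity kernel and the function names `identity_kernel` and `vendi_score` are assumptions chosen for illustration, not necessarily what the thesis uses, and the sketch covers only the diversity measure, not the Vendi Guidance sampling algorithm itself.

```python
import numpy as np


def identity_kernel(seqs):
    """Similarity matrix: fraction of positions at which two sequences agree.

    Assumes all sequences have the same length (an illustrative simplification).
    The diagonal is 1 and the matrix is positive semidefinite.
    """
    arr = np.array([list(s) for s in seqs])          # shape (n, L) of characters
    matches = arr[:, None, :] == arr[None, :, :]      # shape (n, n, L) booleans
    return matches.mean(axis=2)


def vendi_score(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)               # K is symmetric
    eigvals = eigvals[eigvals > 1e-12]                # drop zeros: 0 * log 0 = 0
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


# Hypothetical toy batches, not data from the thesis.
identical_batch = ["ACDE", "ACDE", "ACDE"]
disjoint_batch = ["AAAA", "CCCC", "DDDD"]

score_identical = vendi_score(identity_kernel(identical_batch))
score_disjoint = vendi_score(identity_kernel(disjoint_batch))
```

Under this kernel, a batch of identical sequences scores 1 and a batch of n sequences that share no residues at any position scores n, so the score can be read as an effective number of distinct sequences in the batch.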
URI: http://arks.princeton.edu/ark:/88435/dsp016969z416r
Type of Material: Academic dissertations (M.S.E.)
Language: en
Appears in Collections: Computer Science, 2023

Files in This Item:
File: Ferragu_princeton_0181G_15055.pdf
Size: 3.33 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.