Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp016969z416r
Title: | Diversity-Guided Sampling for Protein Sequence Design via Iterative Refinement |
Authors: | Ferragu, Constance |
Advisors: | Bousso Dieng, Adji; Troyanskaya, Olga |
Department: | Computer Science |
Class Year: | 2024 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | Protein sequence discrete diffusion models have become increasingly valuable for the design of novel proteins, due to their robust generative capabilities and effective bi-directional processing of sequence data. To improve the generation of sequences with desired functions, gradient guidance methods are often used to guide sampling with a discriminative model. However, sampling from these models presents challenges due to the vast and discrete nature of the sequence space. These models tend to prioritize denoising high-likelihood tokens, resulting in similar sequences. Furthermore, gradient guidance methods tend to collapse generation to fewer modes. Protein design pipelines tend to work with fixed-size batches of sequences. Hence, given the high cost of experimental validation, optimizing sample efficiency of these batches is essential. In this thesis, we propose Vendi Guidance, a guided diffusion sampling algorithm designed to improve the exploration efficiency of sequence space and the diversity of sampled sequence sets. Our method leverages the Vendi Score, a statistical measure of diversity, to select edit positions that will most effectively improve the diversity objective and to guide the model’s hidden representations towards diverse denoising steps. We demonstrate that Vendi Guidance can iteratively refine a seed sequence into a more diverse set of sequences, while ensuring that the quality of the sequences does not deteriorate. |
URI: | http://arks.princeton.edu/ark:/88435/dsp016969z416r |
Type of Material: | Academic dissertations (M.S.E.) |
Language: | en |
Appears in Collections: | Computer Science, 2023 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Ferragu_princeton_0181G_15055.pdf | | 3.33 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.
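
Note on the diversity measure named in the abstract: the Vendi Score of a set of n items is commonly defined as the exponential of the Shannon entropy of the eigenvalues of K/n, where K is a positive semidefinite similarity matrix with ones on the diagonal. The sketch below only illustrates that score on a small batch of sequences; it is not the thesis's Vendi Guidance algorithm. The function names and the choice of kernel (fraction of matching positions between equal-length sequences) are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def hamming_similarity(seqs):
    """Similarity matrix: fraction of matching positions (assumed kernel).

    Assumes all sequences have the same length. The result is positive
    semidefinite with ones on the diagonal.
    """
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] == arr[None, :, :]).mean(axis=2)


def vendi_score(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n."""
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero (or numerically negative) eigenvalues; 0 * log(0) is taken as 0.
    eigvals = eigvals[eigvals > 1e-12]
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))
```

Under this kernel, a batch of identical sequences scores close to 1 and a batch of n sequences that share no positions scores close to n, so the value can be read as an effective number of distinct sequences in the batch.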