Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp016969z416r
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Bousso Dieng, Adji | |
dc.contributor.advisor | Troyanskaya, Olga | |
dc.contributor.author | Ferragu, Constance | |
dc.contributor.other | Computer Science Department | |
dc.date.accessioned | 2024-08-08T18:39:06Z | - |
dc.date.available | 2024-08-08T18:39:06Z | - |
dc.date.created | 2024-01-01 | |
dc.date.issued | 2024 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp016969z416r | - |
dc.description.abstract | Protein sequence discrete diffusion models have become increasingly valuable for the design of novel proteins, due to their robust generative capabilities and effective bi-directional processing of sequence data. To improve the generation of sequences with desired functions, gradient guidance methods are often used to guide sampling with a discriminative model. However, sampling from these models presents challenges due to the vast and discrete nature of the sequence space. These models tend to prioritize denoising high-likelihood tokens, resulting in similar sequences. Furthermore, gradient guidance methods tend to collapse generation to fewer modes. Protein design pipelines tend to work with fixed-size batches of sequences. Hence, given the high cost of experimental validation, optimizing sample efficiency of these batches is essential. In this thesis, we propose Vendi Guidance, a guided diffusion sampling algorithm designed to improve the exploration efficiency of sequence space and the diversity of sampled sequence sets. Our method leverages the Vendi Score, a statistical measure of diversity, to select edit positions that will most effectively improve the diversity objective and to guide the model’s hidden representations towards diverse denoising steps. We demonstrate that Vendi Guidance can iteratively refine a seed sequence into a more diverse set of sequences, while ensuring that the quality of the sequences does not deteriorate. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.subject.classification | Computer science | |
dc.title | Diversity-Guided Sampling for Protein Sequence Design via Iterative Refinement | |
dc.type | Academic dissertations (M.S.E.) | |
pu.date.classyear | 2024 | |
pu.department | Computer Science | |
Appears in Collections: | Computer Science, 2023 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Ferragu_princeton_0181G_15055.pdf | | 3.33 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.