Many-to-One Voice Conversion Using Deep Learning

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01qb98mj67m

Title:	Many-to-One Voice Conversion Using Deep Learning
Authors:	Miron, Jack
Advisors:	Finkelstein, Adam
Department:	Computer Science
Class Year:	2022
Abstract:	This paper explores a technique for generating realistic audio of human speech. The specific problem is voice conversion: transforming a given utterance into one that preserves the linguistic content but has the vocal qualities of a chosen target speaker. In this case, the target speaker is former President Barack Obama. At a high level, the approach isolates the linguistic content of an utterance from the vocal features that convey speaker identity, uses a deep neural network to map the source speaker's vocal features to those of the target speaker, and finally reconstructs the audio by combining the target's vocal features with the original linguistic content. The goal is to convert any input speech into natural-sounding audio in Obama's voice. Beyond its entertainment value, voice conversion has many applications, including accessibility for patients who have lost their voice, voice disguise for privacy protection, and voice dubbing for movies.
URI:	http://arks.princeton.edu/ark:/88435/dsp01qb98mj67m
Type of Material:	Princeton University Senior Theses
Language:	en
Appears in Collections:	Computer Science, 1987-2023

Files in This Item:

File	Description	Size	Format
MIRON-JACK-THESIS.pdf		380.83 kB	Adobe PDF
