Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01qb98mj67m
Title: Many-to-One Voice Conversion Using Deep Learning
Authors: Miron, Jack
Advisors: Finkelstein, Adam
Department: Computer Science
Class Year: 2022
Abstract: This paper explores a technique for generating realistic audio of human speech. The specific problem is voice conversion: converting any given utterance into one that preserves its linguistic information but has the vocal qualities of a specific target speaker, in this case former President Barack Obama. At a high level, the approach is to isolate the content of an utterance from the vocal features that convey speaker identity, use a deep neural network to map the source speaker's vocal features to those of the target speaker, and finally reconstruct the audio by combining the target's vocal features with the original linguistic content. The goal is to use this method to convert any input speech into natural-sounding audio of Obama's voice. Beyond its entertainment value, voice conversion has many applications, including accessibility for patients who have lost their voice, voice disguise for privacy protection, and voice dubbing for movies.
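The three-stage pipeline described in the abstract (separate, map, reconstruct) can be illustrated with a minimal sketch. It is not the thesis's implementation, and every name in it is hypothetical. It assumes each audio frame has already been analyzed by a vocoder into an F0 value, a spectral-feature vector, and an aperiodicity value (as in WORLD-style analysis). It also assumes frame timing and aperiodicity are carried over from the source unchanged. F0 is mapped with the log-Gaussian normalized transform, a common voice-conversion baseline. The spectral features pass through a small feed-forward network whose weights (`w1`, `b1`, `w2`, `b2`) stand in for trained parameters. Resynthesis by the vocoder is omitted.

```python
import math
import random
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Frame:
    f0: float            # fundamental frequency in Hz; 0.0 marks an unvoiced frame
    spectral: list       # spectral-feature vector (e.g. mel-cepstral coefficients)
    aperiodicity: float  # assumed speaker-independent here, so copied unchanged


def log_f0_stats(f0s):
    """Mean and (population) standard deviation of log F0 over voiced frames."""
    logs = [math.log(f) for f in f0s if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0, src_stats, tgt_stats):
    """Log-Gaussian normalized F0 transform; unvoiced frames stay at 0."""
    if f0 <= 0:
        return 0.0
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    return math.exp(mu_t + sd_t * (math.log(f0) - mu_s) / sd_s)


def dense(x, weights, biases):
    """Affine layer: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def map_spectral(x, params):
    """Two-layer ReLU network standing in for a trained source-to-target mapping."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]
    return dense(hidden, w2, b2)


def convert(frames, params, src_stats, tgt_stats):
    """Convert frame by frame, keeping timing and aperiodicity from the source."""
    return [
        Frame(convert_f0(fr.f0, src_stats, tgt_stats),
              map_spectral(fr.spectral, params),
              fr.aperiodicity)
        for fr in frames
    ]


def random_params(dim, hidden, seed=0):
    """Hypothetical untrained weights, used only to exercise the data flow."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return w1, [0.0] * hidden, w2, [0.0] * dim


if __name__ == "__main__":
    # Toy source utterance: one unvoiced frame followed by three voiced frames.
    src = [Frame(f, [0.1 * i] * 4, 0.5) for i, f in enumerate([0.0, 120.0, 130.0, 110.0])]
    src_stats = log_f0_stats([fr.f0 for fr in src])
    tgt_stats = (math.log(100.0), 0.2)  # hypothetical target-speaker log-F0 statistics
    out = convert(src, random_params(4, 8), src_stats, tgt_stats)
    print([round(fr.f0, 1) for fr in out])
```

The F0 transform is a linear map in the log domain. After conversion, the voiced frames' log-F0 mean and standard deviation therefore match the target statistics exactly, while the relative intonation contour of the source is preserved. In a real system, the spectral mapping network would be trained on aligned source and target frames rather than initialized randomly.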
URI: http://arks.princeton.edu/ark:/88435/dsp01qb98mj67m
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
MIRON-JACK-THESIS.pdf (Adobe PDF, 380.83 kB)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.