Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01x059cb62j
Title: | Emotion-Style Transfer for Music Using Deep Learning |
Authors: | Moorehead, Bradley |
Advisors: | Klusowski, Jason |
Department: | Operations Research and Financial Engineering |
Certificate Program: | Center for Statistics and Machine Learning |
Class Year: | 2023 |
Abstract: | Neural style transfer (NST), which merges the content of one piece of media with the style of another, first emerged as a task for images and videos. Recently, NST has been gaining traction in the music domain, as researchers have created models that alter the style of a song to mimic either the style of another song or a style from a predefined set. Unlike in visual media, the concepts of “content” and “style” are not well defined in music, and there is no single intuitive way of distinguishing them. Thus, an NST model for music must define exactly what is meant by “style.” In this paper, I present an emotion-style transfer model for music (ESTM), which defines the style of a song as the emotion that a typical listener would associate with it. With this definition, my ESTM uses deep learning to combine the emotional aspects of a given “style” song with the unemotional aspects of a given “content” song. To my knowledge, this paper represents the first attempt to create such a model. As such, the purpose of this work is to provide an initial exploration of emotion-style transfer and to examine the effects of varying the model architecture and hyperparameter values. To design an ESTM, I consulted the fields of AI music generation, music style transfer, affective algorithmic composition (AAC), and music emotion recognition (MER). I introduce two different ESTM architectures, train several models with different hyperparameter values, and discuss the performance of each model using quantitative metrics. My models were ineffective and failed to discernibly transfer emotional aspects of the style input to the content input; my work demonstrates that my approach to the emotion-style transfer task is too simple. However, a quantitative analysis of my models’ performance indicates that there is potential for improvement, especially with the second architecture. At the end of my paper, I suggest several routes for future research that might prove successful. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01x059cb62j |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Operations Research and Financial Engineering, 2000-2024 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
MOOREHEAD-BRADLEY-THESIS.pdf | | 1.39 MB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.