Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp010p096b122
Title: Reconstructing Sound from Visual Piano Performances: An Overview
Authors: Wang, Henry
Advisors: Russakovsky, Olga
Department: Computer Science
Class Year: 2022
Abstract: Generating audio from video data is a fundamental problem in computer vision that draws on multiple fields of knowledge. When applied to music, this task falls under Automatic Music Transcription (AMT), itself a subset of the broader field of Music Information Retrieval (MIR). In this paper, we perform an in-depth exploration of a specific area of AMT: reconstructing sound and other important musical information from a silent visual piano performance. We survey the current state of progress and discuss many of the most promising methods currently used to tackle this problem. We examine the full pipeline, from data collection to model implementation. Finally, we propose a simple data framework by refining existing solutions, suggesting improvements, and recommending additions to the existing data paradigms.
URI: http://arks.princeton.edu/ark:/88435/dsp010p096b122
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024
Files in This Item:
File | Size | Format
---|---|---
WANG-HENRY-THESIS.pdf | 4.07 MB | Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.