Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp010p096b122
Title: Reconstructing Sound from Visual Piano Performances: An Overview
Authors: Wang, Henry
Advisors: Russakovsky, Olga
Department: Computer Science
Class Year: 2022
Abstract: Generating audio from video data is a fundamental problem in computer vision that combines multiple fields of knowledge. When applied to music, this task falls under the category of Automatic Music Transcription (AMT), which is itself a subset of the broader field known as Music Information Retrieval (MIR). In this paper, we perform an in-depth exploration of a specific area of AMT: the task of reconstructing sound and other important musical information from a silent visual piano performance. We survey the current state of progress and discuss many of the most promising methods currently used to tackle this problem. We examine the full pipeline, from data collection to model implementation. Finally, we propose a simple data framework by refining existing solutions, suggesting improvements, and recommending additions to the existing data paradigms.
URI: http://arks.princeton.edu/ark:/88435/dsp010p096b122
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: WANG-HENRY-THESIS.pdf
Size: 4.07 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.