Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp010p096b122
Title: Reconstructing Sound from Visual Piano Performances: An Overview
Authors: Wang, Henry
Advisors: Russakovsky, Olga
Department: Computer Science
Class Year: 2022
Abstract: Generating audio from video data is a fundamental problem in computer vision that combines multiple fields of knowledge. When applied to music, this task falls under the category of Automatic Music Transcription (AMT), which is itself a subset of the broader field known as Music Information Retrieval (MIR). In this paper, we perform an in-depth exploration of a specific area of AMT: the task of reconstructing sound and other important musical information from a silent visual piano performance. We survey the current state of progress and discuss many of the most promising methods currently used to tackle this problem. We examine the full pipeline, from data collection to model implementation. Finally, we propose a simple data framework by refining existing solutions, suggesting improvements, and recommending additions to the existing data paradigms.
URI: http://arks.princeton.edu/ark:/88435/dsp010p096b122
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: WANG-HENRY-THESIS.pdf
Size: 4.07 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.