|Title:||3D Surfaces in the Wild|
|Abstract:||Recovering 3D structure from a single 2D image remains an open problem in computer vision. Neural networks do reasonably well at predicting the 3D structure of scenes from limited domains, mostly indoor and road scenes, but they generalize poorly to images unlike those seen during training. We hypothesize that this is largely due to the lack of large-scale, diverse training data for 3D inference. Recent work has attempted to crowdsource 3D annotations of images "in the wild", but because of the large amount of labor involved, it has failed to produce datasets large and expressive enough to improve the state of the art in 3D inference. Our contribution is threefold. First, we present a methodology for efficiently obtaining dense 3D annotations of everyday images scraped from the Internet, or images in the wild. Applying this methodology with workers on Amazon Mechanical Turk, we crowdsourced a novel 3D vision dataset of large scale and diversity, which we call "3SIW". We provide full surface normal, depth, fold boundary, and occlusion boundary annotations for 20,000 images from the wild. Our methodology can also be used to create other datasets of even greater scale and diversity. Second, we provide benchmarks on 3SIW for four tasks: surface normal estimation, occlusion detection, fold detection, and semantic segmentation of planar surfaces. Lastly, we demonstrate that training on larger and more diverse data advances the state of the art in 3D vision systems.|
|Type of Material:||Princeton University Senior Theses|
|Appears in Collections:||Computer Science, 1988-2022|
Files in This Item:
|FAN-DAVID-THESIS.pdf||4.83 MB||Adobe PDF|
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.