From Pixels to Scenes: Recovering 3D Geometry and Semantics for Indoor Environments

Zhang, Yinda

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp011831cn73t

Title:	From Pixels to Scenes: Recovering 3D Geometry and Semantics for Indoor Environments
Authors:	Zhang, Yinda
Advisors:	Funkhouser, Thomas A
Contributors:	Computer Science Department
Keywords:	3D geometry; Computer vision; Deep learning; Indoor environment; Scene understanding; Semantic
Subjects:	Computer science
Issue Date:	2018
Publisher:	Princeton, NJ : Princeton University
Abstract:	Understanding the 3D geometry and semantics of real environments is in high demand for many applications, such as autonomous driving, robotics, and augmented reality. However, it is extremely challenging due to imperfect and noisy measurements from real sensors, limited access to ground-truth data, and cluttered scenes exhibiting heavy occlusions and intervening objects. To address these issues, this thesis introduces a series of works that produce a geometric and semantic understanding of the scene in both pixel-wise and holistic 3D representations. Starting from depth map estimation, a fundamental task in many approaches to reconstructing the 3D geometry of a scene, we introduce a learning-based active stereo system that is trained in a self-supervised fashion and reduces the disparity error to one tenth of that of other canonical stereo systems. To handle the more common case where only one input image is available for scene understanding, we create a high-quality synthetic dataset that facilitates pre-training of data-driven approaches, and demonstrate that we can improve surface normal estimation and improve raw depth measurements from commodity RGB-D sensors. Lastly, we pursue holistic 3D scene understanding by estimating a 3D representation of the scene in which objects and room layout are represented using 3D bounding boxes and planar surfaces, respectively. We propose methods to produce such a representation from either a single color panorama or a depth image, leveraging scene context. On the whole, the proposed methods produce an understanding of both 3D geometry and semantics from the most fine-grained pixel level to the holistic scene scale, building foundations that support future work in 3D scene understanding.
URI:	http://arks.princeton.edu/ark:/88435/dsp011831cn73t
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Computer Science

Files in This Item:

File	Description	Size	Format
Zhang_princeton_0181D_12803.pdf		79.02 MB	Adobe PDF