Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01wh246v912
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Funkhouser, Thomas A. | -
dc.contributor.author | Song, Shuran | -
dc.contributor.other | Computer Science Department | -
dc.date.accessioned | 2019-01-02T20:23:55Z | -
dc.date.available | 2019-01-02T20:23:55Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01wh246v912 | -
dc.description.abstract | Intelligent robots require advanced vision capabilities to perceive and interact with the real physical world. While computer vision has made great strides in recent years, its predominant paradigm still focuses on analyzing image pixels to infer two-dimensional outputs (e.g., 2D bounding boxes or labeled 2D pixels), which remain far from sufficient for real-world robotics applications. This dissertation presents the use of amodal 3D scene representations that enable intelligent systems not only to recognize what is seen (e.g., Am I looking at a chair?), but also to predict contextual information about the complete 3D scene beyond visible surfaces (e.g., What could be behind the table? Where should I look to find an exit?). More specifically, it presents a line of work that demonstrates the power of these representations. First, it shows how an amodal 3D scene representation can be used to improve performance on traditional tasks such as object detection. We present SlidingShapes and DeepSlidingShapes for the task of amodal 3D object detection, where the system is designed to fully exploit the 3D information provided by depth images. Second, we introduce the task of semantic scene completion and our approach SSCNet, whose goal is to produce a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Third, we introduce the task of semantic-structure view extrapolation and our approach Im2Pano3D, which aims to predict the 3D structure and semantic labels for a full 360° panoramic view of an indoor scene given only a partial observation. Finally, we present two large-scale datasets (SUN RGB-D and SUNCG) that enable research on data-driven 3D scene understanding. This dissertation demonstrates that leveraging complete 3D scene representations not only significantly improves algorithms' performance on traditional computer vision tasks, but also paves the way for new scene understanding tasks that were previously considered ill-posed given only 2D representations. | -
dc.language.iso | en | -
dc.publisher | Princeton, NJ : Princeton University | -
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | -
dc.subject | 3D scene understanding | -
dc.subject | RGB-D sensor | -
dc.subject.classification | Computer science | -
dc.title | Data-Driven 3D Scene Understanding | -
dc.type | Academic dissertations (Ph.D.) | -
pu.projectgrantnumber | 690-2143 | -
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
Song_princeton_0181D_12807.pdf | | 184.75 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.