Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01wh246v912
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Funkhouser, Thomas A. | -
dc.contributor.author | Song, Shuran | -
dc.contributor.other | Computer Science Department | -
dc.date.accessioned | 2019-01-02T20:23:55Z | -
dc.date.available | 2019-01-02T20:23:55Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01wh246v912 | -
dc.description.abstract | Intelligent robots require advanced vision capabilities to perceive and interact with the real physical world. While computer vision has made great strides in recent years, its predominant paradigm still focuses on analyzing image pixels to infer two-dimensional outputs (e.g., 2D bounding boxes or labeled 2D pixels), which remain far from sufficient for real-world robotics applications. This dissertation presents the use of amodal 3D scene representations that enable intelligent systems not only to recognize what is seen (e.g., Am I looking at a chair?), but also to predict contextual information about the complete 3D scene beyond visible surfaces (e.g., What could be behind the table? Where should I look to find an exit?). More specifically, it presents a line of work that demonstrates the power of these representations. First, it shows how an amodal 3D scene representation can be used to improve performance on traditional tasks such as object detection. We present SlidingShapes and DeepSlidingShapes for the task of amodal 3D object detection, where the system is designed to fully exploit the 3D information provided by depth images. Second, we introduce the task of semantic scene completion and our approach SSCNet, whose goal is to produce a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Third, we introduce the task of semantic-structure view extrapolation and our approach Im2Pano3D, which aims to predict the 3D structure and semantic labels for a full 360° panoramic view of an indoor scene given only a partial observation. Finally, we present two large-scale datasets (SUN RGB-D and SUNCG) that enable research on data-driven 3D scene understanding. This dissertation demonstrates that leveraging complete 3D scene representations not only significantly improves algorithms' performance on traditional computer vision tasks, but also paves the way for new scene understanding tasks that were previously considered ill-posed given only 2D representations. | -
dc.language.iso | en | -
dc.publisher | Princeton, NJ : Princeton University | -
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | -
dc.subject | 3D scene understanding | -
dc.subject | RGB-D sensor | -
dc.subject.classification | Computer science | -
dc.title | Data-Driven 3D Scene Understanding | -
dc.type | Academic dissertations (Ph.D.) | -
pu.projectgrantnumber | 690-2143 | -
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
Song_princeton_0181D_12807.pdf | | 184.75 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.