Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zw12z8510
Title: Towards Geometric Intelligence: Seeing, Grounding and Reasoning over Geometries
Authors: Goyal, Ankit
Advisors: Deng, Jia
Contributors: Computer Science Department
Subjects: Computer science
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: Geometric intelligence is the aspect of human intelligence that relates to perceiving, communicating and reasoning about geometries. Artificial agents like robots must possess geometric intelligence to operate in unconstrained human environments and collaborate with humans in day-to-day life. Geometric intelligence is an umbrella term that encompasses many abilities. In this thesis, we focus on three crucial abilities: first, the ability to see or perceive geometry; second, the ability to communicate about geometry and space; and third, the ability to reason and plan about geometries. For studying the ability to perceive geometry, we pursue two efforts. In one effort, we build a system called IFOR that recognizes the geometric difference between two scenes and rearranges one scene into another. IFOR can handle unseen objects and transfer to the real world while being trained only on synthetic data. In another effort, we revisit the literature on perceiving objects from point clouds and uncover two surprising results. First, we show that auxiliary factors that are independent of network architecture explain most of the performance improvement. Second, we show that a simple view-based baseline outperforms sophisticated state-of-the-art methods. For studying the ability to communicate about geometry, we focus on grounding spatial relations, which are the atomic elements used to communicate about geometric arrangements. We find that existing datasets are insufficient as they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations. We fill this gap by constructing Rel3D. Finally, for studying the ability to reason about geometry, we pursue two efforts. In one effort, we study the problem of geometric reasoning in the context of question answering. We introduce Dynamic Spatial Memory Network (DSMN), a new deep network architecture designed for answering questions that admit latent visual representations. In another effort, we explore the problem of geometric planning, which requires simultaneous reasoning about geometries and planning. We find that existing benchmarks are insufficient for the problem. We propose PackIt, a virtual environment for geometric planning. We benchmark various baselines on PackIt and find that learning could be a viable way to gain geometric planning skills.
URI: http://arks.princeton.edu/ark:/88435/dsp01zw12z8510
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Goyal_princeton_0181D_14297.pdf
Size: 13.95 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.