Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Deng, Jia | |
| dc.contributor.author | Goyal, Ankit | |
| dc.contributor.other | Computer Science Department | |
| dc.description.abstract | Geometric intelligence is the aspect of human intelligence that relates to perceiving, communicating and reasoning about geometries. Artificial agents like robots must possess geometric intelligence to operate in unconstrained human environments and collaborate with humans in day-to-day life. Geometric intelligence is an umbrella term that encompasses many abilities. In this thesis, we focus on three crucial abilities – first, the ability to see or perceive geometry; second, the ability to communicate about geometry and space; and third, the ability to reason and plan about geometries. For studying the ability to perceive geometry, we pursue two efforts. In one effort, we build a system, called IFOR, that recognizes the geometric difference between two scenes and rearranges one scene into the other. IFOR can handle unseen objects and transfer to the real world while being trained only on synthetic data. In another effort, we revisit the literature on perceiving objects from point clouds and uncover two surprising results. First, we show that auxiliary factors that are independent of network architecture explain most of the performance improvement. Second, we show that a simple view-based baseline outperforms sophisticated state-of-the-art methods. For studying the ability to communicate about geometry, we focus on grounding spatial relations, which are the atomic elements used to communicate about geometric arrangements. We find that existing datasets are insufficient as they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations. We fill this gap by constructing Rel3D. Finally, for studying the ability to reason about geometry, we pursue two efforts. In one effort, we study the problem of geometric reasoning in the context of question answering. We introduce Dynamic Spatial Memory Network (DSMN), a new deep network architecture designed for answering questions that admit latent visual representations. In another effort, we explore the problem of geometric planning, which requires simultaneous reasoning about geometries and planning. We find that existing benchmarks are insufficient for the problem. We propose PackIt – a virtual environment for geometric planning. We benchmark various baselines on PackIt and find that learning could be a viable way to gain geometric planning skills. | |
| dc.publisher | Princeton, NJ : Princeton University | |
| dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog. | |
| dc.subject.classification | Computer science | |
| dc.title | Towards Geometric Intelligence: Seeing, Grounding and Reasoning over Geometries | |
| dc.type | Academic dissertations (Ph.D.) | |
| pu.department | Computer Science | |
Appears in Collections: Computer Science

Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| Goyal_princeton_0181D_14297.pdf | | 13.95 MB | Adobe PDF |

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.