Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01w0892f32h
Title: Interpretable 3D Scene Perception, Tracking and Generation
Authors: Santhanam, Shruthi Iyer
Advisors: Heide, Felix
Department: Computer Science
Class Year: 2024
Publisher: Princeton, NJ : Princeton University
Abstract: 3D scene understanding is a many-pronged problem that is fundamental to countless real-world applications. These are often deployed in safety-critical situations, meaning that both accuracy and interpretability are crucial. Current methods use either non-learned algorithmic approaches, which may be more explainable but are often less accurate, or neural networks, which are generally not interpretable. In this thesis, we present two works that tackle different aspects of interpretable 3D scene understanding: perception, generation, and tracking. First, we combine scene perception and generation by reframing 3D object detection as a conditional generative process. Generative methods have been shown to be effective learners of empirical priors, and we exploit this property by using a probabilistic diffusion model, conditioned on an input observation, to model each detected object explicitly on a ground plane. Then, we use a diffusion-based neural rendering method to reconstruct the scene, allowing for simultaneous 3D object detection and multi-object scene generation. This closed-loop pipeline gives us interpretability “for free” via the generated ground plane information. Second, we present a new intrinsically interpretable online 3D multi-object tracking method based on graph attention networks (GAT). Following the tracking-by-detection framework, we use a GAT to perform data association on a track-detection bipartite graph between sequential frames. Furthermore, attention-based scores are used to visualize the GAT’s message-passing behavior at the model level, as well as to explain specific outputs. Along with excellent accuracy across datasets, this method provides much-needed online interpretability to deep learning-based tracking, setting the stage for safe and scalable applications.
URI: http://arks.princeton.edu/ark:/88435/dsp01w0892f32h
Language: en
Appears in Collections: Computer Science, 2023

Files in This Item:
File: Santhanam_princeton_0181G_15017.pdf
Size: 21.04 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.