Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01rb68xg07n
Title: Learning to Detect Objects by Grouping
Authors: Law, Hei
Advisors: Deng, Jia
Contributors: Computer Science Department
Subjects: Computer science
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: Extracting high-level semantic information from visual input is an ability that humans rely on to perform daily tasks. This often requires identifying and locating relevant objects before any higher-level information is inferred. This step, known as object detection, is a fundamental task in computer vision. It has numerous real-world applications and serves as an upstream task for many computer vision tasks. This dissertation makes three contributions to object detection. First, we propose CornerNet, a new approach to object detection. CornerNet reformulates object detection as detecting and grouping pairs of keypoints. More specifically, CornerNet detects the corners of bounding boxes and predicts similar embedding vectors for corners that belong to the same object. CornerNet also introduces corner pooling, a new pooling layer that helps localize corners. Experiments on COCO show that CornerNet achieves an AP of 42.2%, outperforming all one-stage detectors. Second, we propose CornerNet-Lite, a collection of two efficient variants of CornerNet: CornerNet-Saccade and CornerNet-Squeeze. CornerNet-Lite explores two orthogonal directions: processing fewer pixels and reducing the processing cost of each pixel. Inspired by saccades in the human visual system, CornerNet-Saccade processes fewer pixels by estimating object locations on a downsampled image and then processing a subset of regions at high resolution. It is 6x faster than CornerNet and achieves a better AP. CornerNet-Squeeze reduces the per-pixel cost by introducing a new compact hourglass backbone network. It is faster and more accurate than YOLOv3. Third, we propose "Synthetic Optimized Layout with Instance Detection (SOLID)", a new pretraining approach for object detection. SOLID consists of two main components. The first component generates synthetic images from a collection of unlabelled 3D models with optimized scene arrangement. The second component is an instance detection task in which, given a query image depicting a 3D model, a detector is trained to locate the instances of the same object in a target image. Experiments show that synthetic data can be effective for pretraining an object detector. From grouping corners in CornerNet and CornerNet-Lite to grouping instances in SOLID, this dissertation presents a new approach to object detection - learning to detect objects by grouping.
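Illustrative note: the grouping idea in the abstract (corners of the same object receive similar embeddings) can be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the scalar embeddings, the `max_dist` threshold, the greedy closest-first matching, and the name `group_corners` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (x, y, embedding). A scalar embedding is an assumption made for this sketch.
Corner = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def group_corners(
    top_lefts: List[Corner],
    bottom_rights: List[Corner],
    max_dist: float = 0.5,  # hypothetical threshold on embedding distance
) -> List[Box]:
    """Pair top-left and bottom-right corners whose embeddings are close."""
    candidates = []
    for i, (x1, y1, e1) in enumerate(top_lefts):
        for j, (x2, y2, e2) in enumerate(bottom_rights):
            # A valid box needs the bottom-right corner below and to the right.
            if x2 <= x1 or y2 <= y1:
                continue
            dist = abs(e1 - e2)
            if dist <= max_dist:
                candidates.append((dist, i, j))

    # Greedy matching: closest embedding pairs first, each corner used once.
    candidates.sort()
    used_tl, used_br = set(), set()
    boxes: List[Box] = []
    for _, i, j in candidates:
        if i in used_tl or j in used_br:
            continue
        used_tl.add(i)
        used_br.add(j)
        x1, y1, _ = top_lefts[i]
        x2, y2, _ = bottom_rights[j]
        boxes.append((x1, y1, x2, y2))
    return boxes


if __name__ == "__main__":
    tls = [(10, 10, 0.1), (50, 40, 2.0)]
    brs = [(90, 80, 2.1), (30, 30, 0.0)]
    print(group_corners(tls, brs))
```

In this toy input, each top-left corner is paired with the bottom-right corner whose embedding is nearest, and pairs that would not form a valid box are skipped before the embeddings are compared.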
URI: http://arks.princeton.edu/ark:/88435/dsp01rb68xg07n
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Law_princeton_0181D_14292.pdf
Size: 38.46 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.