Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019p290d57c
Title: Tackling Imperfections in Data for Building Real-world Computer Vision Systems
Authors: Wang, Zeyu
Advisors: Russakovsky, Olga
Contributors: Electrical and Computer Engineering Department
Subjects: Computer science
Issue Date: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: Computer vision systems are increasingly being deployed to real-world applications, such as recognition models on autonomous vehicles, captioning models in presentation software, and retrieval models behind visual search engines. Lots of practical challenges exist in building these real-world computer vision systems and many of them are associated with the imperfections in data. Specifically, real-world data can be biased with distracting spurious correlations, long-tailed with unbalanced presence of different categories, noisy with numerous flaws, and so on. In this dissertation, we study how to tackle three common data imperfections for different vision tasks. First, we investigate the bias issue in image classification. We introduce a new benchmark featuring controllable bias through data augmentation. We then provide a thorough comparison of existing bias mitigation methods and propose a simple approach which outperforms other more complex competitors. Second, we study the long tail issue in image captioning. We show how existing captioning models prefer common concepts and generate overly generic captions due to the long tail. To tackle the issue, on the evaluation side, we propose a new metric to capture both uniqueness and accuracy. On the modeling side, we introduce an inference-time re-ranking technique to generate diverse and informative captions. Finally, we tackle the noise issue in video retrieval. We demonstrate how noisy annotations introduce challenges in both model training and evaluation. We then propose to address the problem by utilizing a simple but effective multi-query approach. Through extensive experiments, we show that multi-query training leads to superior performance, and multi-query evaluation better reflects the true capabilities of retrieval models.
URI: http://arks.princeton.edu/ark:/88435/dsp019p290d57c
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering

Files in This Item:
File: Wang_princeton_0181D_14404.pdf
Size: 4.05 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.