Tackling Imperfections in Data for Building Real-world Computer Vision Systems

Wang, Zeyu

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019p290d57c

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Russakovsky, Olga
dc.contributor.author	Wang, Zeyu
dc.contributor.other	Electrical and Computer Engineering Department
dc.date.accessioned	2023-03-06T22:55:08Z	-
dc.date.available	2023-03-06T22:55:08Z	-
dc.date.created	2022-01-01
dc.date.issued	2023
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp019p290d57c	-
dc.description.abstract	Computer vision systems are increasingly deployed in real-world applications, such as recognition models on autonomous vehicles, captioning models in presentation software, and retrieval models behind visual search engines. Building these systems poses many practical challenges, and many of them stem from imperfections in the data. Specifically, real-world data can be biased by distracting spurious correlations, long-tailed with an unbalanced presence of different categories, and noisy with numerous flaws, among other problems. In this dissertation, we study how to tackle three common data imperfections for different vision tasks. First, we investigate the bias issue in image classification. We introduce a new benchmark featuring controllable bias through data augmentation. We then provide a thorough comparison of existing bias mitigation methods and propose a simple approach that outperforms more complex competitors. Second, we study the long-tail issue in image captioning. We show how existing captioning models prefer common concepts and generate overly generic captions because of the long tail. To tackle the issue, on the evaluation side, we propose a new metric that captures both uniqueness and accuracy. On the modeling side, we introduce an inference-time re-ranking technique to generate diverse and informative captions. Finally, we tackle the noise issue in video retrieval. We demonstrate how noisy annotations introduce challenges in both model training and evaluation. We then address the problem with a simple but effective multi-query approach. Through extensive experiments, we show that multi-query training leads to superior performance, and that multi-query evaluation better reflects the true capabilities of retrieval models.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Princeton, NJ : Princeton University
dc.subject.classification	Computer science
dc.title	Tackling Imperfections in Data for Building Real-world Computer Vision Systems
dc.type	Academic dissertations (Ph.D.)
pu.date.classyear	2023
pu.department	Electrical and Computer Engineering
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Wang_princeton_0181D_14404.pdf		4.05 MB	Adobe PDF
