Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019p290d57c
Full metadata record
DC Field: Value
dc.contributor.advisor: Russakovsky, Olga
dc.contributor.author: Wang, Zeyu
dc.contributor.other: Electrical and Computer Engineering Department
dc.date.accessioned: 2023-03-06T22:55:08Z
dc.date.available: 2023-03-06T22:55:08Z
dc.date.created: 2022-01-01
dc.date.issued: 2023
dc.identifier.uri: http://arks.princeton.edu/ark:/88435/dsp019p290d57c
dc.description.abstract: Computer vision systems are increasingly being deployed to real-world applications, such as recognition models on autonomous vehicles, captioning models in presentation software, and retrieval models behind visual search engines. Lots of practical challenges exist in building these real-world computer vision systems and many of them are associated with the imperfections in data. Specifically, real-world data can be biased with distracting spurious correlations, long-tailed with unbalanced presence of different categories, noisy with numerous flaws, and so on. In this dissertation, we study how to tackle three common data imperfections for different vision tasks. First, we investigate the bias issue in image classification. We introduce a new benchmark featuring controllable bias through data augmentation. We then provide a thorough comparison of existing bias mitigation methods and propose a simple approach which outperforms other more complex competitors. Second, we study the long tail issue in image captioning. We show how existing captioning models prefer common concepts and generate overly generic captions due to the long tail. To tackle the issue, on the evaluation side, we propose a new metric to capture both uniqueness and accuracy. On the modeling side, we introduce an inference-time re-ranking technique to generate diverse and informative captions. Finally, we tackle the noise issue in video retrieval. We demonstrate how noisy annotations introduce challenges in both model training and evaluation. We then propose to address the problem by utilizing a simple but effective multi-query approach. Through extensive experiments, we show that multi-query training leads to superior performance, and multi-query evaluation better reflects the true capabilities of retrieval models.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.subject.classification: Computer science
dc.title: Tackling Imperfections in Data for Building Real-world Computer Vision Systems
dc.type: Academic dissertations (Ph.D.)
pu.date.classyear: 2023
pu.department: Electrical and Computer Engineering
Appears in Collections: Electrical Engineering

Files in This Item:
File: Wang_princeton_0181D_14404.pdf
Size: 4.05 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.