Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01js956k11r
Title: Cost Efficiency of Generating a Geodiverse Dataset for Object Classification
Authors: Lin, Phoebe
Advisors: Russakovsky, Olga
Department: Computer Science
Class Year: 2023
Abstract: Current datasets, while large and widely used for object classification, do not always represent objects from all around the world. This creates a regional bias in models that cannot be overlooked, especially when classifying images from underrepresented regions. In this paper, we introduce a new dataset, GeoDE, a geographically diverse image dataset that represents 6 regions and 40 objects and contains 61,940 images. Furthermore, we argue that GeoDE’s crowd-sourcing method, which generates images by instructing content creators to capture images of specific objects, is far more suitable for the object classification task than web-scraping and even the crowd-sourcing methods of other geodiverse datasets. While GeoDE’s particular crowd-sourcing method is the costliest among geodiverse datasets, it is the one that ensures the best annotation quality, minimizes bias, and ensures the most suitable distribution for the object classification task.
URI: http://arks.princeton.edu/ark:/88435/dsp01js956k11r
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: LIN-PHOEBE-THESIS.pdf
Size: 6.13 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.