Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01js956k11r
Title: | Cost Efficiency of Generating a Geodiverse Dataset for Object Classification |
Authors: | Lin, Phoebe |
Advisors: | Russakovsky, Olga |
Department: | Computer Science |
Class Year: | 2023 |
Abstract: | Current datasets, while large and widely used for object classification, do not always represent objects from all around the world. This creates a regional bias in models that cannot be overlooked, especially when classifying images from underrepresented regions. In this paper, we introduce a new dataset, GeoDE, a geographically diverse image dataset that covers 6 regions and 40 objects and contains 61,940 images. Furthermore, we argue that GeoDE’s crowd-sourcing method, which instructs content creators to capture images of specific objects, is far more suitable for the object classification task than web scraping and even than the crowd-sourcing methods of other geodiverse datasets. While GeoDE’s particular crowd-sourcing method is the costliest among geodiverse datasets, it is the one that ensures the best annotation quality, minimizes bias, and yields the most suitable distribution for the object classification task. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01js956k11r |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Computer Science, 1987-2023 |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
LIN-PHOEBE-THESIS.pdf | | 6.13 MB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.