Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019w0326233
Title: Closing The Gap: Mitigating Performance Disparities Without Diverse Training Data
Authors: Curl, Justin
Advisors: Narayanan, Arvind
Department: Computer Science
Class Year: 2022
Abstract: We focus on understanding the best ways to mitigate performance disparities in machine learning applications. In our first literature review [Ch. 2], we find a disconnect between academic researchers and industry practitioners: while researchers propose diverse training data collection as one solution to performance disparities, companies state it is the only solution. To avoid future disconnects like this one, we recommend academic researchers in the ML fairness community 1) separate technical findings from normative prescriptions, 2) specify individual claims and contributions, and 3) avoid describing their research in terms of broad buzzwords like “bias,” “diversity,” and “fairness.” Yet, even with these recommendations, there are still unresolved questions about the relationship between diverse training data and performance disparities. We contribute to answering those questions by conducting experiments on census data [Ch. 3], tweets [Ch. 4], and chest X-rays [Ch. 5]. In each case, we initially train a classifier with large amounts of undiverse training data (entirely from one demographic group), perform some intervention using little-to-no data from the minority group, and find that we can reduce or eliminate performance disparities. These findings demonstrate the potential of low-data interventions for mitigating performance disparities, and in a second literature review [Ch. 7], we identify other promising alternatives to diverse training data collection and evaluate them according to eight criteria we think industry practitioners are likely to care about, such as computational costs, performance impact, and difficulty to implement.
URI: http://arks.princeton.edu/ark:/88435/dsp019w0326233
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: CURL-JUSTIN-THESIS.pdf
Size: 1.57 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.