Title: | Closing The Gap: Mitigating Performance Disparities Without Diverse Training Data |
Authors: | Curl, Justin |
Advisors: | Narayanan, Arvind |
Department: | Computer Science |
Class Year: | 2022 |
Abstract: | We focus on understanding the best ways to mitigate performance disparities in machine learning applications. In our first literature review [Ch. 2], we find a disconnect between academic researchers and industry practitioners: while researchers propose diverse training data collection as one solution to performance disparities, companies state it is the only solution. To avoid future disconnects like this one, we recommend that academic researchers in the ML fairness community 1) separate technical findings from normative prescriptions, 2) specify individual claims and contributions, and 3) avoid describing their research in terms of broad buzzwords like “bias,” “diversity,” and “fairness.” Yet, even with these recommendations, there are still unresolved questions about the relationship between diverse training data and performance disparities. We contribute to answering those questions by conducting experiments on census data [Ch. 3], tweets [Ch. 4], and chest X-rays [Ch. 5]. In each case, we initially train a classifier with large amounts of undiverse training data (entirely from one demographic group), perform an intervention using little-to-no data from the minority group, and find that we can reduce or eliminate performance disparities. These findings demonstrate the potential of low-data interventions for mitigating performance disparities. In a second literature review [Ch. 7], we identify other promising alternatives to diverse training data collection and evaluate them according to eight criteria we think industry practitioners are likely to care about, such as computational costs, performance impact, and difficulty of implementation. |
URI: | http://arks.princeton.edu/ark:/88435/dsp019w0326233 |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Computer Science, 1987-2024 |
Files in This Item:
File | Size | Format
---|---|---
CURL-JUSTIN-THESIS.pdf | 1.57 MB | Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.