Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01b2773z41k
Title: Computational Reproducibility and the Fragile Families Challenge: Lessons Learned and Suggestions for the Future
Authors: Liu, David
Advisors: Salganik, Matthew J
Department: Computer Science
Certificate Program: Center for Statistics and Machine Learning
Class Year: 2018
Abstract: As the availability of social data and the reliance on computational methods increase, there is a need to establish guidelines for computational reproducibility in the social sciences. The Fragile Families Challenge presented a unique case study in which interdisciplinary researchers developed social prediction models and then submitted papers for review. Based on our experience reproducing the results as part of a journal review process, we propose a set of guidelines that can improve the reproducibility of open-sourced code. Our findings suggest that open-sourcing data and code is a crucial first step towards computational reproducibility, but that it leaves the replicator with the task of configuring an appropriate computing environment and parsing the code structure. By leveraging virtualization and pipeline design, tools and concepts from software engineering, we develop a set of guidelines that journal editors can adopt. In the case of Fragile Families, these guidelines are shown to be simple enough for adoption yet effective in rendering code more transparent. The rewards of reproducibility are further shown by developing an extension that boosts one of the Challenge's submissions, improving the model's mean squared error.
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: LIU-DAVID-THESIS.pdf
Size: 859.76 kB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.