Machine Learning Methods for Computational Social Science

Tarr, Alexander

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01st74ct60x

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Imai, Kosuke
dc.contributor.author	Tarr, Alexander
dc.contributor.other	Electrical Engineering Department
dc.date.accessioned	2021-10-04T13:49:09Z	-
dc.date.available	2021-10-04T13:49:09Z	-
dc.date.created	2021-01-01
dc.date.issued	2021
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01st74ct60x	-
dc.description.abstract	Contributing to the rising popularity of computational social science, this dissertation presents new methods grounded in machine learning for solving several important problems in political science. In Chapter 2, adapted from coauthored work in Fifield et al. (2020), we present a new algorithm for sampling redistricting plans from arbitrary distributions. We formulate redistricting as a graph-cut problem and adapt an image segmentation algorithm from the computer vision literature to construct a Metropolis-Hastings style algorithm for sampling graph partitions. We then validate our algorithm using a small-scale map for which all possible redistricting plans can be enumerated, finding that our method samples from the true distribution. Lastly, we apply our algorithm to a more realistic redistricting problem using data from New Hampshire. In Chapter 3, adapted from coauthored work with June Hwang and Kosuke Imai, we develop a fully-automated video processing system for encoding information in political campaign advertisement videos. Our approach applies state-of-the-art algorithms to replicate a subset of variables in the human-labeled Wesleyan Media Project (WMP) data, performing tasks including video summarization, facial recognition, text recognition, speech recognition, audio classification, and text classification. We validate our method using the WMP data from the 2012 and 2014 election cycles, finding that machine coding is competitive with human coding for most of the variables considered in our study. In Chapter 4, adapted from coauthored work in Tarr and Imai (2021), we adapt the support vector machine (SVM) algorithm to address the balancing problem in causal inference. We first establish SVM as a kernel balancing method by showing that the soft-margin SVM dual problem computes weights which balance functions in a reproducing kernel Hilbert space. We then show that the SVM cost parameter controls a trade-off between balance and sample size, allowing us to use path algorithms to give exact characterizations of how balance and causal effect estimates change over the path. We validate our method using simulation data, showing that our algorithm is competitive with leading balancing methods. Finally, we conduct an empirical study using the right heart catheterization data from Connors et al. (1996).
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Princeton, NJ : Princeton University
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject	causal inference
dc.subject	gerrymandering
dc.subject	machine learning
dc.subject	MCMC
dc.subject	SVM
dc.subject	video data
dc.subject.classification	Political science
dc.subject.classification	Artificial intelligence
dc.subject.classification	Statistics
dc.title	Machine Learning Methods for Computational Social Science
dc.type	Academic dissertations (Ph.D.)
pu.date.classyear	2021
pu.department	Electrical Engineering
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Tarr_princeton_0181D_13868.pdf		14.41 MB	Adobe PDF
