Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp010k225f13x
Title: Four Essays on Political Methodology
Authors: Liu, Naijia
Advisors: Londregan, John JBL
Contributors: Politics Department
Subjects: Political science
Statistics
Issue Date: 2021
Publisher: Princeton, NJ : Princeton University
Abstract: The dissertation consists of four essays in political methodology, covering two important issues in the field: missing data imputation and text analysis. The first chapter extends missing data imputation to the missing not at random (MNAR) setting. Missing at random (MAR) is a more restrictive assumption than MNAR. However, the MNAR scenario is very plausible in social science datasets, for example in missingness on sensitive survey questions. This chapter confronts MNAR by modeling the latent structure of the missingness to mitigate the influence of the unmeasured confounders that cause the missing values. This approach allows one to assume MAR conditional on the latent factor. The proposed method outperforms multiple imputation methods under MNAR. The wide range of latent factor models enables scholars to tailor the approach to the dataset and the end goal of the analysis. In addition to a simulation comparison, I show an application that uses a latent factor model to impute the missing values in a self-reported ideology question, which is considered a sensitive question, in the 2017 Chinese Netizen Survey dataset. I conclude the chapter with a discussion of the scope of the method and potential extensions. The second chapter applies the latent factor approach proposed in the previous chapter to an observational causal inference setting. I demonstrate that when pre-treatment confounders are missing not at random, existing methods cannot solve the missing data problem. The latent factor approach, under a modified ignorability assumption, is able to deal with missing confounders in the dataset. In addition to a simulation comparison, I show an application that uses a latent factor model to impute the missing values in an observational causal inference study, in which imputation significantly altered the estimate of causal effects.
The third chapter takes the data imputation problem to the next level: valid inference. Social science researchers deal with missing values in various datasets, yet little attention has been paid to inference after imputation. This chapter proposes a method to achieve valid statistical inference with missing data, together with a new way to evaluate the performance of missing data inference that integrates the missing value imputation step and the model inference step. The proposed method uses a bias correction term to offset the difference between missing and complete observations. For a parametric regression model, the method relaxes the conventional "missing at random" assumption and distributional assumptions. Simulation and validation results show the superior performance of the proposed method compared with more conventional imputation methods. I conclude the chapter with an application to a survey dataset that shows a substantive change in model estimates before and after imputation. Finally, the last chapter of this dissertation takes on another important issue in political methodology. Unsupervised text analysis models are often highly parameterized, and a principled approach to model selection is essential to the study results. This chapter proposes a model selection method for the Latent Dirichlet Allocation (LDA) topic model. Despite the popularity of the LDA topic model, little guidance is available on model selection. Due to the sparsity of text data, the commonly adopted methods for selecting the number of topics tend to overfit the number of topics in a given corpus. I propose an alternative method that estimates the number of topics by approximating the marginal likelihood of the LDA topic model under Gibbs sampling estimation. This method alleviates the overfitting problem by adopting a likelihood-ratio-style estimator (Chib 1995), where the marginal likelihood is penalized by the difference between the prior and posterior means.
Later in the chapter, I present simulation results that favor the marginal likelihood approximation method, along with an application to Supreme Court data. I show that improved model selection leads to substantive changes in the study results. I also discuss the relative performance of MCMC and variational methods.
URI: http://arks.princeton.edu/ark:/88435/dsp010k225f13x
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Politics

Files in This Item:
This content is embargoed until 2023-05-24. For more information contact the Mudd Manuscript Library.


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.