Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01rb68xf782
 Title: Latent Variable Models: Spectral Methods and Non-convex Optimization Authors: Wang, Kaizheng Advisors: Fan, Jianqing Contributors: Operations Research and Financial Engineering Department Keywords: clusteringdimension reductionlatent variable modelsnetwork analysisnon-convex optimizationspectral methods Subjects: StatisticsOperations research Issue Date: 2020 Publisher: Princeton, NJ : Princeton University Abstract: Latent variable models lay the statistical foundation for data science problems with unstructured, incomplete and heterogeneous information. The significant challenges in computation and memory call for efficient estimation procedures which faithfully output high-quality solutions without much fine-tuning. Comprehensive understanding of them helps build consolidated tool kits for complex problems in the future. This thesis is devoted to developments of principled methods that provably tackle latent variable models of different forms. We first establish theoretical guarantees for several spectral methods that recover latent structures using eigen-decomposition of certain data matrices. For ranking from pairwise comparisons, we investigate a vanilla spectral estimator through $\ell_{\infty}$ perturbation analysis of eigenvectors. This justifies its statistical optimality for identifying the top $K$ items. Next, we develop a general $\ell_p$ theory for PCA in Hilbert spaces with weak signals, obtaining the optimality of spectral clustering in sub-Gaussian mixture models. Based on that, we propose a convenient spectral algorithm for contextual community detection, where one seeks to recover communities in a network given additional node attributes. It is shown to achieve the information threshold for exact recovery. Beyond spectral methods, we develop an optimization-based framework for simultaneous dimension reduction and clustering that seeks to transform the data into a low-dimensional point cloud with well-separated clusters. It is very flexible and handles data that incapacitate many existing procedures. We name the method as Clustering via Uncoupled REgression, or CURE for short. For a linear version of the method under a mixture model, we prove that a perturbed gradient descent algorithm achieves near-optimal statistical precision within reasonable amount of time, even in the absence of good initialization. URI: http://arks.princeton.edu/ark:/88435/dsp01rb68xf782 Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu Type of Material: Academic dissertations (Ph.D.) Language: en Appears in Collections: Operations Research and Financial Engineering