Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01wh246s26t
Title: Random Matrices in High-dimensional Data Analysis
Authors: Cheng, Xiuyuan
Advisors: Singer, Amit
Contributors: Applied and Computational Mathematics Department
Subjects: Applied mathematics
Issue Date: 2013
Publisher: Princeton, NJ : Princeton University
Abstract: This thesis studies the spectrum of kernel matrices built from high-dimensional data vectors, a mathematical problem that naturally arises in many applications. In the first part, we consider the spectrum of large kernel matrices built from independent random high-dimensional vectors (the null model). Specifically, we consider n-by-n matrices whose (i, j)-th entry is f(X_i^T X_j), where X_1, ..., X_n are i.i.d. random vectors in R^p, and f belongs to a large class of real-valued functions. As p, n → ∞ with p/n → γ, we obtain a family of limiting spectral densities which includes the Marcenko-Pastur density and the semi-circle density as special cases. The convergence of the spectral density is first proved for i.i.d. normal Gaussian vectors, and then extended to i.i.d. vectors that can be "compared" with the normal Gaussian vectors. The study of the null model is fundamental to understanding noise-corrupted kernel matrices, which are built from vectors admitting a decomposition of "signal + noise" (the "spiking" model). We provide conjectures for the spiking model based on our results for the null model. The second part addresses the application in cryo-EM, where certain kernel matrices built from microscopic image data are used to study the structure of biological molecules. We consider the situation where the molecule admits non-trivial group symmetries, and study (i) the symmetry detection problem and (ii) the structural reconstruction problem. For the former, we derive a theoretical solution based on estimating the rank of certain auto-correlation kernels. For the latter, we propose two approaches extending the existing methods developed for non-symmetric molecules. For both problems, the proposed methods are tested on simulated data sets. The cryo-EM problem, together with other applications, motivates the study of the random matrix model in the first part of the thesis.
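Illustration (not part of the thesis record): the null-model matrix described in the abstract can be sketched numerically as below. This is a minimal sketch under stated assumptions, not the author's code. The 1/sqrt(p) scaling of the Gaussian entries, the choice f = tanh, the sizes n and p, and the helper names kernel_matrix and empirical_spectrum are all illustrative assumptions; the abstract does not specify a normalization or a treatment of the diagonal.

```python
import numpy as np


def kernel_matrix(X, f):
    """Return the n-by-n matrix whose (i, j) entry is f(X_i^T X_j).

    X has shape (n, p); row i is the vector X_i. f must act elementwise
    on NumPy arrays.
    """
    return f(X @ X.T)


def empirical_spectrum(K):
    """Eigenvalues of a symmetric matrix, in ascending order."""
    return np.linalg.eigvalsh(K)


rng = np.random.default_rng(0)
n, p = 200, 100                      # illustrative sizes, p / n = 0.5
# ASSUMPTION: i.i.d. Gaussian vectors with entries of variance 1/p.
X = rng.standard_normal((n, p)) / np.sqrt(p)

K = kernel_matrix(X, np.tanh)        # ASSUMPTION: f = tanh, for illustration only
eigs = empirical_spectrum(K)

# With f(t) = t the kernel matrix reduces to the Gram matrix X X^T.
K_lin = kernel_matrix(X, lambda t: t)
```

A histogram of eigs for increasing n and p (at a fixed ratio p/n) gives an empirical view of the spectral density whose limit the thesis studies.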
URI: http://arks.princeton.edu/ark:/88435/dsp01wh246s26t
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Applied and Computational Mathematics

Files in This Item:
File: Cheng_princeton_0181D_10781.pdf
Size: 1.11 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.