Robust Dependence-Adjusted Methods for High Dimensional Data

Bose, Koushiki

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01ht24wn13d

Title:	Robust Dependence-Adjusted Methods for High Dimensional Data
Authors:	Bose, Koushiki
Advisors:	Fan, Jianqing
Contributors:	Operations Research and Financial Engineering Department
Keywords:	Dependence Adjustment; Factor Models; High Dimensional Data; Robust Estimation; R package
Subjects:	Statistics
Issue Date:	2018
Publisher:	Princeton, NJ : Princeton University
Abstract:	The focus of this dissertation is the development, implementation and verification of robust methods for high dimensional heavy-tailed data, with an emphasis on dependence adjustment through underlying factor models. First, we prove a nonasymptotic version of the Bahadur representation for a Huber loss M-estimator in the presence of heavy-tailed errors. As a consequence, we prove several important normal approximation results, including a Berry-Esseen bound and a Cramér-type moderate deviation. This theory is used to analyze a covariate-adjusted multiple testing procedure under moderately heavy-tailed errors. We prove that the procedure asymptotically controls the overall false discovery proportion at the nominal level.
Next, we present the development of an R package that conducts factor-adjusted robust multiple testing of mean effects, even when the factors are unobservable or only partially observable. Experiments on real and simulated datasets demonstrate the superior performance of our package. Applying this testing procedure to RNA-Seq data from autism patients, we find new evidence for the etiology of the disease and novel pathways that may be altered in autism. Many of the candidate genes found are responsible for functions affected by autism, or are implicated in autism comorbidities such as seizures and epilepsy. We also observe differences between the functions of genes implicated in male and female patients, a promising result since autism is a heavily gender-biased disease.
We then present an R package that performs large-scale model selection for high dimensional sparse regression in the presence of correlated covariates. The software implements a consistent model selection strategy when the covariate dependence can be reduced through factor models. Numerical studies show that it has good finite-sample performance in terms of both model selection and out-of-sample prediction.
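The Huber loss M-estimator named in the abstract can be illustrated in its simplest setting, estimating a single location parameter. The sketch below is an illustration under stated assumptions, not code from the dissertation or its R packages. It minimizes the sum of rho_tau(x_i - mu), where rho_tau(u) = u^2/2 for |u| <= tau and tau*|u| - tau^2/2 otherwise, using iteratively reweighted least squares (IRLS). The function names `huber_psi` and `huber_mean` are hypothetical, and the robustification parameter `tau` is simply fixed by hand here, whereas the dissertation works with regression M-estimators and their theory.

```python
import statistics
from typing import Sequence


def huber_psi(u: float, tau: float) -> float:
    """Derivative of the Huber loss: identity on [-tau, tau], clipped to +/-tau outside."""
    return max(-tau, min(tau, u))


def huber_mean(xs: Sequence[float], tau: float,
               tol: float = 1e-10, max_iter: int = 500) -> float:
    """Hypothetical helper: Huber M-estimate of location via IRLS.

    Solves sum_i psi_tau(x_i - mu) = 0, the first-order condition for
    minimizing sum_i rho_tau(x_i - mu).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if not xs:
        raise ValueError("xs must be non-empty")
    mu = statistics.median(xs)  # robust starting value
    for _ in range(max_iter):
        # IRLS weight w_i = psi(r_i) / r_i: 1 for small residuals, tau/|r_i| for large ones.
        # A zero residual falls in the first branch, so there is no division by zero.
        weights = [1.0 if abs(x - mu) <= tau else tau / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
    print("sample mean:", statistics.mean(data))
    print("Huber estimate:", huber_mean(data, tau=1.5))
```

On this toy data the sample mean is pulled to 22 by the single outlier, while the Huber estimate stays at about 3, because the outlier's influence is capped at tau. As tau grows, every weight becomes 1 and the estimate reduces to the ordinary sample mean.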
Finally, we present novel methods for estimating higher moments of multivariate elliptical distributions. Existing estimators typically require a good estimate of the precision matrix, which in turn requires strict structural assumptions on the covariance or precision matrix when the data are high dimensional. We propose two methods that involve estimating only the covariance matrix. As a by-product, we propose a new index for financial returns. Theoretical results, as well as experiments with financial data, demonstrate the efficacy of our estimators.
URI:	http://arks.princeton.edu/ark:/88435/dsp01ht24wn13d
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Operations Research and Financial Engineering

Files in This Item:

File	Description	Size	Format
Bose_princeton_0181D_12527.pdf		3.2 MB	Adobe PDF
