Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01ht24wn13d
DC Field | Value | Language
dc.contributor.author | Bose, Koushiki | -
dc.contributor.other | Operations Research and Financial Engineering Department | -
dc.date.accessioned | 2018-06-12T17:42:12Z | -
dc.date.available | 2018-06-12T17:42:12Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01ht24wn13d | -
dc.description.abstract | The focus of this dissertation is the development, implementation, and verification of robust methods for high-dimensional heavy-tailed data, with an emphasis on adjusting for underlying dependence through factor models. First, we prove a nonasymptotic version of the Bahadur representation for a Huber-loss M-estimator in the presence of heavy-tailed errors. As consequences, we establish several important normal approximation results, including a Berry-Esseen bound and a Cramér-type moderate deviation theorem. This theory is used to analyze a covariate-adjusted multiple testing procedure under moderately heavy-tailed errors; we prove that the procedure asymptotically controls the overall false discovery proportion at the nominal level. Next, we present an R package that conducts factor-adjusted robust multiple testing of mean effects, even when the factors are unobservable or only partially observable. Experiments on real and simulated datasets demonstrate the superior performance of the package. Applying this testing procedure to RNA-Seq data from autism patients, we find new evidence for the etiology of the disease and identify novel pathways that may be altered in autism. Many of the candidate genes are responsible for functions affected by autism or are implicated in autism comorbidities such as seizures and epilepsy. We also observe differences between the functions of genes implicated in male and female patients, a promising result since autism is a heavily gender-biased disease. Next, we present an R package that performs large-scale model selection for high-dimensional sparse regression in the presence of correlated covariates. The software implements a consistent model selection strategy when the covariate dependence can be reduced through factor models, and numerical studies show good finite-sample performance in terms of both model selection and out-of-sample prediction. Finally, we present a novel method for estimating higher moments of multivariate elliptical distributions. Existing estimators typically require a good estimate of the precision matrix, which imposes strict structural assumptions on the covariance or precision matrix when the data are high dimensional. We propose two methods that involve estimating only the covariance matrix. As a by-product, we propose a new index for financial returns. Theoretical results, as well as experiments with financial data, demonstrate the efficacy of our estimators. | -
dc.language.iso | en | -
dc.publisher | Princeton, NJ : Princeton University | -
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: <a href="http://catalog.princeton.edu">catalog.princeton.edu</a> | -
dc.subject | Factor Models | -
dc.subject | High Dimensional Data | -
dc.subject | Robust Estimation | -
dc.subject | R package | -
dc.subject.classification | Statistics | -
dc.title | Robust Dependence-Adjusted Methods for High Dimensional Data | -