Title: High-dimensional Robust Statistical Inference
Authors: Yu, Mengxin
Advisors: Fan, Jianqing
Contributors: Operations Research and Financial Engineering Department
Subjects: Statistics
Issue Date: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: High-dimensional data in fields such as genomics, neuroscience, and finance present both opportunities and challenges for statistical inference. Such data may have particular characteristics, such as the discrete structure of ranks, strong dependence among variables, or contamination by heavy-tailed errors, which complicate statistical estimation and inference. This thesis develops novel statistical methods and techniques to address these challenges. In Chapter 1, we propose unified inference frameworks for constructing efficient confidence intervals for ranks from the observed data. Specifically, we first show that the maximum likelihood estimator (MLE) achieves the optimal sample complexity in the sparsest possible regime, and we explicitly quantify its uncertainty. We then provide a general framework, based on the Gaussian multiplier bootstrap, for efficient inference on ranks that addresses a range of uncertainty quantification questions. In Chapter 2, we tackle the problem of testing the adequacy of two widely used models, latent factor regression and sparse linear regression, when features are highly correlated. We propose the Factor Augmented sparse linear Regression Model (FARM), which includes both models as special cases. We provide theoretical guarantees for estimating FARM under sub-Gaussian and heavy-tailed noise, respectively. In addition, we use FARM as the alternative model in tests of the adequacy of the latent factor regression and sparse linear regression models. In Chapter 3, we conduct high-dimensional inference for location parameters in the presence of heavy-tailed noise. We revisit the celebrated Hodges-Lehmann (HL) estimator in both the one- and two-sample problems from a non-asymptotic perspective. We establish a Berry-Esseen inequality and a Cramér-type moderate deviation result for the HL estimator and construct data-driven confidence intervals via a weighted bootstrap approach. These results allow us to extend the HL estimator to large-scale studies and to propose tuning-free and moment-free high-dimensional inference procedures.
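For readers unfamiliar with the Hodges-Lehmann estimator mentioned above, the following is a minimal Python sketch of its standard textbook definitions. It is not the thesis's implementation. The function names hl_one_sample and hl_two_sample are hypothetical. The one-sample estimate is the median of the Walsh averages (x_i + x_j)/2 over i <= j. The two-sample shift estimate is the median of all pairwise differences y_j - x_i. The weighted-bootstrap confidence intervals developed in the thesis are not reproduced here.

```python
from itertools import combinations_with_replacement, product
from statistics import median


def hl_one_sample(x):
    """One-sample Hodges-Lehmann location estimate (hypothetical helper name).

    Returns the median of the Walsh averages (x_i + x_j) / 2 over all
    index pairs i <= j, including i == j.
    """
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]
    return median(walsh)


def hl_two_sample(x, y):
    """Two-sample Hodges-Lehmann shift estimate (hypothetical helper name).

    Returns the median of all pairwise differences y_j - x_i.
    """
    diffs = [b - a for a, b in product(x, y)]
    return median(diffs)


if __name__ == "__main__":
    # One gross outlier pulls the sample mean to 26.5,
    # while the HL estimate stays near the bulk of the data.
    print(hl_one_sample([1.0, 2.0, 3.0, 100.0]))  # 2.75
    print(hl_two_sample([1, 2, 3], [4, 5, 6]))  # 3
```

Both functions enumerate all O(n^2) pairs explicitly. This is adequate for illustration, but it is not intended for the large-scale settings the thesis targets.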
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File: Yu_princeton_0181D_14497.pdf | Size: 1.26 MB | Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.