Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019019s2505
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Fan, Jianqing | en_US
dc.contributor.author | Gu, Weijie | en_US
dc.contributor.other | Operations Research and Financial Engineering Department | en_US
dc.date.accessioned | 2012-11-15T23:54:20Z | -
dc.date.available | 2012-11-15T23:54:20Z | -
dc.date.issued | 2012 | en_US
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp019019s2505 | -
dc.description.abstract | Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find whether any genes are associated with given traits, and those tests are correlated. In finance, thousands of correlated tests are performed to see which fund managers have winning ability. When test statistics are correlated, controlling false discoveries under arbitrary dependence becomes very challenging. In the first part of this work, to deal with a known but arbitrary dependence structure, we propose a new methodology based on principal factor approximation (PFA), which subtracts the common dependence and significantly weakens the correlation structure. We derive the theoretical distribution of the false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of FDP. This result has important applications in controlling FDR and FDP. Our estimate of FDP compares favorably with the approach of Efron (2007), as demonstrated by simulated examples. Our approach is further illustrated by real data applications. We also propose a factor-adjusted procedure, which is shown in simulation studies to be more powerful than the fixed threshold procedure. In the second part of the work, we further investigate the cases where the covariance matrix of the test statistics is unknown, which are more challenging and of wider applicability. In such cases, the dependence information needs to be estimated before estimating FDP, and the estimation accuracy may greatly affect the convergence result of the FDP estimate or even violate its consistency. We first develop requirements for estimates of eigenvalues and eigenvectors of the covariance matrix such that a consistent estimate of FDP can be obtained. We then provide sufficient conditions on the dependence structure for the estimate of FDP to be consistent and suggest that an approximate factor model structure might be a good candidate. We conclude by proposing the Principal Orthogonal complEment Thresholding (POET)-PFA procedure to consistently estimate FDP. The performance of our procedure is evaluated by simulation studies and real data analysis. | en_US
dc.language.iso | en | en_US
dc.publisher | Princeton, NJ : Princeton University | en_US
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu). | en_US
dc.subject | approximate factor model | en_US
dc.subject | covariance dependence | en_US
dc.subject | false discovery proportion | en_US
dc.subject | high dimensionality | en_US
dc.subject | multiple hypothesis testing | en_US
dc.subject.classification | Statistics | en_US
dc.subject.classification | Operations research | en_US
dc.title | Estimating False Discovery Proportion under Covariance Dependence | en_US
dc.type | Academic dissertations (Ph.D.) | en_US
pu.projectgrantnumber | 690-2143 | en_US
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File | Description | Size | Format
Gu_princeton_0181D_10326.pdf | | 1.79 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.