Title: Generalization of Deep Neural Networks in Supervised Learning, Generative Modeling, and Adaptive Data Analysis
Authors: Zhang, Yi
Advisors: Arora, Sanjeev
Contributors: Computer Science Department
Keywords: Deep Learning
Subjects: Artificial intelligence
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: Why can neural nets with a vast number of parameters, trained on small datasets, still accurately classify unseen data? This "generalization mystery" has become a central question in deep learning. Beyond the traditional supervised learning setting, the success of deep learning extends to many other regimes where our understanding of generalization behavior is even more elusive. In this thesis, we begin with supervised learning and ultimately aim to shed light on the generalization performance of deep neural nets in generative modeling and adaptive data analysis by presenting novel theoretical frameworks and practical tools. First, we prove a generalization bound for supervised deep neural networks, building upon the empirical observation that the inference computations of deep nets trained on real-life datasets are highly resistant to noise. Following the information-theoretic principle that noise stability indicates redundancy and compressibility, we propose a new succinct compression of the trained net, which leads to drastically better generalization estimates. Next, we establish a finite-capacity analysis of Generative Adversarial Networks (GANs). Our study gives insights into the limitations of GANs' ability to learn distributions, and we provide empirical evidence that well-known GAN approaches do result in degenerate solutions. Despite these negative results, we proceed to demonstrate a surprising positive use case of GANs: the test performance of deep neural net classifiers can be predicted accurately using synthetic data generated from a GAN model trained on the same training set. Finally, we probe the question "have deep learning models overfitted to standard datasets such as ImageNet after years of data reuse?" We provide a simple estimate, Rip Van Winkle's Razor, for measuring overfitting due to data reuse. It relies upon a new notion of the amount of information that would have to be provided to an expert referee who is familiar with the field and relevant math, and who has just been woken up after falling asleep at the moment of the creation of the test set (as in the fairy tale). We show this estimate is non-vacuous for many ImageNet models.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Zhang_princeton_0181D_13978.pdf (6.3 MB, Adobe PDF)

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.