Title: Mathematical Theory of Neural Network Models for Machine Learning
Authors: Ma, Chao
Advisors: E, Weinan
Contributors: Mathematics Department
Keywords: Function space
Machine learning
Neural network
Subjects: Applied mathematics
Issue Date: 2020
Publisher: Princeton, NJ : Princeton University
Abstract: In contrast to its unprecedented practical success across a wide range of fields, the theoretical understanding of the principles behind the success of deep learning has been a troubling and controversial subject. In this dissertation, we build a systematic framework to study the theoretical issues of neural networks, including their approximation and generalization, as well as the issues associated with the optimization algorithms. We take inspiration from traditional numerical analysis. Since neural networks usually have high-dimensional input, the most important consideration in this study is the issue of the curse of dimensionality. First, we focus on the approximation property of neural networks. For typical neural network models, we build approximation theories by identifying the appropriate function spaces formed by all the functions that can be approximated by these models without the curse of dimensionality. Direct and inverse approximation theorems are proven, which imply that a function can be efficiently approximated by a neural network model if and only if it belongs to the corresponding function space. Second, we deal with the generalization issue. We design parameter norms for neural network models that can bound the Rademacher complexity. This allows us to establish a posteriori estimates of the generalization error. Together with results from the first part, we derive a priori estimates with constants that depend on the norm of the target function instead of the trained model. Third, the optimization algorithms are studied. Deep learning often operates in the over-parametrized regime where the global minimizers of the empirical risk are far from being unique. Therefore, the first issue addressed in this part is how different optimization algorithms select the global minimizers differently. Next, we analyze the training dynamics of an important linear model: the random feature model. We demonstrate that even though the generalization error for the true global minimizer may be very large, the process of deterioration happens very slowly during training, leaving a long period of time during which the generalization error is small. Finally, we study the gradient descent dynamics of two-layer neural networks. We prove some global and local convergence results by taking a continuous viewpoint.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Mathematics

Files in This Item:
File                          Size     Format
Ma_princeton_0181D_13374.pdf  1.76 MB  Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.