Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01ks65hg462
Title: Mathematical Theory of Machine Learning Models for Estimating Probability Distributions
Authors: Yang, Hongkang
Advisors: E, Weinan
Contributors: Applied and Computational Mathematics Department
Keywords: Curse of dimensionality
Density estimation
Generalization error
Generative modeling
Implicit regularization
Memorization
Subjects: Applied mathematics
Statistics
Computer science
Issue Date: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: The modeling of probability distributions, specifically generative modeling and density estimation, has become an immensely popular subject in recent years owing to its outstanding performance on sophisticated data such as images and text. Nevertheless, a theoretical understanding of its success remains incomplete. One mystery is the paradox between memorization and generalization: in theory, the model is trained to match exactly the empirical distribution of the finite training samples, whereas in practice the trained model can generate new samples and estimate the likelihood of unseen samples. Meanwhile, the overwhelming diversity of distribution learning models calls for a unified perspective on the subject. This dissertation addresses these problems. First, we provide a mathematical framework from which all the well-known models can be derived. The main factor is whether to use reweighting or transport for the modeling and evaluation of distributions; this choice leads to different distribution representations and loss types, and their combinations give rise to the diversity of distribution learning models. Beyond a categorization, this perspective greatly facilitates our analysis of training and generalization, so that our proof techniques apply to broad categories of models rather than particular instances. Second, we resolve the aforementioned paradox by showing that both generalization and memorization take place, but over different time scales. On one hand, the models satisfy the property of universal convergence, so their concentration onto the empirical distribution is inevitable in the long term. On the other hand, these models enjoy implicit regularization during training, so their generalization errors at early stopping escape the curse of dimensionality. Third, we obtain comprehensive results on the training behavior of distribution learning models. For models with either the potential representation or the fixed generator representation, we establish global convergence to the target distributions. For models with the free generator representation, we show that they all possess a large family of spurious critical points, which sheds light on the training difficulty of these models. Furthermore, we uncover the mechanisms underlying the mode collapse phenomenon that disrupts the training of generative adversarial networks.
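
The time-scale separation described in the abstract can be illustrated with a toy experiment; the following is a minimal sketch, not taken from the dissertation, using a Gaussian kernel density estimator whose shrinking bandwidth stands in for increasing training time. The training log-likelihood keeps improving as the estimator concentrates onto the empirical distribution (memorization, diverging as the bandwidth tends to zero), while the held-out log-likelihood peaks at an intermediate bandwidth, analogous to generalization under early stopping.

import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n):
    # Equal-weight mixture of N(-2, 1) and N(+2, 1).
    centers = rng.choice([-2.0, 2.0], size=n)
    return rng.normal(loc=centers, scale=1.0)

def mean_log_likelihood(train, query, h):
    # Mean log-density of `query` under a Gaussian KDE with bandwidth h built on `train`.
    diffs = (query[:, None] - train[None, :]) / h
    kernels = np.exp(-0.5 * diffs**2) / (np.sqrt(2.0 * np.pi) * h)
    return np.log(kernels.mean(axis=1) + 1e-300).mean()

train = sample_mixture(200)      # finite training sample (defines the empirical distribution)
held_out = sample_mixture(2000)  # fresh samples from the true distribution

# A shrinking bandwidth plays the role of increasing training time:
# train LL keeps rising (memorization), held-out LL peaks then degrades.
for h in [2.0, 1.0, 0.5, 0.2, 0.1, 0.05, 0.01]:
    print(f"h = {h:4.2f}   train LL = {mean_log_likelihood(train, train, h):9.3f}"
          f"   held-out LL = {mean_log_likelihood(train, held_out, h):9.3f}")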
URI: http://arks.princeton.edu/ark:/88435/dsp01ks65hg462
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Applied and Computational Mathematics

Files in This Item:
File: Yang_princeton_0181D_14525.pdf
Size: 1.29 MB
Format: Adobe PDF
