Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01nv935598t
Title: Selected Topics in Deep Learning Theory and Continuous-time Hidden Markov Models
Authors: Wang, Qingcan
Advisors: E, Weinan
Contributors: Applied and Computational Mathematics Department
Subjects: Applied mathematics
Issue Date: 2021
Publisher: Princeton, NJ : Princeton University
Abstract: The first part of the thesis proves theoretical results in deep learning. For the approximation problem, we prove that deep neural networks can approximate analytic functions exponentially fast: the number of parameters needed to achieve an error tolerance of epsilon is O((log 1/epsilon)^d), exponential in the sense that the rate depends only on log 1/epsilon rather than on epsilon itself. We also develop a general method to show that deep networks never have worse approximation properties than shallow ones. For the optimization problem, we analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization, the zero-asymmetric (ZAS) initialization, which is motivated by avoiding the stable manifolds of saddle points. We prove that under ZAS initialization, for an arbitrary target matrix, gradient descent converges to an epsilon-optimal point in O(L^3 log 1/epsilon) iterations, which scales polynomially with the network depth L. This demonstrates the importance of the residual structure and of the initialization in optimizing deep linear neural networks.

The second part focuses on continuous-time hidden Markov models (CT-HMM), where both the hidden states and the observations evolve in continuous time. We propose a unified framework that formally obtains the parameter estimates by taking the continuous-time limit of the classical discrete-time Baum-Welch algorithm, recovering and extending several previous results on CT-HMMs under different settings. Two settings are illustrated: a hidden jump process with a finite state space, and a hidden diffusion process with a continuous state space. For each setting, we first estimate the hidden states given the observations and model parameters, showing that the posterior distribution of the hidden states is described by differential equations in continuous time. We then consider the estimation of unknown model parameters, deriving continuous-time formulas for the expectation-maximization algorithm. We also propose a Monte Carlo method based on the continuous-time formulation that samples the posterior distribution of the hidden states and updates the parameter estimates.
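To make the optimization result concrete, below is a minimal NumPy sketch of full-batch gradient descent on a deep linear residual network fitting a target matrix. The abstract does not specify the architecture, loss, or step size; the form f(x) = W_out (I + W_L) ... (I + W_1) x, the squared loss, the reading of ZAS as zeroing both the residual blocks and the output layer, and all names below are assumptions made for illustration, not the thesis's exact construction.

```python
import numpy as np

def zas_init(L, d):
    """Zero-style initialization (one reading of ZAS, assumed here):
    every residual block starts at zero, so each layer is the identity,
    and the output layer starts at zero as well."""
    Ws = [np.zeros((d, d)) for _ in range(L)]   # residual blocks W_1..W_L
    W_out = np.zeros((d, d))                    # output layer
    return Ws, W_out

def forward(Ws, W_out, X):
    """f(X) = W_out (I + W_L) ... (I + W_1) X for a batch X of shape (d, n)."""
    H = X
    hiddens = [H]
    for W in Ws:
        H = H + W @ H                           # residual block: (I + W_l) h
        hiddens.append(H)
    return W_out @ H, hiddens

def gd_step(Ws, W_out, X, Y, lr):
    """One gradient descent step on (1/2n) ||f(X) - Y||_F^2, gradients by hand."""
    n = X.shape[1]
    out, hiddens = forward(Ws, W_out, X)
    G = (out - Y) / n                           # dLoss / d(out)
    grad_out = G @ hiddens[-1].T
    back = W_out.T @ G                          # dLoss / d h_L
    grads = [None] * len(Ws)
    for l in reversed(range(len(Ws))):
        grads[l] = back @ hiddens[l].T          # dLoss / d W_{l+1}
        back = back + Ws[l].T @ back            # d h_l / d h_{l-1} = I + W_l
    W_out = W_out - lr * grad_out
    Ws = [W - lr * g for W, g in zip(Ws, grads)]
    return Ws, W_out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, L, n = 4, 10, 32
    Phi = rng.standard_normal((d, d))           # arbitrary target matrix
    X = rng.standard_normal((d, n))
    Y = Phi @ X
    Ws, W_out = zas_init(L, d)
    for _ in range(5000):                       # ad hoc step size and budget
        Ws, W_out = gd_step(Ws, W_out, X, Y, lr=0.02)
    print(np.linalg.norm(forward(Ws, W_out, X)[0] - Y))   # should be small
```

With this initialization the network starts as the zero map built from identity layers, so the first updates move only the output layer and the residual blocks stay small; the training error on the toy target above should shrink toward zero for a sufficiently small step size.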
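As one concrete instance of describing the posterior of the hidden states by differential equations, here is a hedged sketch of a Wonham-type filter for a hidden jump process on a finite state space, integrated with Euler-Maruyama steps. The observation model dY_t = h(X_t) dt + sigma dB_t, the filtering equation used, and all names (simulate_and_filter, Q, h, sigma) are illustrative assumptions; the thesis's actual observation model and derivation may differ.

```python
import numpy as np

def simulate_and_filter(Q, h, sigma, T=5.0, dt=1e-3, seed=0):
    """Simulate a hidden jump process with generator Q and run a Wonham-type
    filter for the assumed observation model dY_t = h(X_t) dt + sigma dB_t.
    Returns the hidden path and the filtering distribution pi_t on a grid."""
    rng = np.random.default_rng(seed)
    K = Q.shape[0]
    n_steps = int(T / dt)
    x = 0
    pi = np.full(K, 1.0 / K)                 # posterior P(X_t = k | Y_{0:t})
    path, filt = [], []
    for _ in range(n_steps):
        # hidden state: jump with probability (total exit rate) * dt
        rates = Q[x].copy(); rates[x] = 0.0
        total = rates.sum()
        if total > 0 and rng.random() < total * dt:
            x = rng.choice(K, p=rates / total)
        # noisy observation increment
        dY = h[x] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # normalized filtering equation (Wonham form, assumed here):
        # d pi = Q^T pi dt + sigma^{-2} pi * (h - hbar) (dY - hbar dt)
        hbar = pi @ h
        pi = pi + (Q.T @ pi) * dt + pi * (h - hbar) * (dY - hbar * dt) / sigma**2
        pi = np.clip(pi, 1e-12, None); pi /= pi.sum()   # keep it a distribution
        path.append(x); filt.append(pi.copy())
    return np.array(path), np.array(filt)

if __name__ == "__main__":
    Q = np.array([[-1.0, 1.0], [2.0, -2.0]])   # generator of a 2-state chain
    h = np.array([0.0, 1.0])                    # observation drift per state
    path, filt = simulate_and_filter(Q, h, sigma=0.5)
    print(filt[-1])                             # final posterior over states
```

In an EM or Monte Carlo scheme of the kind the abstract describes, a forward pass like this (paired with a backward/smoothing pass) would supply the posterior quantities needed to update the model parameters.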
URI: http://arks.princeton.edu/ark:/88435/dsp01nv935598t
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Applied and Computational Mathematics

Files in This Item:
File: Wang_princeton_0181D_13713.pdf (1.2 MB, Adobe PDF)

