Skip navigation
Please use this identifier to cite or link to this item:
Title: MCMC algorithms for sampling from multimodal and changing distributions
Authors: Lee, Holden
Advisors: Arora, Sanjeev
Contributors: Mathematics Department
Keywords: algorithms
machine learning
Markov chain
Subjects: Mathematics
Applied mathematics
Computer science
Issue Date: 2019
Publisher: Princeton, NJ : Princeton University
Abstract: The problem of sampling from a probability distribution is a fundamental problem in Bayesian statistics and machine learning, with applications throughout the sciences. One common algorithmic framework for solving this problem is Markov Chain Monte Carlo (MCMC). However, a large gap exists between simple settings where MCMC has been proven to work, and complex settings arising in practice. In this thesis, I make progress towards closing this gap, focusing on two hurdles in particular. In Chapter 2, I consider the problem of sampling from *multimodal* distributions. Many distributions arising in practice, from simple mixture models to deep generative models, are multimodal, so any Markov chain which makes local moves will get stuck in one mode. Although a variety of temperature heuristics are used to address this problem, their theoretical guarantees are not well-understood even for simple multimodal distributions. I analyze an algorithm combining Langevin diffusion with *simulated tempering*, a heuristic which speeds up mixing by transitioning between different temperatures of the distribution. I develop a general method to prove mixing time using ``soft decompositions'' of Markov processes, and use it to prove rapid mixing for (polynomial) mixtures of log-concave distributions. In Chapter 3, I address the problem of sampling from the distributions $p_t(x) \propto e^{-\sum_{k=0}^t f_k(x)}$ for each epoch $1\le t\le T$ in an *online* manner, given a sequence of (convex) functions $f_0,\ldots, f_T$. This problem arises in large-scale Bayesian inference (for instance, online logistic regression) where instead of obtaining all the observations at once, one constantly acquires new data, and must continuously update the distribution. All previous results for this problem imply a bound on the number of gradient evaluations at each epoch $t$ that grows at least linearly in $t$. For this problem, I show that a certain variance-reduced SGLD (stochastic gradient Langevin dynamics) algorithm solves the online sampling problem with fixed TV-error $\epsilon$ with an *almost constant* number of gradient evaluations per epoch.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections:Mathematics

Files in This Item:
File Description SizeFormat 
Lee_princeton_0181D_12948.pdf1.25 MBAdobe PDFView/Download

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.