Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01c534fr81b
Title: MCMC algorithms for sampling from multimodal and changing distributions
Authors: Lee, Holden
Advisors: Arora, Sanjeev
Contributors: Mathematics Department
Keywords: algorithms; machine learning; Markov chain; MCMC; probability
Subjects: Mathematics; Applied mathematics; Computer science
Issue Date: 2019
Publisher: Princeton, NJ : Princeton University
Abstract: The problem of sampling from a probability distribution is a fundamental problem in Bayesian statistics and machine learning, with applications throughout the sciences. One common algorithmic framework for solving this problem is Markov chain Monte Carlo (MCMC). However, a large gap exists between the simple settings where MCMC has been proven to work and the complex settings arising in practice. In this thesis, I make progress towards closing this gap, focusing on two hurdles in particular. In Chapter 2, I consider the problem of sampling from *multimodal* distributions. Many distributions arising in practice, from simple mixture models to deep generative models, are multimodal, so any Markov chain that makes only local moves will get stuck in one mode. Although a variety of temperature heuristics are used to address this problem, their theoretical guarantees are not well understood even for simple multimodal distributions. I analyze an algorithm combining Langevin diffusion with *simulated tempering*, a heuristic that speeds up mixing by transitioning between different temperatures of the distribution. I develop a general method to prove mixing time using "soft decompositions" of Markov processes, and use it to prove rapid mixing for (polynomial) mixtures of log-concave distributions. In Chapter 3, I address the problem of sampling from the distributions $p_t(x) \propto e^{-\sum_{k=0}^t f_k(x)}$ for each epoch $1\le t\le T$ in an *online* manner, given a sequence of (convex) functions $f_0,\ldots, f_T$.
This problem arises in large-scale Bayesian inference (for instance, online logistic regression), where instead of obtaining all the observations at once, one constantly acquires new data and must continuously update the distribution. All previous results for this problem imply a bound on the number of gradient evaluations at each epoch $t$ that grows at least linearly in $t$. I show that a certain variance-reduced SGLD (stochastic gradient Langevin dynamics) algorithm solves the online sampling problem to fixed TV-error $\epsilon$ with an *almost constant* number of gradient evaluations per epoch.
URI: http://arks.princeton.edu/ark:/88435/dsp01c534fr81b
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Mathematics
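The simulated-tempering idea summarized in the abstract can be illustrated with a short sketch. The following toy Python code is not the thesis's algorithm or analysis: it interleaves discretized Langevin steps at the current temperature with Metropolis-style swaps between a fixed ladder of inverse temperatures, for a symmetric two-component Gaussian mixture. All names, constants, and the temperature ladder are invented for illustration, and the swap rule assumes equal partition-function weights across levels, a quantity the real algorithm must estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multimodal target: symmetric two-component Gaussian mixture with
# modes at -2 and +2; p(x) is proportional to e^{-f(x)}.
def f(x):
    # -log of the mixture density, computed stably via logaddexp
    return np.log(2.0) - np.logaddexp(-0.5 * (x - 2.0) ** 2,
                                      -0.5 * (x + 2.0) ** 2)

def grad_f(x, h=1e-5):
    # finite-difference gradient; adequate for a 1-D sketch
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Inverse temperatures beta_0 < ... < beta_L = 1; small beta flattens
# the distribution so the chain can travel between modes.
betas = np.array([0.05, 0.2, 0.5, 1.0])

def simulated_tempering_langevin(n_steps=20000, eta=0.05):
    x, level = 0.0, len(betas) - 1
    samples = []
    for _ in range(n_steps):
        beta = betas[level]
        # discretized Langevin step targeting e^{-beta * f(x)}
        x = x - eta * beta * grad_f(x) + np.sqrt(2.0 * eta) * rng.standard_normal()
        # propose moving one temperature level up or down
        new = level + rng.choice([-1, 1])
        if 0 <= new < len(betas):
            # Metropolis accept/reject for the level change; equal
            # partition-function weights are assumed for simplicity
            if np.log(rng.random()) < (betas[level] - betas[new]) * f(x):
                level = new
        if level == len(betas) - 1:
            samples.append(x)  # keep only samples taken at beta = 1
    return np.array(samples)

samples = simulated_tempering_langevin()
```

With tempering, the collected beta = 1 samples should cover both modes, whereas plain Langevin at beta = 1 would mix between modes only very slowly once the modes are far apart.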
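The online setting of Chapter 3 can likewise be sketched in toy form. The code below is not the thesis's variance-reduced SGLD algorithm: it uses a simplified SVRG-style gradient estimator with a fixed anchor, on quadratic $f_k$ (where, incidentally, the estimator happens to be exact), purely to show the structure — at each epoch a new $f_t$ arrives, the anchor's full gradient is updated at unit cost, and a few noisy gradient steps are taken with a small minibatch rather than touching all $t$ terms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy online target: p_t(x) proportional to exp(-sum_{k<=t} f_k(x)) with
# quadratic terms f_k(x) = 0.5 * (x - y_k)^2, so the exact posterior
# mean at epoch t is mean(y_1, ..., y_t).
ys = rng.normal(1.0, 1.0, size=100)

def grad_fk(x, y):
    return x - y

def online_vr_sgld(n_inner=5, batch=4):
    x = 0.0
    out = []
    anchor, anchor_grad = x, 0.0   # full gradient of the sum at the anchor
    for t in range(1, len(ys) + 1):
        # fold the new term's gradient into the anchor gradient: O(1) work
        anchor_grad += grad_fk(anchor, ys[t - 1])
        eta = 0.5 / t              # step size shrinks as the sum grows
        for _ in range(n_inner):
            idx = rng.integers(0, t, size=min(batch, t))
            # SVRG-style estimate of sum_k grad f_k(x): anchor gradient
            # plus a minibatch correction (exact here since f_k is quadratic)
            corr = np.mean(grad_fk(x, ys[idx]) - grad_fk(anchor, ys[idx]))
            g = anchor_grad + t * corr
            x = x - eta * g + np.sqrt(2.0 * eta) * rng.standard_normal()
        out.append(x)
    return np.array(out)

xs = online_vr_sgld()
```

Each epoch costs only `n_inner * batch` gradient evaluations plus one anchor update, independent of $t$; for general convex $f_k$ the correction term reduces, but does not eliminate, the variance of the gradient estimate.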