Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp012j62s7948
 Title: Primal-Dual Method for Reinforcement Learning and Markov Decision Processes Authors: Gong, Hao Advisors: Wang, Mengdi Contributors: Operations Research and Financial Engineering Department Keywords: Machine LearningMarkov Decision ProcessPrimal-Dual MethodReinforcement Learning Subjects: Operations researchComputer science Issue Date: 2021 Publisher: Princeton, NJ : Princeton University Abstract: In this thesis, we study the application of a class of stochastic primal-dual algorithms to reinforcement learning. The method takes advantage of the min-max formulation of the Bellman equation and jointly learns the optimal value function and the optimal policy through a series of primal-dual updates. We study both the online reinforcement learning problem and the simulation problem, and present several variants of the method and their theoretical guarantees. In Chapter 2, we investigate the Infinite-Horizon Average-Reward Markov Decision Process (AMDP) and propose an online primal-dual algorithm to learn the optimal policy by actively making decisions along a single state trajectory. Our results appear to be the first duality-based value-policy gradient method and regret analysis for infinite-horizon RL. We prove an $\mathcal{O}(\sqrt{T})$ regret upper bound that also depends on an ergodicity parameter of the AMDP, which can be significantly smaller than existing diameter-dependent bounds in some cases. In Chapter 3, we propose a class of utility maximization problems as a generalization of AMDPs that incorporates nonlinearity into the objective. We develop a primal-dual algorithm to solve the proposed problems and show that it achieves superior sample complexity compared to previous duality-based algorithms when the problem reduces to an AMDP. Chapter 4 presents a general primal-dual framework for solving the utility maximization problem with function approximation. We also study a specific class of linear/bilinear models and prove that the adjusted algorithm achieves a comparable convergence rate. URI: http://arks.princeton.edu/ark:/88435/dsp012j62s7948 Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu Type of Material: Academic dissertations (Ph.D.) Language: en Appears in Collections: Operations Research and Financial Engineering