Skip navigation
Please use this identifier to cite or link to this item:
Title: Primal-Dual Method for Reinforcement Learning and Markov Decision Processes
Authors: Gong, Hao
Advisors: Wang, Mengdi
Contributors: Operations Research and Financial Engineering Department
Keywords: Machine Learning
Markov Decision Process
Primal-Dual Method
Reinforcement Learning
Subjects: Operations research
Computer science
Issue Date: 2021
Publisher: Princeton, NJ : Princeton University
Abstract: In this thesis, we study the application of a class of stochastic primal-dual algorithms to reinforcement learning. The method takes advantage of the min-max formulation of the Bellman equation and jointly learns the optimal value function and the optimal policy through a series of primal-dual updates. We study both the online reinforcement learning problem and the simulation problem, and present several variants of the method and their theoretical guarantees. In Chapter 2, we investigate the Infinite-Horizon Average-Reward Markov Decision Process (AMDP) and propose an online primal-dual algorithm to learn the optimal policy by actively making decisions along a single state trajectory. Our results appear to be the first duality-based value-policy gradient method and regret analysis for infinite-horizon RL. We prove an $\mathcal{O}(\sqrt{T})$ regret upper bound that also depends on an ergodicity parameter of the AMDP, which can be significantly smaller than existing diameter-dependent bounds in some cases. In Chapter 3, we propose a class of utility maximization problems as a generalization of AMDPs that incorporates nonlinearity into the objective. We develop a primal-dual algorithm to solve the proposed problems and show that it achieves superior sample complexity compared to previous duality-based algorithms when the problem reduces to an AMDP. Chapter 4 presents a general primal-dual framework for solving the utility maximization problem with function approximation. We also study a specific class of linear/bilinear models and prove that the adjusted algorithm achieves a comparable convergence rate.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections:Operations Research and Financial Engineering

Files in This Item:
File Description SizeFormat 
Gong_princeton_0181D_13650.pdf890.44 kBAdobe PDFView/Download

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.