Title: Towards Efficient and Effective Deep Model-based Reinforcement Learning
Authors: Luo, Yuping
Advisors: Arora, Sanjeev
Contributors: Computer Science Department
Keywords: Deep Learning
Machine Learning
Reinforcement Learning
Subjects: Computer science
Artificial intelligence
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: Recent advances in deep reinforcement learning have demonstrated its great potential for real-world problems. However, two concerns prevent reinforcement learning from being widely applied: efficiency and efficacy. This dissertation studies how to improve both by designing deep model-based algorithms. Access to a dynamics model empowers an algorithm to plan, which is key to sequential decision making. The dissertation covers four topics: online reinforcement learning, the expressivity of neural networks in deep reinforcement learning, offline reinforcement learning, and safe reinforcement learning. For online reinforcement learning, we present an algorithmic framework with theoretical guarantees that optimizes a lower bound on the real-environment performance of a policy trained in the learned environment, and we empirically verify the efficiency of the proposed method. For the expressivity of neural networks in deep reinforcement learning, we prove that in some scenarios model-based approaches require far less representation power to approximate a near-optimal policy than model-free approaches, show empirically that this can be an issue in simulated robotics environments, and show that a model-based planner can help. For offline reinforcement learning, we devise an algorithm that keeps the policy close to the provided expert demonstration set to reduce distribution shift, and we conduct experiments demonstrating that it improves the success rate on robotic-arm manipulation tasks in simulated environments. For safe reinforcement learning, we propose a method that uses the learned dynamics model to certify safe states; our experiments show that it learns a decent policy without a single safety violation during training on a set of simple but challenging tasks, whereas baseline algorithms incur hundreds of safety violations.
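The online-RL contribution described above rests on a pessimistic lower bound: a policy's return estimated in the learned model, discounted by a penalty that grows with model error, certifies a minimum return in the real environment. The sketch below illustrates that idea only; the function, the penalty form, and all numbers are hypothetical and are not taken from the dissertation.

```python
# Illustrative sketch (not the dissertation's algorithm): rank candidate
# policies by a pessimistic lower bound on real-environment return, i.e.
# return estimated in the learned model minus a model-error penalty.

def lower_bound_return(model_return, model_error, horizon,
                       reward_scale=1.0, discount=0.99):
    """Pessimistic estimate: penalize the learned-model return by a term
    proportional to model error times the effective discounted horizon."""
    effective_horizon = (1 - discount ** horizon) / (1 - discount)
    penalty = reward_scale * model_error * effective_horizon
    return model_return - penalty

# Hypothetical candidates: policy_b looks better in the learned model,
# but its larger model error yields a weaker certified lower bound.
candidates = {
    "policy_a": {"model_return": 120.0, "model_error": 0.05},
    "policy_b": {"model_return": 150.0, "model_error": 0.80},
}
bounds = {
    name: lower_bound_return(c["model_return"], c["model_error"], horizon=100)
    for name, c in candidates.items()
}
best = max(bounds, key=bounds.get)  # policy_a: pessimism prefers the
                                    # policy whose bound is trustworthy
```

The point of the example is that optimizing the certified lower bound, rather than the raw learned-model return, guards against exploiting model error.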
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
Luo_princeton_0181D_14201.pdf (4.11 MB, Adobe PDF)

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.