Authors: Long, Jihao
Advisors: E, Weinan
Contributors: Applied and Computational Mathematics Department
Keywords: Distribution mismatch
Optimal control
Reinforcement learning
Subjects: Applied mathematics
Issue Date: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: Reinforcement learning and optimal control are two approaches to solving decision-making problems for dynamical systems, from a data-driven and a model-driven perspective, respectively. Modern applications of these approaches often involve high-dimensional state and action spaces, making it essential to develop efficient high-dimensional algorithms. This dissertation addresses this challenge from two perspectives. In the first part, we analyze the sample complexity of reinforcement learning in a general reproducing kernel Hilbert space (RKHS). We focus on a family of Markov decision processes in which the reward functions lie in the unit ball of an RKHS and the transition probabilities lie in an arbitrary set. We introduce a quantity, called the perturbational complexity by distribution mismatch, that describes the complexity of the admissible state-action distribution space in response to a perturbation in the RKHS at a given scale. We show that this quantity provides both a lower bound on the error of all possible algorithms and an upper bound on the error of two specific algorithms for the reinforcement learning problem. Thus, the decay of the perturbational complexity with respect to the given scale measures the difficulty of the reinforcement learning problem. We further provide concrete examples and discuss whether the perturbational complexity decays quickly in these examples. In the second part, we introduce an efficient algorithm for learning high-dimensional closed-loop optimal control. Our approach modifies the recently proposed supervised-learning-based method, which leverages powerful open-loop optimal control solvers to generate training data and neural networks as efficient high-dimensional function approximators to fit the closed-loop optimal control. That method successfully handles certain high-dimensional optimal control problems but still performs poorly on more challenging ones.
One of the crucial reasons for this failure is the so-called distribution mismatch phenomenon introduced by the controlled dynamics. In this dissertation, we investigate this phenomenon and propose the initial value problem (IVP) enhanced sampling method to mitigate it. We further demonstrate that the proposed sampling strategy significantly improves performance on the tested control problems, including the classical linear-quadratic regulator, the optimal landing problem of a quadrotor, and the optimal reaching problem of a 7-DoF manipulator.
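The sampling loop sketched in the abstract can be illustrated on a toy problem. The following is a minimal, hypothetical sketch (not the dissertation's implementation): a one-dimensional discrete-time LQR whose exact Riccati feedback gain stands in for an open-loop optimal control solver, a least-squares linear fit stands in for the neural network, and states visited by rolling out the learned controller are used as new initial conditions for relabeling, so the training distribution tracks the states the controlled dynamics actually visits.

```python
import numpy as np

# Toy 1-D discrete-time LQR: x_{t+1} = a*x_t + b*u_t, cost sum q*x^2 + r*u^2.
# All names and parameter values here are illustrative assumptions.
a, b, q, r, T = 1.0, 1.0, 1.0, 1.0, 20

def riccati_gain(a, b, q, r, iters=200):
    """Fixed-point iteration of the discrete Riccati equation (scalar case)."""
    p = q
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * (a - b * k)
    return (b * p * a) / (r + b * p * b)

K_star = riccati_gain(a, b, q, r)

def open_loop_solve(x0):
    """Stand-in for an open-loop solver: optimal trajectory from x0."""
    xs, us, x = [], [], x0
    for _ in range(T):
        u = -K_star * x
        xs.append(x)
        us.append(u)
        x = a * x + b * u
    return np.array(xs), np.array(us)

rng = np.random.default_rng(0)
X0 = rng.uniform(-1.0, 1.0, 50)          # sampled initial conditions

# Round 0 (plain supervised approach): label only the initial states.
S = X0
U = np.array([open_loop_solve(x0)[1][0] for x0 in X0])
w = (S @ U) / (S @ S)                    # least-squares fit of u = w*x

# IVP-enhanced rounds: simulate the *learned* controller forward from each
# initial condition, collect the states it actually visits, relabel them
# with the open-loop solver, and refit.
for _ in range(3):
    visited = []
    for x0 in X0:
        x = x0
        for _ in range(T):
            visited.append(x)
            x = a * x + b * (w * x)      # roll out current learned policy
    S2 = np.array(visited)
    U2 = np.array([open_loop_solve(s)[1][0] for s in S2])
    w = (S2 @ U2) / (S2 @ S2)
```

On this linear toy problem the fit is exact at every round, so the sketch only shows the mechanics of the loop; the benefit of matching the training distribution to the controlled dynamics appears on nonlinear problems where the function approximator is imperfect.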
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections:Applied and Computational Mathematics

Files in This Item:
File: Long_princeton_0181D_14500.pdf (5.02 MB, Adobe PDF)

Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.