Safe Reinforcement Learning and Constrained Learning for Dynamical Systems

Yang, Tsung-Yen

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01c821gp00d

Title:	Safe Reinforcement Learning and Constrained Learning for Dynamical Systems
Authors:	Yang, Tsung-Yen
Advisors:	Ramadge, Peter
Contributors:	Electrical and Computer Engineering Department
Keywords:	Imitation Learning, Natural Language Processing, Robotics, Safe Reinforcement Learning, System Identification
Subjects:	Artificial intelligence, Robotics, Linguistics
Issue Date:	2022
Publisher:	Princeton, NJ : Princeton University
Abstract:	Designing control policies for autonomous systems such as self-driving cars is complex. To this end, researchers are increasingly using reinforcement learning (RL) to design policies. However, guaranteeing safe operation during real-world training and deployment remains an unsolved problem, and one of vital importance for safety-critical systems. In addition, current RL approaches require accurate simulators (models) to learn policies, and such models are rarely available in real-world applications. This thesis introduces a safe RL framework that provides safety guarantees and develops a constrained learning approach that learns system dynamics. We develop a safe RL algorithm that optimizes task rewards while satisfying safety constraints. We then consider a variant of the safe RL problem in which a baseline policy is provided. The baseline policy can arise from demonstration data and may provide useful cues for learning, but it is not guaranteed to satisfy the safety constraints. We propose a policy optimization algorithm to solve this problem. In addition, we apply a safe RL algorithm to legged locomotion to show its real-world applicability. We propose an algorithm that switches between a safe recovery policy, which keeps the robot away from unsafe states, and a learner policy, which is optimized to complete the task. We further exploit knowledge of the system dynamics to determine when to switch between the two policies. The results suggest that legged locomotion skills can be learned in the real world without the robot falling. We then revisit the assumption of known system dynamics and develop a method that performs system identification from observations. Knowing the parameters of the system improves the quality of simulation and hence minimizes unexpected behavior of the policy. Finally, while safe RL holds great promise for many applications, current approaches require domain expertise to specify constraints.
We thus introduce a new benchmark with constraints specified in free-form text. We develop a model that can interpret and adhere to such textual constraints. We show that the method achieves higher rewards and fewer constraint violations than baselines.
URI:	http://arks.princeton.edu/ark:/88435/dsp01c821gp00d
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Yang_princeton_0181D_14259.pdf		40.05 MB	Adobe PDF
