Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01gx41mn07b
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Jha, Niraj K. | -
dc.contributor.author | Wu, Xiaorun | -
dc.date.accessioned | 2022-08-16T20:31:10Z | -
dc.date.available | 2023-07-03T12:00:08Z | -
dc.date.created | 2022-04-25 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01gx41mn07b | -
dc.description.abstract | Traditionally, Reinforcement Learning (RL) has been concerned with developing stationary policies for robotics safety and planning. A key assumption traditional RL relies on heavily is the Markov assumption: the distribution of the next state depends only on the current state. However, we may also model the RL problem as a generic sequence modeling problem, where the goal is to produce a sequence of actions that maximizes the designated reward. The Transformer is currently at the forefront of research, and its success in other sequence modeling tasks such as NLP offers promising potential for modeling safety tasks in RL as well. In this paper, we introduce a novel mechanism for the agent to learn a robust safety policy. Our novelty is two-fold: first, we employ a Transformer to generate rewards so that the agent has a richer learning curriculum; second, we introduce adversaries, each with an objective function that is exactly the negative of the agent's. The agent and each of the adversaries play a zero-sum game. Through these processes, we expect the agent to benefit from attending to a longer history and, by playing against a random adversary, to learn a policy that is more robust to random disturbances in the environment. In addition, we employ trust-region techniques, namely the associated clipped surrogate objective and adaptive KL penalty coefficient, as well as Lyapunov stability verification, as additional stabilization tools to accommodate more complex environments. We tested the efficacy of our design on twelve continuous control tasks. Using a bottom-up approach, we tested the environments with increasingly refined algorithm designs. Our results show much greater stability (a boost of more than 70%), a higher rate of reproducibility (at least 35%), relatively fast convergence (a boost of at least 50%), and reduced training time. | en_US
dc.format.mimetype | application/pdf |
dc.language.iso | en | en_US
dc.title | AdvTranSafemer: Robust Policy Learning via Transformer and Adversarial Attack | en_US
dc.type | Princeton University Senior Theses |
pu.embargo.terms | 2023-07-01 | -
pu.date.classyear | 2022 | en_US
pu.pdf.coverpage | SeniorThesisCoverPage |
pu.contributor.authorid | 920208472 |
pu.certificate | Robotics & Intelligent Systems Program | en_US
pu.mudd.walkin | Yes | en_US
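
Note on the trust-region terms in the abstract: the clipped surrogate objective and the adaptive KL penalty coefficient are standard components of proximal policy optimization (PPO)-style training. A minimal sketch of their usual forms is given below for reference; these are the textbook formulations, not necessarily the exact variants implemented in the thesis.

L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

L^{\mathrm{KLPEN}}(\theta) = \hat{\mathbb{E}}_t\left[ r_t(\theta)\,\hat{A}_t - \beta\, \mathrm{KL}\!\left[ \pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t) \,\|\, \pi_\theta(\cdot \mid s_t) \right] \right]

Here \hat{A}_t is the estimated advantage, \epsilon the clipping radius, and \beta the KL penalty coefficient, which is adapted between updates: \beta is increased when the measured KL divergence exceeds a target value and decreased when it falls below it.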
Appears in Collections: Robotics and Intelligent Systems Program

Files in This Item:
File | Description | Size | Format
WU-XIAORUN-THESIS.pdf | | 3.18 MB | Adobe PDF (Request a copy)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.