Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp0105741v893
Title: Random Ensemble Conservative Q-Learning for Offline Reinforcement Learning
Authors: Grajeda, Gabriel
Advisors: Narasimhan, Karthik
Department: Computer Science
Class Year: 2022
Abstract: Learning effective policies from large-scale datasets without further exploration remains a major challenge when applying reinforcement learning to real-world problems. Offline reinforcement learning offers a promising approach but faces two key difficulties: counterfactual estimation and distributional shift. This thesis proposes the random ensemble conservative Q-learning (RECQL) algorithm to address both issues, building on the random ensemble mixture (REM) and conservative Q-learning (CQL) algorithms. Theoretically, we prove the convergence of a general class of random ensemble methods when the state-action space is finite. Empirically, we show that RECQL outperforms existing algorithms on datasets consisting of random transitions but not on datasets with effective behavior. These results suggest that real-world applications of offline reinforcement learning benefit more from conservatism than from robust counterfactual estimation.
URI: http://arks.princeton.edu/ark:/88435/dsp0105741v893
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024
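
Note: the abstract describes RECQL as a combination of REM-style random ensembles with CQL's conservative regularizer. Since the thesis file itself is available only on request, the sketch below is a plausible reading of that combination rather than the author's implementation; the ensemble size, Dirichlet mixture weights, SARSA-style backup, simplified conservative term, and all names (EnsembleQ, recql_loss, cql_alpha) are illustrative assumptions.

```python
# Hypothetical sketch of a RECQL-style update: a random convex mixture of
# Q-heads (as in REM) trained with a CQL-style conservative penalty.
# Hyperparameters and structure are assumptions, not the thesis code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleQ(nn.Module):
    """K Q-heads over a shared (state, action) input, mixed with random convex weights."""
    def __init__(self, obs_dim, act_dim, num_heads=4, hidden=256):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(num_heads)
        ])

    def forward(self, obs, act, weights):
        # weights: (num_heads,) convex combination drawn once per update (REM-style mixture)
        qs = torch.stack([h(torch.cat([obs, act], dim=-1)).squeeze(-1)
                          for h in self.heads], dim=0)      # (K, batch)
        return (weights.unsqueeze(-1) * qs).sum(dim=0)      # (batch,)

def recql_loss(q_net, target_net, batch, policy_actions, gamma=0.99, cql_alpha=1.0):
    obs, act, rew, next_obs, next_act, done = batch
    # Draw one random convex mixture over heads for this update.
    k = len(q_net.heads)
    w = torch.distributions.Dirichlet(torch.ones(k)).sample()

    # Bellman backup on the mixed Q-value (SARSA-style, using the dataset's next action
    # for simplicity; an actor's next action could be substituted).
    with torch.no_grad():
        target = rew + gamma * (1 - done) * target_net(next_obs, next_act, w)
    q_data = q_net(obs, act, w)
    bellman = F.mse_loss(q_data, target)

    # Simplified CQL-style conservative term: push down Q on out-of-distribution
    # (policy) actions and push up Q on dataset actions.
    q_pi = q_net(obs, policy_actions, w)
    conservative = (q_pi - q_data).mean()

    return bellman + cql_alpha * conservative
```

One design point worth noting: drawing a fresh Dirichlet mixture each update is what gives the REM-style ensemble its implicit regularization, while the conservative term supplies the pessimism that the abstract credits for the empirical gains.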

Files in This Item:
File: GRAJEDA-GABRIEL-THESIS.pdf, 3.36 MB, Adobe PDF (request a copy)

