Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp0105741v893
Title: Random Ensemble Conservative Q-Learning for Offline Reinforcement Learning
Authors: Grajeda, Gabriel
Advisors: Narasimhan, Karthik
Department: Computer Science
Class Year: 2022
Abstract: Learning effective policies from large-scale datasets without further exploration remains a major challenge when applying reinforcement learning to real-world problems. Offline reinforcement learning offers a promising approach but faces two key difficulties: counterfactual estimation and distributional shift. This thesis proposes the random ensemble conservative Q-learning (RECQL) algorithm to address both issues, building on the random ensemble mixture (REM) and conservative Q-learning (CQL) algorithms. Theoretically, we prove the convergence of a general class of random ensemble methods when the state-action space is finite. Empirically, we show that RECQL outperforms existing algorithms on datasets of random transitions but not on datasets generated by effective behavior policies. These results suggest that real-world applications of offline reinforcement learning benefit more from conservatism than from robust counterfactual estimation.
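The abstract describes RECQL as combining REM's random ensemble mixture with CQL's conservative penalty. The following is a minimal PyTorch sketch of what such a combined objective could look like for a discrete action space; the class `EnsembleQNetwork`, the function `recql_style_loss`, the hyperparameters, and the exact way the mixture and penalty are combined are illustrative assumptions, not the thesis's implementation.

```python
# Illustrative sketch (not the thesis code): a CQL-style conservative loss
# computed on a REM-style random convex combination of ensemble Q-heads,
# assuming a discrete action space and a batch of offline transitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleQNetwork(nn.Module):
    """K independent Q-heads over a shared state input (names are illustrative)."""
    def __init__(self, state_dim, num_actions, num_heads=4, hidden=256):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, num_actions))
            for _ in range(num_heads)
        ])

    def forward(self, state):
        # Returns Q-values of shape (batch, num_heads, num_actions).
        return torch.stack([head(state) for head in self.heads], dim=1)

def recql_style_loss(q_net, target_net, batch, gamma=0.99, cql_alpha=1.0):
    """One possible combination of a random ensemble mixture (REM-style)
    with a conservative penalty (CQL-style); structure and coefficients
    are assumptions for illustration only."""
    state, action, reward, next_state, done = batch

    q_all = q_net(state)                              # (B, K, A)
    k = q_all.shape[1]

    # Random convex combination over the K heads, redrawn each batch as in REM.
    w = torch.rand(k, device=q_all.device)
    w = w / w.sum()
    q_mix = (w.view(1, k, 1) * q_all).sum(dim=1)      # (B, A)

    with torch.no_grad():
        next_q_all = target_net(next_state)           # (B, K, A)
        next_q_mix = (w.view(1, k, 1) * next_q_all).sum(dim=1)
        target = reward + gamma * (1.0 - done) * next_q_mix.max(dim=1).values

    q_taken = q_mix.gather(1, action.unsqueeze(1)).squeeze(1)
    bellman_loss = F.mse_loss(q_taken, target)

    # Conservative penalty: push down Q-values over all actions (logsumexp)
    # while pushing up Q-values of the actions actually seen in the dataset.
    conservative = (torch.logsumexp(q_mix, dim=1) - q_taken).mean()

    return bellman_loss + cql_alpha * conservative
```

Drawing fresh mixture weights per batch follows the REM idea of training on many random ensemble combinations at once, while the `logsumexp` term is the standard CQL regularizer for discrete actions; how the thesis actually couples the two may differ.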
URI: http://arks.princeton.edu/ark:/88435/dsp0105741v893
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024
Files in This Item:
File | Description | Size | Format | Access
---|---|---|---|---
GRAJEDA-GABRIEL-THESIS.pdf | | 3.36 MB | Adobe PDF | Request a copy