Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp0105741v893
Title: Random Ensemble Conservative Q-Learning for Offline Reinforcement Learning
Authors: Grajeda, Gabriel
Advisors: Narasimhan, Karthik
Department: Computer Science
Class Year: 2022
Abstract: Learning effective policies from large-scale datasets without further exploration remains a major challenge when applying reinforcement learning to real-world problems. Offline reinforcement learning offers a promising approach but faces two key difficulties: counterfactual estimation and distributional shift. This thesis proposes the random ensemble conservative Q-learning (RECQL) algorithm to address both issues, building on the random ensemble mixture (REM) and conservative Q-learning (CQL) algorithms. Theoretically, we prove the convergence of a general class of random ensemble methods when the state-action space is finite. Empirically, we show that RECQL outperforms existing algorithms on datasets of random transitions but not on datasets generated by effective behavior policies. These results suggest that real-world applications of offline reinforcement learning benefit more from conservatism than from robust counterfactual estimation.
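The abstract describes RECQL as combining REM's random ensemble mixture with CQL's conservative penalty. The following is a minimal PyTorch sketch of what such a combined objective could look like for a discrete action space; the class `EnsembleQNetwork`, the function `recql_style_loss`, the hyperparameters, and the exact way the mixture and penalty are combined are illustrative assumptions, not the thesis's implementation.

```python
# Illustrative sketch (not the thesis code): a CQL-style conservative loss
# computed on a REM-style random convex combination of ensemble Q-heads,
# assuming a discrete action space and a batch of offline transitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleQNetwork(nn.Module):
    """K independent Q-heads over a shared state input (names are illustrative)."""
    def __init__(self, state_dim, num_actions, num_heads=4, hidden=256):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, num_actions))
            for _ in range(num_heads)
        ])

    def forward(self, state):
        # Returns Q-values of shape (batch, num_heads, num_actions).
        return torch.stack([head(state) for head in self.heads], dim=1)

def recql_style_loss(q_net, target_net, batch, gamma=0.99, cql_alpha=1.0):
    """One possible combination of a random ensemble mixture (REM-style)
    with a conservative penalty (CQL-style); structure and coefficients
    are assumptions for illustration only."""
    state, action, reward, next_state, done = batch

    q_all = q_net(state)                              # (B, K, A)
    k = q_all.shape[1]

    # Random convex combination over the K heads, redrawn each batch as in REM.
    w = torch.rand(k, device=q_all.device)
    w = w / w.sum()
    q_mix = (w.view(1, k, 1) * q_all).sum(dim=1)      # (B, A)

    with torch.no_grad():
        next_q_all = target_net(next_state)           # (B, K, A)
        next_q_mix = (w.view(1, k, 1) * next_q_all).sum(dim=1)
        target = reward + gamma * (1.0 - done) * next_q_mix.max(dim=1).values

    q_taken = q_mix.gather(1, action.unsqueeze(1)).squeeze(1)
    bellman_loss = F.mse_loss(q_taken, target)

    # Conservative penalty: push down Q-values over all actions (logsumexp)
    # while pushing up Q-values of the actions actually seen in the dataset.
    conservative = (torch.logsumexp(q_mix, dim=1) - q_taken).mean()

    return bellman_loss + cql_alpha * conservative
```

Drawing fresh mixture weights per batch follows the REM idea of training on many random ensemble combinations at once, while the `logsumexp` term is the standard CQL regularizer for discrete actions; how the thesis actually couples the two may differ.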
URI: http://arks.princeton.edu/ark:/88435/dsp0105741v893
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024
Files in This Item:
File | Description | Size | Format | Access
---|---|---|---|---
GRAJEDA-GABRIEL-THESIS.pdf | | 3.36 MB | Adobe PDF | Request a copy