Learning to Learn Optimally: A Practical Framework for Machine Learning Applications with Finite Time Horizon

Lee, Donghun

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zw12z8183

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Powell, Warren B	-
dc.contributor.author	Lee, Donghun	-
dc.contributor.other	Computer Science Department	-
dc.date.accessioned	2019-12-12T17:21:28Z	-
dc.date.available	2021-12-02T16:21:40Z	-
dc.date.issued	2019	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01zw12z8183	-
dc.description.abstract	Most machine learning algorithms with asymptotic guarantees leave finite time horizon issues, such as initialization and tuning, to the end users. This burden can lead to undesirable outcomes in practice, where finite time horizon performance matters. As a motivating case of undesirable finite time behavior, we identify the finite time bias in the Q-learning algorithm and present a method to alleviate the bias on the fly. Motivated by the gap between asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task with a finite time horizon objective function. To address the problem more generally, we develop the framework of learning to learn optimally (LTLO), which models the problem of optimally applying a machine learning algorithm to a given task over a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real-world problem via an example of learning to learn how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline to construct meta-LQKG+, a knowledge gradient based LTLO algorithm designed to approximately solve online hyperparameter optimization with a small number of trials, and demonstrate the practical sample efficiency of the algorithm. Answering the need for a robust anytime LTLO algorithm, we develop the online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound.	-
dc.language.iso	en	-
dc.publisher	Princeton, NJ : Princeton University	-
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu	-
dc.subject	Artificial Intelligence	-
dc.subject	Learning to Learn Optimally	-
dc.subject	Machine Learning	-
dc.subject	Meta Learning	-
dc.subject.classification	Computer science	-
dc.title	Learning to Learn Optimally: A Practical Framework for Machine Learning Applications with Finite Time Horizon	-
dc.type	Academic dissertations (Ph.D.)	-
pu.embargo.terms	2021-06-10	-
Appears in Collections:	Computer Science

Files in This Item:

File	Description	Size	Format
Lee_princeton_0181D_12961.pdf		1.77 MB	Adobe PDF
