Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zw12z8183
Title: Learning to Learn Optimally: A Practical Framework for Machine Learning Applications with Finite Time Horizon
Authors: Lee, Donghun
Advisors: Powell, Warren B
Contributors: Computer Science Department
Keywords: Artificial Intelligence
Learning to Learn Optimally
Machine Learning
Meta Learning
Subjects: Computer science
Issue Date: 2019
Publisher: Princeton, NJ : Princeton University
Abstract: Most machine learning algorithms with asymptotic guarantees leave finite time horizon issues, such as initialization or tuning, to the end users, for whom this burden may cause undesirable outcomes in practice, where finite time horizon performance matters. As an illustrative case of such undesirable finite time behavior, we identify the finite time bias in the Q-learning algorithm and present a method to alleviate the bias on the fly. Motivated by the gap between the asymptotic guarantees and the practical burdens of machine learning, we investigate the problem of learning to learn, defined as the problem of learning how to apply a given machine learning algorithm to solve a given task under a finite time horizon objective function. To address the problem more generally, we develop the framework of \emph{learning to learn optimally} (LTLO), which models the optimal application of a machine learning algorithm to a given task over a finite horizon. We demonstrate the use of the LTLO framework as a modeling tool for a real world problem via an example of learning to learn how to bid in sponsored search auctions. We show the practical benefit of using the LTLO framework as a baseline to construct meta-LQKG+, a knowledge gradient based LTLO algorithm designed to solve online hyperparameter optimization approximately within a small number of trials, and we demonstrate the algorithm's practical sample efficiency. Answering the need for a robust anytime LTLO algorithm, we develop the online regularized knowledge gradient policy, which solves the LTLO problem with high probability and has a sublinear regret bound.
URI: http://arks.princeton.edu/ark:/88435/dsp01zw12z8183
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
This content is embargoed until 2021-06-10. For more information contact the Mudd Manuscript Library.


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.