Sequential Decision-Making Problems:
Online Learning for Optimization over
Networks

Zhou, Angela

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp0176537377f

Title:	Sequential Decision-Making Problems: Online Learning for Optimization over Networks
Authors:	Zhou, Angela
Advisors:	Powell, Warren
Contributors:	Bou, Haitham
Department:	Operations Research and Financial Engineering
Class Year:	2016
Abstract:	The granularity of big data allows marketers to target customers more effectively with personalized offers. We model the resulting learning and optimization problem of allocating coupons on a social network where consumer response models are unknown. Uncertainty in these model parameters leads to a natural "exploration-exploitation" tradeoff in choosing which discounts to offer, while choosing which customers to include in a marketing campaign is a stochastic subset selection problem. We adapt the Knowledge Gradient with Discrete Priors from optimal learning and find a statistically significant increase in revenue and general robustness to moderate amounts of sampling noise. We then incorporate social network information about the connectivity of users and optimally learn the underlying customer segment parameters, while accounting for network effects on revenue. Our sampling method under uncertainty is competitive with realizations of the policy which makes the optimal decision with perfect information; however, realized revenue is lower than the expected revenue. We also consider the task assignment problem in crowdsourcing settings, such as on a platform like Amazon's Mechanical Turk, where we are uncertain both of users' reliabilities and the true labels of the questions. We derive an approximation scheme for a dynamic task assignment framework which uses intermediate estimates of the question answers to estimate the increase in mutual information we receive by querying each user. This dynamic assignment framework improves upon the final label accuracy of random sampling by up to 35% for small sample regimes, achieves comparable performance with one-shot dynamic assignment, and better performance when the ratio of questions to users increases. These models and results suggest a general framework using ideas from optimal learning to learn parameters of customer response models while optimizing for revenue. Given these estimates, for submodular objective functions, greedy approximation schemes suffice to construct cardinality-constrained subsets of users to target.
Extent:	137 pages
URI:	http://arks.princeton.edu/ark:/88435/dsp0176537377f
Type of Material:	Princeton University Senior Theses
Language:	en_US
Appears in Collections:	Operations Research and Financial Engineering, 2000-2023

Files in This Item:

File	Size	Format
Zhou_Angela_final_thesis.pdf	1.58 MB	Adobe PDF	Request a copy

Show full item record

Search

Browse