Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01c534fr72c
 Title: Distributed Multi-agent Multi-armed Bandits Authors: Landgren, Peter Advisors: Leonard, Naomi E Contributors: Mechanical and Aerospace Engineering Department Keywords: Decision-makingDistributed controlMABMulti-agent Multi-armed BanditMulti-armed BanditNetwork analysis and control Subjects: Mechanical engineeringComputer scienceElectrical engineering Issue Date: 2019 Publisher: Princeton, NJ : Princeton University Abstract: Social decision-making is a common feature of both natural and artificial systems. Humans, animals, and machines routinely communicate and observe each other to improve their understanding of a complex world. Additionally, many real-world tasks involve sequential decision-making under uncertainty. Such tasks are inherently subject to the explore-exploit tradeoff, where one must select between options with the highest expected payoffs based on current knowledge (exploitation) and options with less well-known but potentially better outcomes (exploration). In this thesis, we consider distributed social decision-making under uncertainty. Specifically, we develop and utilize the multi-agent multi-armed bandit (MAB) problem to model and study how multiple interacting agents make decisions that balance the explore-exploit tradeoff. we consider several different communication protocols for sharing information between agents. We develop and analyze algorithms that address the multi-agent MAB problem under each protocol. We derive bounds on performance and use the bounds to analyze the influence of network structure, i.e., who is communicating with whom, on decision-making outcomes. We first consider communication through consensus, and derive novel results concerning the performance of cooperative estimation of expected reward. We then use these results to develop, analyze, and prove performance bounds for several algorithms that address the multi-agent MAB problem both with and without constraints on concurrent sampling of arms by multiple agents. Furthermore, we develop a new graph centrality measure, which we call explore-exploit" centrality, that can be used to predict performance of networked agents in an MAB problem with communication through consensus. We demonstrate the utility of this centrality measure, and the performance of the algorithms through numerical simulations and robotic experiments. Next, we consider the multi-agent MAB problem with strictly local communication, and develop a novel partition-based algorithm that uses imitation to improve performance. We analyze this algorithm through performance bounds and simulation results. Finally, we consider application to robotic search for radioactive material in a facility. The search for radioactive material is an inherently noisy process, and can be modeled as an MAB problem. We develop and test a MAB-based algorithmic solution and demonstrate that it enables a robot to find multiple radioactive sources efficiently. URI: http://arks.princeton.edu/ark:/88435/dsp01c534fr72c Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu Type of Material: Academic dissertations (Ph.D.) Language: en Appears in Collections: Mechanical and Aerospace Engineering

Files in This Item:
File Description SizeFormat