Skip navigation
Please use this identifier to cite or link to this item:
Title: Distributed Multi-agent Multi-armed Bandits
Authors: Landgren, Peter
Advisors: Leonard, Naomi E
Contributors: Mechanical and Aerospace Engineering Department
Keywords: Decision-making
Distributed control
Multi-agent Multi-armed Bandit
Multi-armed Bandit
Network analysis and control
Subjects: Mechanical engineering
Computer science
Electrical engineering
Issue Date: 2019
Publisher: Princeton, NJ : Princeton University
Abstract: Social decision-making is a common feature of both natural and artificial systems. Humans, animals, and machines routinely communicate and observe each other to improve their understanding of a complex world. Additionally, many real-world tasks involve sequential decision-making under uncertainty. Such tasks are inherently subject to the explore-exploit tradeoff, where one must select between options with the highest expected payoffs based on current knowledge (exploitation) and options with less well-known but potentially better outcomes (exploration). In this thesis, we consider distributed social decision-making under uncertainty. Specifically, we develop and utilize the multi-agent multi-armed bandit (MAB) problem to model and study how multiple interacting agents make decisions that balance the explore-exploit tradeoff. we consider several different communication protocols for sharing information between agents. We develop and analyze algorithms that address the multi-agent MAB problem under each protocol. We derive bounds on performance and use the bounds to analyze the influence of network structure, i.e., who is communicating with whom, on decision-making outcomes. We first consider communication through consensus, and derive novel results concerning the performance of cooperative estimation of expected reward. We then use these results to develop, analyze, and prove performance bounds for several algorithms that address the multi-agent MAB problem both with and without constraints on concurrent sampling of arms by multiple agents. Furthermore, we develop a new graph centrality measure, which we call ``explore-exploit" centrality, that can be used to predict performance of networked agents in an MAB problem with communication through consensus. We demonstrate the utility of this centrality measure, and the performance of the algorithms through numerical simulations and robotic experiments. Next, we consider the multi-agent MAB problem with strictly local communication, and develop a novel partition-based algorithm that uses imitation to improve performance. We analyze this algorithm through performance bounds and simulation results. Finally, we consider application to robotic search for radioactive material in a facility. The search for radioactive material is an inherently noisy process, and can be modeled as an MAB problem. We develop and test a MAB-based algorithmic solution and demonstrate that it enables a robot to find multiple radioactive sources efficiently.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog:
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections:Mechanical and Aerospace Engineering

Files in This Item:
File Description SizeFormat 
Landgren_princeton_0181D_12820.pdf12.28 MBAdobe PDFView/Download
Landgren_princeton_0181D_408/CoopUCB2_Collisions.mp4123.44 MBUnknownView/Download
Landgren_princeton_0181D_408/CoopUCB2_Directed.mp480.29 MBUnknownView/Download
Landgren_princeton_0181D_408/CoopUCB2_Undirected.mp477.23 MBUnknownView/Download
Landgren_princeton_0181D_408/radUCL_Run1.mp443.89 MBUnknownView/Download
Landgren_princeton_0181D_408/radUCL_Run2.mp473.38 MBUnknownView/Download
Landgren_princeton_0181D_408/radUCL_Run3.mp470.75 MBUnknownView/Download
Landgren_princeton_0181D_408/radUCL_Run4.mp475.98 MBUnknownView/Download

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.