Title: Decision-Making with Non-Markovian Rewards: Multi-Agent Learning and Applications to Medical Diagnostics
Authors: Yu, Zheng
Advisors: Wang, Mengdi M.W.
Contributors: Operations Research and Financial Engineering Department
Keywords: non-Markovian reward
reinforcement learning
Subjects: Artificial intelligence
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: In this thesis, we consider decision-making problems with non-Markovian rewards and their applications to medical diagnostics. Despite the elegance of reinforcement learning in modeling practical sequential decision problems, a range of non-Markovian reward decision-making problems cannot be captured directly by the classical Markov decision process model. Such tasks invalidate standard reinforcement learning methods and require further study. We first consider the task of efficient medical diagnostics. The goal is to give an accurate medical diagnosis, measured by $F_1$ score, while minimizing the cost of lab tests. In this work, we use reinforcement learning to find a dynamic policy that selects lab-test panels sequentially based on previous observations. However, the $F_1$ score cannot be written as a cumulative sum of rewards. We therefore develop a reward shaping approach, leveraging properties of the $F_1$ score and the duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$-score maximization. In the second part, we study a non-Markovian reward RL setting involving multiple agents. Each agent makes decisions on a disjoint subset (block) of the state space and has private interests (a reward function), while the entire team aims to maximize a general long-term team utility function. By leveraging the inherent duality of policy optimization, we propose a min-max multi-block policy optimization framework that decomposes the overall problem into individual local tasks. This enables a federated teamwork mechanism in which a team lead coordinates individual agents via reward shaping, and each agent solves her local task defined only on her local state subset. Lastly, we consider the task of finding reduced-dimensional structure in complex networks when the network is not explicitly observed.
We focus on the online factorization and partition of implicit large lumpable networks based on observations from an associated random walk. We formulate this as a nonconvex stochastic factorization problem to learn a low-dimensional representation for each vertex, and then apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability given sufficient data.
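As a rough illustration of the last part, the pipeline of observing a random walk, estimating a factorization, embedding vertices, and clustering can be sketched on a toy example. This is a minimal sketch, not the thesis's actual algorithm: the 6-state chain, two-block structure, transition probabilities, and walk length below are all illustrative assumptions, and a simple sign-based split stands in for a general clustering step.

```python
import numpy as np

# Hypothetical toy setup: 6 states in 2 blocks; transition probabilities
# depend only on block membership, so the chain is lumpable by construction.
rng = np.random.default_rng(0)
blocks = np.array([0, 0, 0, 1, 1, 1])
n = len(blocks)
P = np.where(blocks[:, None] == blocks[None, :], 0.8 / 3, 0.2 / 3)

# Observe one long random walk from the (otherwise unobserved) network
# and accumulate empirical transition counts.
T = 50_000
counts = np.zeros((n, n))
s = 0
for _ in range(T):
    s_next = rng.choice(n, p=P[s])
    counts[s, s_next] += 1
    s = s_next

# Estimate the transition matrix, factorize it, and use the top left
# singular vectors as a low-dimensional representation of each vertex.
P_hat = counts / counts.sum(axis=1, keepdims=True)
U, sv, Vt = np.linalg.svd(P_hat)
embedding = U[:, :2]

# With two blocks, the sign pattern of the 2nd singular vector recovers
# the partition (up to relabeling of the two blocks).
partition = (embedding[:, 1] > 0).astype(int)
```

With enough observed transitions, the estimated matrix is close to the true lumpable one, whose second singular vector is constant within each block, so the recovered `partition` matches `blocks` up to relabeling.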
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File: Yu_princeton_0181D_14349.pdf (3.89 MB, Adobe PDF)

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.