Title: Deep Q-Learning and Parallelized Multi-Algorithmic Hyperparameter Optimization for Video Games: An Application to Haxball
Authors: Trestka, Alex
Advisors: Rigobon, Daniel
Department: Operations Research and Financial Engineering
Class Year: 2024
Abstract: Since the rise of deep reinforcement learning, the technique has been applied to a plethora of problems, including performance in video games. Complex neural network structures combined with reinforcement learning have allowed agents to be trained to human-level performance in several Atari video games. However, no thorough investigation has evaluated the effects of hyperparameters on these models, even though they are crucial components that dictate how well a machine learning model can learn and adapt its behavior in an environment. The difficulty in finding an “optimal” hyperparameter configuration is that the problem is computationally complex: it involves sampling a large range of values and measuring their effects on how well a model can learn a behavior. Traditionally, hyperparameters have been tuned via approaches that are inefficient in both time and scope, as the process either takes too long or considers too few values. This paper takes a novel approach to optimizing these hyperparameters, leveraging computational power alongside complex algorithms to find an optimal configuration that ultimately increases agent performance in a video game environment. We use parallel optimization to distribute the task across multiple machines, coupled with a Tree-Structured Parzen Estimator to sample configurations that show more promise in training a model to adopt a behavior. Additionally, we use a median stopping rule, which terminates a hyperparameter optimization trial if it shows no sign of improvement over past trials. We apply this approach to train a Deep Q-Network in a custom video game environment meant to simulate Haxball, a two-dimensional representation of soccer. We find that the hyperparameters produced at the end of this study are meaningful: we use them to train our Deep Q-Network agents and investigate performance in two different variations of our Haxball environment.
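
The search setup described in the abstract (a Tree-Structured Parzen Estimator, a median stopping rule, and parallelization across machines through shared storage) maps closely onto what the Optuna library provides. The thesis does not state which tooling was used, so the following Python sketch is purely illustrative: the train_dqn_epoch function, the hyperparameter names and ranges, and the storage URL are hypothetical placeholders, not values from the study.

# Illustrative sketch only; Optuna is assumed here because it offers a TPE
# sampler, a median pruner, and shared-storage parallelism matching the
# approach described in the abstract.
import random
import optuna

def train_dqn_epoch(lr, gamma, batch_size, epoch):
    """Hypothetical stand-in for one epoch of DQN training in a Haxball-like
    environment; returns an average episode reward. Replace with real training."""
    random.seed(hash((lr, gamma, batch_size, epoch)) % (2**32))
    return random.uniform(-1.0, 1.0) + epoch * 0.01

def objective(trial):
    # Sample a candidate hyperparameter configuration (ranges are assumptions).
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.90, 0.999)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])

    reward = 0.0
    for epoch in range(20):
        reward = train_dqn_epoch(lr, gamma, batch_size, epoch)
        # Report intermediate reward so the median stopping rule can prune
        # trials that underperform the median of earlier trials at this step.
        trial.report(reward, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return reward

if __name__ == "__main__":
    # A shared storage URL lets several machines run this same script and
    # cooperate on one study, giving the parallelism described above; a
    # networked database would replace SQLite in a true multi-machine setup.
    study = optuna.create_study(
        study_name="haxball-dqn",
        storage="sqlite:///haxball_dqn.db",
        load_if_exists=True,
        direction="maximize",
        sampler=optuna.samplers.TPESampler(),
        pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=5),
    )
    study.optimize(objective, n_trials=50)
    print("Best configuration:", study.best_params)

Launching this same script on each worker machine, all pointed at one shared storage backend, is the standard Optuna pattern for distributing trials: each worker asks the TPE sampler for a promising configuration, and the median pruner stops trials whose intermediate rewards fall below the median of earlier trials.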
URI: http://arks.princeton.edu/ark:/88435/dsp01x633f4375
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2024

Files in This Item:
File: TRESTKA-ALEX-THESIS.pdf
Size: 3.61 MB
Format: Adobe PDF


Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.