Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zs25xc66s
Title: Applications of Machine Learning and Natural Language Processing on WallStreetBets Reddit Data to Predict GameStop Stock Prices
Authors: Huynh, Daniel
Advisors: Li, Xiaoyan
Department: Computer Science
Class Year: 2022
Abstract: Many people believed that Reddit's WallStreetBets community was a driving factor in the GameStop short squeeze of January 2021, with some even accusing the subreddit community of market manipulation. The goal of this project was to apply machine learning (ML) and natural language processing (NLP) techniques to quantify WallStreetBets's influence over GameStop stock (GME) and gauge the validity of these claims. Before each open market day, a moderator posts a thread titled "What are your moves tomorrow, [insert tomorrow's date]?" on WallStreetBets, and members then comment on the stock trades they plan to make and react to each other's comments. This project involved collecting the comments from these daily "moves" threads and performing sentiment analysis on them, specifically the threads posted from the start of January 2021 to the end of February 2021, in order to capture the period leading up to and following the GameStop short squeeze. With the metadata and sentiment data of all Reddit comments discussing GME, I performed Granger causality tests to see whether any Granger-causal relationships existed between the Reddit data and GME's price. Lastly, the Reddit data was used to train a machine learning model to predict GME's price. I found no Granger-causal relationship between the sentiment data and GME's price. However, one did exist between the total number of comments posted on the daily thread and GME's price. Additionally, the machine learning model trained with the Reddit data was able to predict whether GME's price would go up or down the next day with 100% accuracy over a sample size of open market days. Furthermore, this model outperformed a clone that had the same parameters but was trained without the Reddit data. These findings suggest that WallStreetBets actually did have a significant impact on the stock market.
However, it is still presumptuous to call them "market manipulators" and more work would have to be done to determine just how strong their influence over the market truly is.
URI: http://arks.princeton.edu/ark:/88435/dsp01zs25xc66s
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024

Files in This Item:
File                     Description  Size     Format
HUYNH-DANIEL-THESIS.pdf               1.11 MB  Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.