Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01pr76f626t
Title: The Wisdom of Crowds: A Natural Language Processing Approach to Forecasting Sports Betting Markets Using Social Media Fan Sentiment
Authors: Chen, Peter
Advisors: Carmona, Rene
Department: Operations Research and Financial Engineering
Certificate Program: Center for Statistics and Machine Learning
Class Year: 2019
Abstract: The wisdom of crowds, or the idea that the collective knowledge of a group of people can be regarded as an alternative to expert opinion, has been repeatedly shown to be an effective indicator of sporting outcomes. With NFL betting being the largest sports betting market in the United States and fan sentiment becoming readily available and abundant with the rise of social media platforms such as Reddit, we study the predictive relationship between social media output and NFL outcomes. In particular, we focus on the two most popular forms of sports betting at the per-game level: wagering on which team will win against the point spread (WTS), a handicap applied to the team that bookmakers expect to win the game, and wagering on whether the combined score will be above or below the over-under line, a prediction for the total score set by bookmakers. Popular natural language processing representations of Reddit text, including bag-of-words, term frequency-inverse document frequency (TF-IDF), and out-of-the-box sentiment scoring models used as a proxy for public sentiment, were shown to be successful regressors in several common machine learning models. When trained on games from the 2012-2018 seasons, discriminative models (logistic regression and linear support vector machines) using bag-of-words and TF-IDF representations, and nearest neighbor models using sentiment scoring algorithms (VADER and AFINN), were found to be most successful at this classification task, achieving out-of-sample testing accuracies of up to 54%, well above the 52.4% required to generate a profitable betting strategy. Further attempts at implementing an LSTM neural network have also shown similar success.
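The 52.4% threshold quoted in the abstract can be reproduced with a short calculation. The sketch below is illustrative and is not taken from the thesis. It assumes standard American odds of -110 on spread and over-under bets (risk 110 to win 100), a figure the abstract does not state, and it ignores pushes. The function names are hypothetical.

```python
def break_even_win_rate(american_odds: int) -> float:
    """Win probability at which a bet at these American odds has zero expected profit."""
    if american_odds < 0:
        # Negative odds: risk |odds| to win 100.
        risk, win = -american_odds, 100
    else:
        # Positive odds: risk 100 to win `odds`.
        risk, win = 100, american_odds
    return risk / (risk + win)


def expected_profit_per_unit(win_rate: float, american_odds: int) -> float:
    """Expected profit per unit staked at the given win rate, ignoring pushes."""
    if american_odds < 0:
        payout = 100 / -american_odds
    else:
        payout = american_odds / 100
    # Win `payout` with probability win_rate, lose the 1-unit stake otherwise.
    return win_rate * payout - (1 - win_rate)


# Assumption: -110 pricing on both sides of the bet (not stated in the abstract).
ASSUMED_ODDS = -110

print(round(break_even_win_rate(ASSUMED_ODDS), 4))             # 0.5238
print(round(expected_profit_per_unit(0.54, ASSUMED_ODDS), 4))  # 0.0309
```

Under that assumption, the break-even win rate is 110/210, or about 52.4%, and a 54% hit rate corresponds to an expected profit of roughly 3 cents per unit staked.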
URI: http://arks.princeton.edu/ark:/88435/dsp01pr76f626t
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2023

Files in This Item:
File: CHEN-PETER-THESIS.pdf
Size: 1.67 MB
Format: Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.