Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp016395wb452
Title: NLP.GOV: Identifying Market Opportunities Through Natural Language Processing on Government Bills
Authors: Mehta, Aditya
Advisors: Heide, Felix
Department: Computer Science
Certificate Program: Finance Program
Class Year: 2024
Abstract: In this paper, we investigate the impact of congressional bills on the U.S. stock market and evaluate how well large language models (LLMs) identify and understand these potential economic influences. Legislative acts such as the CHIPS Act can significantly influence market dynamics: using the SOXX ETF as a proxy for the semiconductor industry, we observe growth of over 50% in that industry after the act's passage. To this end, we explore using natural language processing (NLP) techniques to analyze the content of government bills. Our objective is to discern which passages within these documents may signal long-horizon growth in particular sectors of the S&P 500 index. To do this, we first classify each bill into the 11 S&P sectors with a confidence score. We then use these scores to adjust a portfolio built on the S&P 500's predicted sector weights, producing a sector-focused portfolio, and compare its performance to a baseline investment in the S&P 500 over an extended period. We find that this portfolio tends to be more volatile than the S&P 500 baseline, with only a small gain in mean returns. Next, we examine congressional trading patterns in the days before and after each bill is passed to determine whether these bills are being traded on, and we compare a portfolio built from Senate trades with the model's portfolio. Lastly, we propose our own transformer-convolution integrative framework (TCIF), which uses both the model's predictions and Senate trading patterns to make long-term predictions. We find that these TCIF models slightly outperform the market and are able to identify long-term market trends, as evidenced by higher alpha, Sharpe ratios, and Sortino ratios. Thus, we demonstrate an ability to generate a small amount of alpha over the S&P 500 as a whole.
URI: http://arks.princeton.edu/ark:/88435/dsp016395wb452
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2024

Files in This Item:
File | Description | Size | Format
MEHTA-ADITYA-THESIS.pdf | | 1.43 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.