Statistical and Machine Learning Methods For Financial Data

Lu, Kun

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01dz010t14p

Title:	Statistical and Machine Learning Methods For Financial Data
Authors:	Lu, Kun
Advisors:	Mulvey, John
Contributors:	Operations Research and Financial Engineering Department
Subjects:	Statistics
Issue Date:	2021
Publisher:	Princeton, NJ : Princeton University
Abstract:	This dissertation focuses on developing new statistical and machine learning methods for financial applications. We first propose a new model, the Features Augmented Hidden Markov Model (FAHMM), which extends the traditional Hidden Markov Model (HMM) by incorporating a feature structure. The model is general in two respects: (1) the emission distribution can take different forms (e.g., exponential family); (2) different feature structures (e.g., high dimensionality, multicollinearity) are handled by adding different penalization terms. A theoretical proof of convergence, simulations, and an empirical application to currency regime identification are provided. Next, we develop a new neural Natural Language Processing model that combines reinforcement learning with the Bidirectional Encoder Representations from Transformers (BERT) model to classify long documents. Because BERT accepts only 512 tokens, it cannot handle long documents, which are very common in financial data (e.g., financial news, earnings transcripts). We therefore train a reinforcement learning model together with the BERT-based model end to end, using policy-gradient reinforcement learning to select sentences/chunks. We then apply the model to earnings conference call transcripts to predict the stock price movement after the call. Finally, we develop a method to estimate high-dimensional covariance matrices using high-frequency data. We use factor structure and thresholding methods to deal with high dimensionality, and pre-averaging and refresh time to tackle two features specific to high-frequency data: microstructure noise and non-synchronicity. We consider three scenarios: only the factors are known, only the loadings are known, or neither is known. Theoretical proofs and simulations are provided to support the method, and a horse race on out-of-sample portfolio allocation is conducted with Dow Jones 30, S&P 100, and S&P 500 index constituents, respectively.
URI:	http://arks.princeton.edu/ark:/88435/dsp01dz010t14p
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Operations Research and Financial Engineering

Files in This Item:

File	Description	Size	Format
Lu_princeton_0181D_13623.pdf		3.39 MB	Adobe PDF
