Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01d217qs70q
Title: Model Calibration and Data Augmentation for Predicting Personal Loan Performance
Authors: Huang, Qing
Advisors: Fan, Jianqing
Department: Operations Research and Financial Engineering
Certificate Program: Center for Statistics and Machine Learning
Class Year: 2022
Abstract: The global loan market is sized at over $6 trillion. In the U.S., total consumer debt outstanding has exceeded $14 trillion and is expected to keep climbing in the coming years. Consumer loans and loan performance are important not only in maintaining consumer demand and cash flows and promoting economic growth, but also in reflecting the socioeconomic conditions of our society. Loan repayment or default is also an important problem for lenders such as banks and other financial institutions, as well as for investors in securitized products such as mortgage-backed and asset-backed securities. The credit and mortgage sector has been further expanded by the inception and growth of peer-to-peer (P2P) lending platforms in recent years. Previous studies have investigated the effectiveness of regression and machine learning methods (e.g., random forests, decision trees, SVMs, logistic regression, artificial neural networks, and convolutional neural networks) in predicting loan default, with variable results. This paper builds upon previous methods and explores the use of calibration and data augmentation techniques to produce altered datasets for modeling and predicting loan performance. The realm of machine learning continues to grow, and many new tools have yet to be applied to the finance sector, and more specifically to the personal loan market. Generative adversarial networks (GANs) in particular are relatively new to the field of artificial intelligence and have traditionally been applied to areas such as image generation and text-to-image translation. We evaluate the results of each model and its predictive power. We seek to identify which methods are best suited to this particular type of anonymized financial data, and work towards a method that generalizes over time and across instances, so that it can predict loan performance, and thus maximize profits, even on unfamiliar loans.
URI: http://arks.princeton.edu/ark:/88435/dsp01d217qs70q
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2023

Files in This Item:
File: HUANG-QING-THESIS.pdf
Size: 2.28 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.