Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01g158bm478
Title: Maximizing Interpretability in Credit Scoring through Decision Tree and Ensemble Learning Algorithms
Authors: Anteneh, Aemu
Advisors: Klusowski, Jason
Department: Operations Research and Financial Engineering
Certificate Program: Program in Cognitive Science
Class Year: 2022
Abstract: The tension between interpretability and accuracy is heavily discussed in machine learning because this tradeoff has major implications across a variety of applications. One such application is credit scoring, specifically the FICO Score, whose algorithm is a black box. The model's lack of transparency makes it susceptible to bias and inconsistencies, yet because FICO is proprietary, the existing literature on its implementation is sparse. We address this issue by using a Decision Tree model to replicate the FICO Score algorithm in an interpretable way while maintaining good accuracy in credit score classification. This choice of algorithm is motivated by the previous success of tree-based methods in determining creditworthiness, as well as by the sequential nature of the online myFICO Credit Score Estimator, which resembles the decision-making process of a tree. We then use the ensemble learning methods Random Forest and Gradient Boosting to examine what bias may exist in FICO’s algorithm by comparing permutation feature importance scores between two datasets: one that mimics the factors FICO uses in scoring (our “traditional” dataset) and another that includes that information along with additional features that are supposedly non-diagnostic (our “enhanced” dataset). Across all three models, the top significant features differ meaningfully when the additional variables are included in the dataset, suggesting that the FICO algorithm may implicitly place heavy weight on unethical variables in determining credit scores. Furthermore, the accuracy of each of our methods was consistently good on both sub-datasets. This study gives evidence that a transparent credit scoring model that maintains performance is not only achievable but also an extremely important step toward monitoring bias in assigning credit score classifications.
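The abstract compares permutation feature importance scores across models and datasets. As a rough illustration only (this is not the thesis's code, and the data, the feature names `utilization` and `noise`, and the `threshold_model` helper are all hypothetical), the sketch below computes permutation importance in plain Python: each feature column is shuffled in turn, and its importance is the average drop in accuracy relative to the unshuffled baseline. A hand-written single-split rule stands in for a fitted Decision Tree, Random Forest, or Gradient Boosting model; with real fitted models, scikit-learn's `sklearn.inspection.permutation_importance` provides the same measure.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing feature j with the shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def top_features(importances: Sequence[float], names: Sequence[str], k: int) -> List[str]:
    """Names of the k features with the largest importance scores."""
    order = sorted(range(len(names)), key=lambda j: importances[j], reverse=True)
    return [names[j] for j in order[:k]]


# Hypothetical toy data: the label depends only on feature 0.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def threshold_model(rows: List[Row]) -> List[int]:
    # Hypothetical stand-in for a fitted tree: a single split on feature 0.
    return [int(r[0] > 0.5) for r in rows]


imp = permutation_importance(threshold_model, X, y, n_repeats=5)
print(top_features(imp, ["utilization", "noise"], k=1))  # ['utilization']
```

Comparing the output of `top_features` for a model trained on a "traditional" feature set against one trained on an "enhanced" feature set mirrors, in miniature, the kind of comparison the abstract describes.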
URI: http://arks.princeton.edu/ark:/88435/dsp01g158bm478
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2024

Files in This Item:
File: ANTENEH-AEMU-THESIS.pdf
Size: 1.16 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.