Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01tx31qm79q
Title: An experimental study of the relationship between generalization and the bias variance tradeoff in neural networks
Authors: Leung, Kathryn
Advisors: Hanin, Boris
Department: Operations Research and Financial Engineering
Certificate Program: Applications of Computing Program; Center for Statistics and Machine Learning
Class Year: 2021
Abstract: Machine learning models have proven useful across many applications and fields of research, but their inner workings are often poorly understood. The complexity of a machine learning model can be loosely defined by its size, i.e. how many parameters it uses to make a prediction. A certain number of parameters is necessary for the model to represent complex functions. However, classical machine learning theory posits that increasing the number of parameters past a certain point causes overfitting, a pitfall of training in which the model fits its training data too closely and loses the ability to generalize to unseen data. Models with more parameters than are needed to fit the training data are called overparameterized. The performance of overparameterized models is often assessed by measuring the test loss, which is closely tied to the bias and variance of the model: under squared-error loss, the expected test loss decomposes into the squared bias, the variance, and an irreducible noise term. The idea of a bias-variance tradeoff as model complexity grows has permeated machine learning theory, with the bias monotonically decreasing and the variance monotonically increasing as the complexity of the model grows. However, recent works have suggested that this classical bias-variance tradeoff may not extend to all model classes, particularly overparameterized neural networks. Neural networks are a specific class of machine learning models that have recently gained traction for their extraordinary performance, yet they remain largely mysterious. They typically have more parameters than are necessary to fit the dataset on which they are trained; according to classical machine learning theory, this would mean that they are unable to generalize well. Yet overparameterized neural networks have found great success on a wide range of tasks, and their ability to generalize with minimal error has puzzled researchers. In this thesis, I will conduct experiments on neural networks in various settings and explore to what extent the classical bias-variance tradeoff holds. I will reproduce experiments from other papers on the bias-variance tradeoff in neural networks and probe the robustness of past work in this area. I will also examine how the bias-variance tradeoff changes for neural networks with different architectures and datasets, and explore theoretical results about the tradeoff to see whether my empirical results are backed by theory. Lastly, I will investigate whether my empirical results can be connected to phenomena in the generalization of overparameterized neural networks, such as the posited double descent curve. In doing so, I hope to take steps towards elucidating the extraordinary generalization performance of overparameterized neural networks.
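
For reference, the decomposition mentioned in the abstract is the standard one for squared-error loss (restated here, not taken from the thesis itself): the expected test error at an input x decomposes as E[(y - \hat{f}(x))^2] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2, where \sigma^2 is the irreducible label noise. The sketch below illustrates one common way such quantities are estimated empirically, by training many models on independently drawn training sets and comparing their averaged predictions to the true function. The synthetic data generator, the scikit-learn model, and all parameter choices are illustrative assumptions, not the thesis's actual experimental setup.

# Minimal sketch (assumptions, not the thesis's code): Monte Carlo estimate
# of squared bias and variance for a small neural network regressor.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def true_f(x):
    # Ground-truth regression function (hypothetical choice for illustration).
    return np.sin(3 * x)

def sample_train_set(n=200, noise=0.1):
    # Draw a fresh training set from the assumed data distribution.
    x = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = true_f(x).ravel() + noise * rng.normal(size=n)
    return x, y

x_test = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)

preds = []
for _ in range(50):  # 50 independently trained models
    x_tr, y_tr = sample_train_set()
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
    model.fit(x_tr, y_tr)
    preds.append(model.predict(x_test))

preds = np.stack(preds)              # shape: (runs, test points)
mean_pred = preds.mean(axis=0)       # average prediction across runs
bias_sq = np.mean((mean_pred - true_f(x_test).ravel()) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"estimated bias^2 ~ {bias_sq:.4f}, estimated variance ~ {variance:.4f}")

Repeating this procedure while varying the network width would trace out the bias and variance curves as a function of model complexity, which is the kind of measurement the abstract describes.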
URI: http://arks.princeton.edu/ark:/88435/dsp01tx31qm79q
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2023

Files in This Item:
File                      Description  Size     Format
LEUNG-KATHRYN-THESIS.pdf               1.35 MB  Adobe PDF  (Request a copy)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.