Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01b8515r63b
Title: Algorithmic and architectural implicit biases in deep learning
Authors: Jelassi, Samy
Advisors: Hanin, Boris
Contributors: Operations Research and Financial Engineering Department
Keywords: Machine Learning
Optimization
Subjects: Computer science
Issue Date: 2023
Publisher: Princeton, NJ : Princeton University
Abstract: This thesis focuses on the interaction between the choice of optimization scheme and modern machine learning models such as neural networks. These models are trained by minimizing a training loss over data samples. Neural networks tend to be highly over-parameterized, with far more trainable parameters than training examples. Consequently, the training loss has multiple global minima that differ in their prediction quality. Yet model and optimizer jointly work toward finding a minimum with high accuracy. This phenomenon is known as the implicit bias of neural networks. The first part of this thesis focuses on the algorithmic implicit bias problem: given a dataset and a model, what is the best optimizer? This question is specific to modern machine learning, where models are over-parameterized and the training loss has multiple global minima. I focused on settings where GD-trained models make poor predictions and showed how commonly used add-ons, such as momentum or adaptivity, improve performance on unseen data. In the second part, I addressed the converse question: given a dataset and an optimizer, what is the best model? Until recently, practitioners designed a specific architecture for each domain, injecting domain knowledge into the network to capture the main properties of the data. Recently, Transformers have reached state-of-the-art performance in computer vision and natural language processing, which is surprising since they may not embed any domain-specific knowledge. I focused on this puzzle, describing the minima Transformers converge to and how they learn an appropriate inductive bias in a computer-vision-inspired setting.
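The momentum "add-on" mentioned in the abstract can be illustrated with a minimal sketch (not taken from the thesis): plain gradient descent versus heavy-ball momentum on a toy ill-conditioned quadratic loss. The matrix A, step size, and momentum coefficient below are all assumed purely for illustration.

```python
# Illustrative sketch (assumed example, not from the thesis): plain gradient
# descent vs. the heavy-ball momentum add-on on f(w) = 0.5 * w^T A w.
import numpy as np

A = np.diag([1.0, 100.0])  # ill-conditioned quadratic: curvatures 1 and 100

def grad(w):
    return A @ w

def gd(w0, lr=0.009, steps=200):
    """Plain gradient descent: w <- w - lr * grad(w)."""
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def heavy_ball(w0, lr=0.009, beta=0.9, steps=200):
    """Gradient descent with heavy-ball momentum: a velocity term
    accumulates past gradients, speeding progress along flat directions."""
    w = w0.copy()
    v = np.zeros_like(w0)
    for _ in range(steps):
        v = beta * v - lr * grad(w)
        w = w + v
    return w

w0 = np.array([1.0, 1.0])
# Both iterates approach the unique minimum at the origin; with this step
# size, momentum makes far more progress along the low-curvature direction.
print(np.linalg.norm(gd(w0)), np.linalg.norm(heavy_ball(w0)))
```

Note that on this convex toy problem both methods reach the same (unique) minimum; the thesis's point is that in over-parameterized models with many global minima, such add-ons can also change *which* minimum is reached.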
URI: http://arks.princeton.edu/ark:/88435/dsp01b8515r63b
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File                                  Size      Format
Jelassi_princeton_0181D_14527.pdf     20.31 MB  Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.