Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01b8515r63b
Title: | Algorithmic and architectural implicit biases in deep learning |
Authors: | Jelassi, Samy |
Advisors: | Hanin, Boris |
Contributors: | Operations Research and Financial Engineering Department |
Keywords: | Machine Learning Optimization |
Subjects: | Computer science |
Issue Date: | 2023 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | This thesis focuses on the interaction between the choice of optimization scheme and modern machine learning models such as neural networks. These models are trained by minimizing a training loss over data samples. Neural networks tend to be highly over-parameterized, with far more trainable parameters than training examples. Consequently, the training loss has multiple global minima, which differ in their prediction quality. Yet model and optimizer jointly work towards finding a minimum with high accuracy. This phenomenon is known as the implicit bias of neural networks. The first part of this thesis focuses on the algorithmic implicit bias problem: given a dataset and a model, what is the best optimizer? This question is specific to modern machine learning, where models are over-parameterized and the training loss has multiple global minima. I focused on settings where GD-trained models make poor predictions and showed how commonly used add-ons, such as momentum or adaptivity, improve performance on unseen data. In the second part, I addressed the following question: given a dataset and an optimizer, what is the best model? Until recently, practitioners designed a specific architecture for each domain, injecting domain knowledge into the network to capture the main properties of the data. Recently, Transformers have reached state-of-the-art performance in computer vision and natural language processing. This is quite surprising, since they may not embed any domain-specific knowledge. I focused on this puzzle, describing the minima Transformers converge to and how they learn an appropriate inductive bias in a computer-vision-inspired setting. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01b8515r63b |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Operations Research and Financial Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Jelassi_princeton_0181D_14527.pdf | | 20.31 MB | Adobe PDF | View/Download |
Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.