Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01n870zv08h
Title: Attention Perturbed: Finite-Width Analysis of Attention-Based Deep Neural Network Architectures at Initialization
Authors: Curto Correia Contente, Edoardo Miguel
Advisors: Hanin, Boris
Bialek, William
Department: Physics
Class Year: 2023
Abstract: In this work we extend the methods of an “effective” Theory of Deep Learning to investigate the leading-order behavior and implicit bias of large-scale language models that rely on “attention” mechanisms, at initialization. Because of the algebraic complexity inherent to these attention-based block architectures, we first introduce a succinct and universal blueprint for a neural network’s architecture, initialization, and training/testing schedule, which can be repurposed for any modern neural network. Using our modular blueprint as a guide, we then perform a series of perturbative analyses of versions of the Transformer Encoder model. We obtain a promising path to a full recursive power expansion in the inverse embedding dimension, 1/d, of the expected behavior of the output of the attention layer at initialization. We also introduce the network’s Neural Tangent Kernel at initialization. Our goal is to provide an initial bridge from the recently successful general Theory of Deep Learning to a tangible explanation of the dynamics and high performance of the specific attention-based residual architecture found in the Transformer, which lies at the root of the recent advances in Natural Language Processing and Computer Vision.
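
As a rough illustration of the two quantities named in the abstract, and assuming the standard definitions from the deep-learning literature rather than the thesis’s own conventions, the Neural Tangent Kernel of a network f(x; θ) at its random initialization θ_0, and a power expansion of an expected output in the inverse embedding dimension 1/d, take the schematic form:

% Minimal sketch under generic notation (f, theta, d, c_k are placeholders,
% not symbols taken from the thesis): empirical NTK at initialization and
% a 1/d power expansion of an expected output.
\[
  \Theta_0(x, x') \;=\; \sum_{\mu}
    \frac{\partial f(x; \theta)}{\partial \theta_\mu}\,
    \frac{\partial f(x'; \theta)}{\partial \theta_\mu}
    \Bigg|_{\theta = \theta_0},
  \qquad
  \mathbb{E}\big[\text{output}\big]
  \;=\; c_0 \;+\; \frac{c_1}{d} \;+\; \frac{c_2}{d^2} \;+\; \mathcal{O}\!\left(\frac{1}{d^3}\right).
\]

Here the sum runs over all trainable parameters and the coefficients c_k are d-independent; these are generic placeholders meant only to orient the reader, not the thesis’s actual derivation.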
URI: http://arks.princeton.edu/ark:/88435/dsp01n870zv08h
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Physics, 1936-2024

Files in This Item:
File: CURTOCORREIACONTENTE-EDOARDOMIGUEL-THESIS.pdf
Size: 953.67 kB
Format: Adobe PDF
Access: Request a copy
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.