Title: Gradient Information Analysis of ReLU Networks
Authors: Hou, Charlie
Advisors: Chen, Yuxin
Department: Operations Research and Financial Engineering
Class Year: 2019
Abstract: ResNets are among the most widely used neural network architectures and have been applied with great success, particularly in computer vision. However, they are highly nonlinear and non-convex, which makes them difficult both to train and to analyze theoretically. In this paper, we introduce a simple modification to the canonical ResNet block that has desirable optimization properties. The modified block is functionally equivalent to the canonical ResNet block, yet in the one-dimensional setting (weights, inputs, and outputs are all scalar), with Gaussian inputs and infinitely many samples, we are able to show that many critical points are global minima. We also show experimentally that SGD finds the solutions predicted by our analysis. Finally, we analyze the gradient information of a three-layer network and show experimentally that a simple gradient analysis is not sufficient to characterize the solutions that gradient descent finds.
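For context, the canonical residual block the abstract refers to can be sketched in the one-dimensional setting the thesis studies (scalar weights and inputs, ReLU activation). The thesis's specific modification is not reproduced in this record, so the sketch below shows only the standard baseline form; the function name and two-weight parameterization are illustrative assumptions, not the author's exact construction:

```python
import numpy as np

def relu(z):
    # ReLU activation: elementwise max(z, 0)
    return np.maximum(z, 0.0)

def canonical_resnet_block_1d(x, w1, w2):
    # Canonical residual block in the 1D case: identity skip
    # connection plus a two-weight ReLU branch.
    # (Illustrative baseline only; the thesis modifies this block.)
    return x + w2 * relu(w1 * x)

# With w1 = w2 = 1, a positive input passes through both paths:
y = canonical_resnet_block_1d(2.0, 1.0, 1.0)  # 2.0 + relu(2.0) = 4.0
# A negative input is passed only by the skip connection:
z = canonical_resnet_block_1d(-3.0, 1.0, 1.0)  # -3.0 + relu(-3.0) = -3.0
```

The identity skip connection is what keeps the block's output well-behaved even when the ReLU branch is inactive, which is central to why residual architectures are easier to optimize than plain deep stacks.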
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Operations Research and Financial Engineering, 2000-2023

Files in This Item:
File: HOU-CHARLIE-THESIS.pdf
Size: 363.08 kB
Format: Adobe PDF
Access: Request a copy

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.