Gradient Information Analysis of ReLU Networks
Operations Research and Financial Engineering
ResNets are among the most widely used neural network architectures and have been applied with great success, particularly in computer vision. However, they are highly nonlinear and non-convex, which makes them difficult both to train and to analyze theoretically. In this paper, we introduce a simple modification to the canonical ResNet block that has desirable optimization properties. This modified ResNet block is functionally equivalent to the canonical one, yet we are able to show that in the one-dimensional (weights, inputs, and outputs are all scalar), Gaussian-input, infinite-sample case, many critical points are global minima. We also show experimentally that SGD finds the solutions predicted by our analysis. Finally, we analyze the gradient information of a three-layer network and show experimentally that a simple gradient analysis is not sufficient to characterize the solutions that gradient descent finds.
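To make the setting concrete, the sketch below trains a residual block in the regime the abstract describes: scalar (1D) weights, inputs, and outputs, Gaussian inputs, a ReLU activation, and plain SGD on the squared loss. The block form x + w2 * relu(w1 * x) is the canonical residual shape; the thesis's modified block is not specified in this abstract, so the teacher weights, initialization, and learning rate here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def block(x, w1, w2):
    # Canonical 1D residual block: identity skip plus a two-weight ReLU branch.
    return x + w2 * relu(w1 * x)

# Hypothetical "teacher" weights; the student starts elsewhere and runs SGD.
w1_true, w2_true = 1.5, -0.7
w1, w2 = 1.0, 0.0   # asymmetric init (illustrative choice)
lr = 0.01

for _ in range(5000):
    x = rng.standard_normal()  # one fresh Gaussian sample per step
    err = block(x, w1, w2) - block(x, w1_true, w2_true)
    pre = w1 * x
    # Subgradients of the per-sample squared loss 0.5 * err**2.
    g_w2 = err * relu(pre)
    g_w1 = err * w2 * (pre > 0) * x
    w1 -= lr * g_w1
    w2 -= lr * g_w2

# Held-out estimate of the population squared loss.
xs = rng.standard_normal(10_000)
test_mse = np.mean((block(xs, w1, w2) - block(xs, w1_true, w2_true)) ** 2)
```

In this toy instance, streaming fresh samples approximates the infinite-sample objective, and SGD drives the held-out loss toward zero despite the non-convexity, which is consistent in spirit with the abstract's claim that SGD finds the solutions predicted by the analysis. Note the block function is identified only up to the product of the branch weights on the active region, so the learned weights need not equal the teacher's individually.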
Type of Material:
Princeton University Senior Theses
Appears in Collections:
Operations Research and Financial Engineering, 2000-2023