Gradient Information Analysis of ReLU Networks
Operations Research and Financial Engineering
ResNets are among the most widely used neural network architectures and have been applied with great success, particularly in computer vision. However, they are highly nonlinear and non-convex, which makes them difficult both to train and to analyze theoretically. In this paper, we introduce a simple modification to the canonical ResNet block that has desirable optimization properties. This modified ResNet block is functionally equivalent to the canonical one, yet we are able to show that in the one-dimensional (weights, inputs, and outputs are all scalar), Gaussian-input, infinite-sample case, many critical points are global minima. We also show experimentally that SGD finds the solutions predicted by our analysis. Finally, we analyze the gradient information of a three-layer network and show experimentally that a simple gradient analysis is not sufficient to characterize the solutions that gradient descent finds.
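To make the setting concrete, the sketch below trains a residual block in the regime the abstract describes: scalar (1D) weights, inputs, and outputs, Gaussian inputs, a ReLU activation, and plain SGD on the squared loss. The block form x + w2 * relu(w1 * x) is the canonical residual shape; the thesis's modified block is not specified in this abstract, so the teacher weights, initialization, and learning rate here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def block(x, w1, w2):
    # Canonical 1D residual block: identity skip plus a two-weight ReLU branch.
    return x + w2 * relu(w1 * x)

# Hypothetical "teacher" weights; the student starts elsewhere and runs SGD.
w1_true, w2_true = 1.5, -0.7
w1, w2 = 1.0, 0.0   # asymmetric init (illustrative choice)
lr = 0.01

for _ in range(5000):
    x = rng.standard_normal()  # one fresh Gaussian sample per step
    err = block(x, w1, w2) - block(x, w1_true, w2_true)
    pre = w1 * x
    # Subgradients of the per-sample squared loss 0.5 * err**2.
    g_w2 = err * relu(pre)
    g_w1 = err * w2 * (pre > 0) * x
    w1 -= lr * g_w1
    w2 -= lr * g_w2

# Held-out estimate of the population squared loss.
xs = rng.standard_normal(10_000)
test_mse = np.mean((block(xs, w1, w2) - block(xs, w1_true, w2_true)) ** 2)
```

In this toy instance, streaming fresh samples approximates the infinite-sample objective, and SGD drives the held-out loss toward zero despite the non-convexity, which is consistent in spirit with the abstract's claim that SGD finds the solutions predicted by the analysis. Note the block function is identified only up to the product of the branch weights on the active region, so the learned weights need not equal the teacher's individually.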
Type of Material:
Princeton University Senior Theses
Appears in Collections:
Operations Research and Financial Engineering, 2000-2023