Pixel-Level Prediction: From Geometry to Semantics

Yu, Fu

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018k71nk79v

Title:	Pixel-Level Prediction: From Geometry to Semantics
Authors:	Yu, Fu
Advisors:	Funkhouser, Thomas A.
Contributors:	Computer Science Department
Keywords:	convolutional networks; deep learning; depth estimation; dilated convolution; semantic segmentation
Subjects:	Computer science
Issue Date:	2018
Publisher:	Princeton, NJ : Princeton University
Abstract:	Pixel-level prediction generalizes a wide range of computer vision tasks, including semantic image segmentation and dense depth prediction. These tasks are fundamental to image recognition and receive continual attention from the community. However, although they share common traits that may admit a general solution, they are usually studied in isolation because of their different domain characteristics. This thesis studies the essential problems behind these tasks and sheds light on a general framework. It starts with an algorithm, based on geometric optimization, that predicts plausible depth from almost identical images. The motion between such images is called "accidental motion". Analysis of accidental motion shows that the motion optimization has special convexity properties. This leads to a reconstruction pipeline that produces a plausible dense depth map for the reference image, which is shown to enable depth-based camera effects. The second part studies learning pixel representations to predict semantic properties from a single reference image. Previous work usually uses learned upsampling to recover pixel-level information. This work proposes using dilated convolution to transform classification networks so that high-resolution prediction is achieved without upsampling. Dilated convolution can also yield an exponential increase in receptive field, which is ideal for learning global context. Based on this property, a context module is proposed that improves network performance significantly and consistently. Dilation is still a standard component in state-of-the-art methods for semantic image segmentation. Further study of dilated residual networks shows that the same high-resolution prediction can also improve image classification results. This indicates that no essential difference in network architecture exists between image classification and segmentation.
Further inspection of class activation maps and layer responses uncovers peculiar gridding patterns and their cause. This finding leads to new convolutional network designs that remove the gridding artifacts and produce activations with better spatial consistency. The new networks improve the performance of both image classification and semantic segmentation. The presented methods and results may inspire new research toward a unified framework for image recognition of geometry and semantics.
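Illustrative note (not part of the thesis record): the abstract's claim that dilation gives an exponential increase in receptive field can be checked with a little arithmetic. The sketch below assumes stride-1 layers with 3x3 kernels and dilation rates that double at each layer (1, 2, 4, 8); the function name receptive_field and these particular rates are assumptions chosen for illustration, not details taken from the abstract.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


plain = [1, 1, 1, 1]     # four ordinary 3x3 layers (assumed setup)
dilated = [1, 2, 4, 8]   # four 3x3 layers with doubling dilation (assumed setup)

print(receptive_field(3, plain))    # 9: grows linearly with depth
print(receptive_field(3, dilated))  # 31: grows exponentially with depth
```

Under these assumptions, n layers with doubling dilation cover 2^(n+1) - 1 pixels per axis, while n undilated layers cover only 2n + 1, and neither stack reduces the spatial resolution of the feature map.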
URI:	http://arks.princeton.edu/ark:/88435/dsp018k71nk79v
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Computer Science

Files in This Item:

File	Description	Size	Format
Yu_princeton_0181D_12472.pdf		44.82 MB	Adobe PDF
