Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp018k71nk79v
Full metadata record
DC Field                     Value
dc.contributor.advisor       Funkhouser, Thomas A.
dc.contributor.author        Yu, Fu
dc.contributor.other         Computer Science Department
dc.date.accessioned          2018-04-26T18:48:36Z
dc.date.available            2018-04-26T18:48:36Z
dc.date.issued               2018
dc.identifier.uri            http://arks.princeton.edu/ark:/88435/dsp018k71nk79v
dc.description.abstract      Pixel-level prediction generalizes a wide range of computer vision tasks, including semantic image segmentation and dense depth prediction. These tasks are fundamental to image recognition and receive continual attention from the community. However, although they share common traits that may admit a general solution, they are usually studied in isolation because of differing domain characteristics. This thesis studies the essential problems behind these tasks and sheds light on a general framework. It starts with an algorithm, based on geometric optimization, that predicts plausible depth from nearly identical images; the motion between such images is called "Accidental Motion". Analysis of accidental motion shows that the motion optimization has special convexity properties, leading to a reconstruction pipeline that produces a plausible dense depth map for the reference image and enables depth-based camera effects. The second part studies learning pixel representations to predict semantic properties from a single image. Previous work usually relies on learned upsampling to recover pixel-level information. This work instead proposes dilated convolution to transform classification networks so that high-resolution prediction is achieved without upsampling. Dilated convolution can also yield an exponential increase in receptive field, which is ideal for learning global context. A context module based on this property is proposed that improves network performance significantly and consistently, and dilation remains a standard component in state-of-the-art methods for semantic image segmentation. A further study of dilated residual networks shows that the same high-resolution prediction can also improve image classification results, indicating that no essential architectural difference exists between networks for image classification and segmentation. Further inspection of class activation maps and layer responses uncovers peculiar gridding patterns and their cause. This finding leads to new convolutional network designs that remove the gridding artifacts and produce activations with better spatial consistency, improving performance on both image classification and semantic segmentation. The presented methods and results may inspire new research toward a unified framework for image recognition of geometry and semantics.
dc.language.iso              en
dc.publisher                 Princeton, NJ : Princeton University
dc.relation.isformatof       The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu
dc.subject                   convolutional networks
dc.subject                   deep learning
dc.subject                   depth estimation
dc.subject                   dilated convolution
dc.subject                   semantic segmentation
dc.subject.classification    Computer science
dc.title                     Pixel-Level Prediction: From Geometry to Semantics
dc.type                      Academic dissertations (Ph.D.)
pu.projectgrantnumber        690-2143
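
The abstract above claims that dilated convolutions deliver high-resolution prediction without upsampling and an exponential growth in receptive field. The sketch below is a minimal illustration of that claim only; it is not code from the dissertation, and the channel count, dilation schedule, and helper names (context_block, receptive_field) are assumptions made for the example.

# Illustrative sketch (assumptions, not thesis code): a stack of 3x3 dilated
# convolutions keeps the feature map at full resolution (padding equals
# dilation, no striding) while the receptive field grows with depth.
import torch
import torch.nn as nn


def context_block(channels, dilations=(1, 1, 2, 4, 8, 16)):
    """Hypothetical context module: 3x3 dilated convolutions with ReLU."""
    layers = []
    for d in dilations:
        layers += [
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
            nn.ReLU(inplace=True),
        ]
    return nn.Sequential(*layers)


def receptive_field(dilations, kernel_size=3):
    """Each layer with dilation d adds (kernel_size - 1) * d pixels of context."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


if __name__ == "__main__":
    dilations = (1, 1, 2, 4, 8, 16)
    block = context_block(channels=64, dilations=dilations)
    x = torch.randn(1, 64, 128, 128)
    print(block(x).shape)               # torch.Size([1, 64, 128, 128]): no downsampling
    print(receptive_field(dilations))   # 65: receptive field after six layers

With this hypothetical schedule the output keeps the 128x128 input resolution, while the receptive field reaches 65 pixels after six layers, since each 3x3 layer with dilation d adds 2*d pixels of context and the dilation roughly doubles at every step.
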
Appears in Collections: Computer Science

Files in This Item:
File                            Description    Size        Format
Yu_princeton_0181D_12472.pdf                   44.82 MB    Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.