Please use this identifier to cite or link to this item:
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Arora, Sanjeev |
dc.contributor.author | Saunshi, Nikunj Umesh |
dc.contributor.other | Computer Science Department |
dc.description.abstract | While supervised learning sparked the deep learning boom, it has some critical shortcomings: (1) it requires an abundance of expensive labeled data, and (2) it solves tasks from scratch rather than taking the human-like approach of leveraging knowledge and skills acquired from prior experiences. Pre-training has emerged as an effective alternative paradigm for overcoming these shortcomings, whereby a model is first trained using easily acquirable data and later used to solve downstream tasks of interest with far less labeled data than supervised learning requires. Pre-training using unlabeled data, a.k.a. self-supervised learning, has been especially revolutionary, with successes in diverse domains: text, vision, speech, etc. This raises an interesting and challenging question: why should pre-training on unlabeled data help with seemingly unrelated downstream tasks? In this thesis we present works that initiate and build a theoretical framework to study why self-supervised learning is beneficial for downstream tasks. The framework is applied to methods like contrastive learning, auto-regressive language modeling and self-prediction based methods. Central to the framework is the idea that pre-training helps learn low-dimensional representations of data that subsequently help solve downstream tasks of interest with linear classifiers, requiring less labeled data. A common theme is to formalize the desirable properties of the unlabeled data distribution that is used to construct the self-supervised learning task. Under appropriate formalizations, it can be shown that approximately minimizing the right pre-training objectives can extract the downstream signal that is implicitly encoded in the unlabeled data distribution. Finally, it is shown that this signal can be decoded from the learned representations using linear classifiers, thus providing a formalization for transference of “skills and knowledge” across tasks. |
dc.publisher | Princeton, NJ : Princeton University |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog. |
dc.subject | Artificial intelligence |
dc.subject | contrastive learning |
dc.subject | language models |
dc.subject | Machine learning |
dc.subject | Representation learning |
dc.subject | self-supervised learning |
dc.subject.classification | Artificial intelligence |
dc.title | Towards Understanding Self-Supervised Representation Learning |
dc.type | Academic dissertations (Ph.D.) |
pu.department | Computer Science |
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
Saunshi_princeton_0181D_14307.pdf | | 5.64 MB | Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.