Title: Towards Understanding Self-Supervised Representation Learning
Authors: Saunshi, Nikunj Umesh
Advisors: Arora, Sanjeev
Contributors: Computer Science Department
Keywords: Artificial intelligence; contrastive learning; language models; Machine learning; Representation learning; self-supervised learning
Subjects: Artificial intelligence
Issue Date: 2022
Publisher: Princeton, NJ : Princeton University
Abstract: While supervised learning sparked the deep learning boom, it has some critical shortcomings: (1) it requires an abundance of expensive labeled data, and (2) it solves tasks from scratch rather than taking the human-like approach of leveraging knowledge and skills acquired from prior experiences. Pre-training has emerged as an effective alternative paradigm to overcome these shortcomings, whereby a model is first trained using easily acquirable data and later used to solve downstream tasks of interest with far less labeled data than supervised learning requires. Pre-training using unlabeled data, a.k.a. self-supervised learning, has been especially revolutionary, with successes in diverse domains: text, vision, speech, etc. This raises an interesting and challenging question: why should pre-training on unlabeled data help with seemingly unrelated downstream tasks? In this thesis we present works that initiate and build a theoretical framework to study why self-supervised learning is beneficial for downstream tasks. The framework is applied to methods like contrastive learning, auto-regressive language modeling, and self-prediction based methods. Central to the framework is the idea that pre-training helps learn low-dimensional representations of data that subsequently help solve downstream tasks of interest with linear classifiers, requiring less labeled data. A common theme is to formalize the desirable properties of the unlabeled data distribution that is used to construct the self-supervised learning task. Under appropriate formalizations, it can be shown that approximately minimizing the right pre-training objectives can extract the downstream signal that is implicitly encoded in the unlabeled data distribution. Finally, it is shown that this signal can be decoded from the learned representations using linear classifiers, thus providing a formalization for transference of “skills and knowledge” across tasks.
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. These copies can be found through the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Saunshi_princeton_0181D_14307.pdf
Size: 5.64 MB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.