Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01t435gh21h
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Arora, Sanjeev | |
dc.contributor.author | Saunshi, Nikunj Umesh | |
dc.contributor.other | Computer Science Department | |
dc.date.accessioned | 2022-10-10T19:53:01Z | - |
dc.date.available | 2022-10-10T19:53:01Z | - |
dc.date.created | 2022-01-01 | |
dc.date.issued | 2022 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01t435gh21h | - |
dc.description.abstract | While supervised learning sparked the deep learning boom, it has some critical shortcomings: (1) it requires an abundance of expensive labeled data, and (2) it solves tasks from scratch rather than taking the human-like approach of leveraging knowledge and skills acquired from prior experiences. Pre-training has emerged as an effective alternative paradigm to overcome these shortcomings, whereby a model is first trained on easily acquirable data and later used to solve downstream tasks of interest with much less labeled data than supervised learning. Pre-training on unlabeled data, a.k.a. self-supervised learning, has been especially revolutionary, with successes in diverse domains: text, vision, speech, etc. This raises an interesting and challenging question: why should pre-training on unlabeled data help with seemingly unrelated downstream tasks? In this thesis we present works that initiate and build a theoretical framework to study why self-supervised learning is beneficial for downstream tasks. The framework is applied to methods such as contrastive learning, auto-regressive language modeling, and self-prediction based methods. Central to the framework is the idea that pre-training helps learn low-dimensional representations of data, which subsequently help solve downstream tasks of interest with linear classifiers, requiring less labeled data. A common theme is to formalize the desirable properties of the unlabeled data distribution that is used to construct the self-supervised learning task. Under appropriate formalizations, it can be shown that approximately minimizing the right pre-training objectives extracts the downstream signal that is implicitly encoded in the unlabeled data distribution. Finally, it is shown that this signal can be decoded from the learned representations using linear classifiers, thus providing a formalization for the transfer of “skills and knowledge” across tasks. | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | |
dc.subject | Artificial intelligence | |
dc.subject | contrastive learning | |
dc.subject | language models | |
dc.subject | Machine learning | |
dc.subject | Representation learning | |
dc.subject | self-supervised learning | |
dc.subject.classification | Artificial intelligence | |
dc.title | Towards Understanding Self-Supervised Representation Learning | |
dc.type | Academic dissertations (Ph.D.) | |
pu.date.classyear | 2022 | |
pu.department | Computer Science | |
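
The abstract above describes a two-stage pipeline: self-supervised pre-training learns low-dimensional representations from unlabeled data, after which a linear classifier trained on few labeled examples solves the downstream task. The following is a minimal, self-contained sketch of that pipeline, not code from the thesis; the toy data generator, encoder architecture, dimensions, and hyperparameters are all illustrative assumptions, using a standard InfoNCE-style contrastive objective followed by a linear probe on the frozen encoder.

```python
# Minimal sketch: contrastive pre-training on unlabeled data, then a linear probe
# trained on few labeled examples. All names and settings below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: two latent classes; "augmentations" are small Gaussian jitters.
def sample_batch(n=256, dim=32):
    y = torch.randint(0, 2, (n,))
    centers = torch.stack([torch.ones(dim), -torch.ones(dim)])
    return centers[y] + 0.5 * torch.randn(n, dim), y

def augment(x):
    return x + 0.3 * torch.randn_like(x)

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def info_nce(z1, z2, temperature=0.1):
    # Each z1[i] should match its positive z2[i] against all other z2[j].
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

# Stage 1: self-supervised pre-training -- no labels are used here.
for step in range(500):
    x, _ = sample_batch()
    loss = info_nce(encoder(augment(x)), encoder(augment(x)))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: freeze the encoder, fit a linear classifier on few labeled points.
x_few, y_few = sample_batch(n=20)
x_test, y_test = sample_batch(n=1000)
with torch.no_grad():
    r_few, r_test = encoder(x_few), encoder(x_test)

probe = nn.Linear(8, 2)
probe_opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for step in range(200):
    loss = F.cross_entropy(probe(r_few), y_few)
    probe_opt.zero_grad()
    loss.backward()
    probe_opt.step()

with torch.no_grad():
    acc = (probe(r_test).argmax(dim=1) == y_test).float().mean()
print(f"linear-probe accuracy with 20 labels: {acc:.2f}")
```

On this toy distribution the contrastive objective pulls augmented views of the same point together, so the learned representations separate the latent classes and the linear probe succeeds with only 20 labels, mirroring the "extract signal during pre-training, decode it linearly downstream" story in the abstract.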
Appears in Collections: Computer Science
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Saunshi_princeton_0181D_14307.pdf | | 5.64 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.