Title: Beyond Language Barriers: Extending Semantic Supervision to Multiple Languages
Authors: Lin, Mandy
Advisors: Narasimhan, Karthik
Department: Computer Science
Certificate Program: Center for Statistics and Machine Learning
Class Year: 2023
Abstract: Multilingual language models address data scarcity when training models to perform tasks in low-resource languages. In prior work, SEMSUP leverages the information in class descriptions to improve zero-shot classification performance, where zero-shot classification is the task of predicting output classes that the model did not see during training. While SEMSUP achieves strong results on generalized classification tasks in English, its robustness on cross-lingual classification tasks is unknown. In this thesis, we present Multilingual-SEMSUP, a version of SEMSUP that can generalize both to unseen outputs and across languages. We train and evaluate this model on different language combinations, using English, Spanish, Chinese, Russian, and Arabic class descriptions. Our experiments across three generalized classification scenarios, using one text dataset and one image dataset, indicate that Multilingual-SEMSUP performs strongly in cross-lingual generalization. We also compare Multilingual-SEMSUP's predictions against the target labels and conclude that its incorrect predictions are reasonable errors.
URI: http://arks.princeton.edu/ark:/88435/dsp019g54xm94g
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: LIN-MANDY-THESIS.pdf | Size: 13.63 MB | Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.