Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01w0892d97x
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Fellbaum, Christiane |
dc.contributor.author | Zou, Anne |
dc.date.accessioned | 2020-10-01T21:26:28Z | -
dc.date.available | 2020-10-01T21:26:28Z | -
dc.date.created | 2020-05-03 |
dc.date.issued | 2020-10-01 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01w0892d97x | -
dc.description.abstract | With the growth in social media platforms, microblogs have become an important data source for sentiment analysis. Because sentiment analysis systems depend on quality annotated corpora, transfer learning techniques can be valuable. To repurpose models between different languages, current methods employ parallel resources, such as machine translation or bilingual sentiment lexicons. However, these resources are quite scarce. In this paper, we use monolingual resources and unsupervised techniques to induce cross-lingual task-specific word embeddings for the tasks of emoji prediction and sentiment classification of microblog posts from Twitter and Sina Weibo (commonly shortened to Weibo). Unlike the majority of related multilingual work, we do not use English as the source language. We leverage an enormous Mandarin Chinese language data set to train a monolingual model for emoji prediction, extract its trained embedding layer, and adapt it to support the English language. We apply the adapted word embeddings to new cross-lingual English models in the same task as well as a related task, to gauge their transfer potential. Our cross-lingual English models are competitive with monolingual models, achieving 11.8% accuracy at emoji prediction (out of 64 emojis) and 73.2% at binary sentiment classification. Despite the linguistic distance between Chinese and English, our results show strong transfer performance, supporting the assumption that the languages' embedding spaces are similar in topology. We use this assumption to estimate the emotional meanings of unique, Weibo-specific emojis without straightforward English translations. Our analyses also reveal that increased diversity in emoji labels in the Chinese emoji prediction pre-training resulted in improved sentiment classification. |
dc.format.mimetype | application/pdf |
dc.language.iso | en |
dc.title | Learning cross-lingual word embeddings for sentiment analysis of microblog posts |
dc.type | Princeton University Senior Theses |
pu.date.classyear | 2020 |
pu.department | Computer Science |
pu.pdf.coverpage | SeniorThesisCoverPage |
pu.contributor.authorid | 961250262 |
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File | Description | Size | Format
ZOU-ANNE-THESIS.pdf | | 3.21 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.