Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01w0892d97x
Title: Learning cross-lingual word embeddings for sentiment analysis of microblog posts
Authors: Zou, Anne
Advisors: Fellbaum, Christiane
Department: Computer Science
Class Year: 2020
Abstract: With the growth in social media platforms, microblogs have become an important data source for sentiment analysis. Because sentiment analysis systems depend on quality annotated corpora, transfer learning techniques can be valuable. To repurpose models between different languages, current methods employ parallel resources, such as machine translation or bilingual sentiment lexicons. However, these resources are quite scarce. In this paper, we use monolingual resources and unsupervised techniques to induce cross-lingual task-specific word embeddings for the tasks of emoji prediction and sentiment classification of microblog posts from Twitter and Sina Weibo (commonly shortened to Weibo). Unlike the majority of related multilingual work, we do not use English as the source language. We leverage an enormous Mandarin Chinese language data set to train a monolingual model for emoji prediction, extract its trained embedding layer, and adapt it to support the English language. We apply the adapted word embeddings to new cross-lingual English models in the same task as well as a related task, to gauge their transfer potential. Our cross-lingual English models are competitive with monolingual models, achieving 11.8% accuracy at emoji prediction (out of 64 emojis) and 73.2% at binary sentiment classification. Despite the linguistic distance between Chinese and English, our results show strong transfer performance, supporting the assumption that the languages' embedding spaces are similar in topology. We use this assumption to estimate emotional meanings to unique, Weibo-specific emojis without straightforward English translations. Our analyses also reveal that increased diversity in emoji labels in the Chinese emoji prediction pre-training resulted in improved sentiment classification.
URI: http://arks.princeton.edu/ark:/88435/dsp01w0892d97x
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1988-2020
