Learning cross-lingual word embeddings for sentiment analysis of microblog posts

Zou, Anne

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01w0892d97x

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Fellbaum, Christiane
dc.contributor.author	Zou, Anne
dc.date.accessioned	2020-10-01T21:26:28Z	-
dc.date.available	2020-10-01T21:26:28Z	-
dc.date.created	2020-05-03
dc.date.issued	2020-10-01	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01w0892d97x	-
dc.description.abstract	With the growth in social media platforms, microblogs have become an important data source for sentiment analysis. Because sentiment analysis systems depend on quality annotated corpora, transfer learning techniques can be valuable. To repurpose models between different languages, current methods employ parallel resources, such as machine translation or bilingual sentiment lexicons. However, these resources are quite scarce. In this paper, we use monolingual resources and unsupervised techniques to induce cross-lingual task-specific word embeddings for the tasks of emoji prediction and sentiment classification of microblog posts from Twitter and Sina Weibo (commonly shortened to Weibo). Unlike the majority of related multilingual work, we do not use English as the source language. We leverage an enormous Mandarin Chinese language data set to train a monolingual model for emoji prediction, extract its trained embedding layer, and adapt it to support the English language. We apply the adapted word embeddings to new cross-lingual English models in the same task as well as a related task, to gauge their transfer potential. Our cross-lingual English models are competitive with monolingual models, achieving 11.8% accuracy at emoji prediction (out of 64 emojis) and 73.2% at binary sentiment classification. Despite the linguistic distance between Chinese and English, our results show strong transfer performance, supporting the assumption that the languages' embedding spaces are similar in topology. We use this assumption to estimate the emotional meanings of unique, Weibo-specific emojis without straightforward English translations. Our analyses also reveal that increased diversity in emoji labels in the Chinese emoji prediction pre-training resulted in improved sentiment classification.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Learning cross-lingual word embeddings for sentiment analysis of microblog posts
dc.type	Princeton University Senior Theses
pu.date.classyear	2020
pu.department	Computer Science
pu.pdf.coverpage	SeniorThesisCoverPage
pu.contributor.authorid	961250262
Appears in Collections:	Computer Science, 1987-2023

Files in This Item:

File	Description	Size	Format
ZOU-ANNE-THESIS.pdf		3.21 MB	Adobe PDF
