Title: Optimal Commenting: Predictive Analytics On NYTimes Data
Authors: Sharma, Rohan
Advisors: Lynch, Scott
Department: Computer Science
Class Year: 2014
Abstract: The internet has revolutionized the distribution of news media and, further, allows users to quickly publicize responses to online news media through public commenting. The New York Times online commenting platform also allows registered users to "recommend" comments, thereby crowdsourcing a measure of comment quality. This research attempts to discover relationships between comment recommendation count, comment text, and other associated metadata (e.g., newspaper section, time posted) by conducting an in-depth exploration of the New York Times comment dataset (2005-2013). In this paper, we review descriptive statistics of the dataset and applicable methods for metadata sourcing, text vectorization, and supervised learning. We find recommendation prediction is best modeled in terms of classification using the Naive Bayes learning algorithm. We are able to incorporate metadata features using classifier stacking, a form of ensemble learning, to boost performance. We then discuss the results in the broader context of user-generated internet content and crowdsourcing measures of content quality.
Extent: 28 pages
Access Restrictions: Walk-in Access. This thesis can only be viewed on computer terminals at the Mudd Manuscript Library.
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Computer Science, 1988-2016

Files in This Item:
File: sharma_rohan_Thesis.pdf
Size: 932.62 kB
Format: Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.